github ggml-org/llama.cpp b8729

2 hours ago

jinja : support ensure_ascii=true, string repetition and int/float self-filtering (#21623)

  • feat: jinja engine improvements for reka-edge

Port three Jinja engine improvements needed for the reka-edge model:

  1. Python-style string repetition ("ab" * 3 → "ababab")
  2. ensure_ascii=true support for tojson filter (escapes non-ASCII to \uXXXX)
  3. int() builtin on value_int_t (identity, needed for Reka Edge template)
  • fix: escape invalid utf8 bytes when ensure_ascii=true

The json_ensure_ascii_preserving_format function mishandles one edge
case: when UTF-8 parsing fails, it copies the offending non-ASCII byte
to the output as a raw byte instead of escaping it.

This commit fixes that by writing the Unicode replacement character
\ufffd to the output instead. This matches how lossy UTF-8 decoding
behaves in languages such as Python, Rust, and Go.
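As a rough illustration of the behavior described above, the following Python sketch decodes bytes with errors="replace" and then serializes with ensure_ascii=True. It shows Python's standard library only, not the llama.cpp implementation; the helper name ensure_ascii_lossy is hypothetical and exists only for this example.

```python
import json


def ensure_ascii_lossy(raw: bytes) -> str:
    """Hypothetical helper: lossy UTF-8 decode, then ASCII-only JSON."""
    # errors="replace" turns each undecodable byte into U+FFFD
    # instead of raising UnicodeDecodeError.
    text = raw.decode("utf-8", errors="replace")
    # ensure_ascii=True escapes every non-ASCII character as \uXXXX.
    return json.dumps(text, ensure_ascii=True)


valid = "é".encode("utf-8")   # two well-formed bytes: b'\xc3\xa9'
broken = b"ab\xffcd"          # 0xff can never appear in valid UTF-8

print(ensure_ascii_lossy(valid))   # "\u00e9"
print(ensure_ascii_lossy(broken))  # "ab\ufffdcd"
```

In both cases the output contains only ASCII characters, which is the property the fix is meant to guarantee.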

  • chore: address PR comments

  1. Add a TODO comment for supporting repetition of arrays/tuples
  2. Add support for the float identity operation
  3. Move the invalid ascii test case to test_fuzzing
  • chore: accept suggestion for common/jinja/value.cpp
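
The three template features in this release mirror Python semantics, so a short Python sketch can serve as a reference for the expected results. It uses only Python's built-ins and the standard json module; it does not call the llama.cpp Jinja engine, and the sample values are made up for illustration.

```python
import json

# 1. Python-style string repetition.
repeated = "ab" * 3
print(repeated)  # ababab

# 2. ensure_ascii=True escapes every non-ASCII character as \uXXXX.
payload = {"name": "Sigbjørn"}
escaped = json.dumps(payload, ensure_ascii=True)
raw = json.dumps(payload, ensure_ascii=False)
print(escaped)  # {"name": "Sigbj\u00f8rn"}
print(raw)      # {"name": "Sigbjørn"}

# 3. int() on an int and float() on a float are identity operations.
same_int = int(7)
same_float = float(1.5)
print(same_int, same_float)  # 7 1.5
```

A template that applies int to a value that is already an integer, or float to one that is already a float, should therefore get the value back unchanged.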

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>



