jinja : support ensure_ascii=true, string repetition and int/float self-filtering (#21623)
- feat: jinja engine improvements for reka-edge
Port three Jinja engine improvements needed for the reka-edge model:
- Python-style string repetition ("ab" * 3 → "ababab")
- ensure_ascii=true support for tojson filter (escapes non-ASCII to \uXXXX)
- int() builtin on value_int_t (identity, needed for Reka Edge template)
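The Python behavior these three items mirror can be sketched as follows. This is only a reference illustration in Python, not the C++ code in the Jinja engine; the variable names are made up for the example, and `json.dumps(..., ensure_ascii=True)` is assumed to be the behavior that `tojson` with `ensure_ascii=true` is meant to match.

```python
import json

# Python-style string repetition: the string is concatenated with itself n times.
repeated = "ab" * 3
print(repeated)  # ababab

# ensure_ascii=True escapes every non-ASCII character as \uXXXX.
escaped = json.dumps("café", ensure_ascii=True)
print(escaped)  # "caf\u00e9"

# int() applied to an int (and float() applied to a float) is an identity operation.
same_int = int(7)
same_float = float(1.5)
print(same_int, same_float)  # 7 1.5
```

A template that calls `int` on a value that is already an integer therefore gets the same value back, which is the case the note describes as an identity.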
- fix: escape invalid utf8 bytes when ensure_ascii=true
The json_ensure_ascii_preserving_format function mishandles one edge case:
when UTF-8 parsing fails, it writes the offending non-ASCII byte back to
the output as a raw byte.
This commit fixes that by emitting the Unicode replacement character
\ufffd instead, which matches how languages such as Python, Rust, and Go
commonly handle invalid UTF-8.
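For comparison, the replacement-character behavior can be reproduced in Python. This is a sketch of the reference behavior only, assuming lossy decoding with `errors="replace"` followed by `json.dumps`; it does not show how json_ensure_ascii_preserving_format is implemented, and the input bytes are made up for the example.

```python
import json

# 0xff can never appear in well-formed UTF-8, so this input is invalid.
raw = b"ok\xff"

# Lossy decoding substitutes U+FFFD for the byte that cannot be decoded.
text = raw.decode("utf-8", errors="replace")

# With ensure_ascii=True the replacement character is escaped, so the
# output stays pure ASCII and no raw byte leaks through.
escaped = json.dumps(text, ensure_ascii=True)
print(escaped)  # "ok\ufffd"
```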
- chore: address PR comments
- Add TODO comment for supporting repetition of arrays/tuples
- Add support for float identity operation
- Move invalid ascii test case to test_fuzzing
- chore: accept suggestion for common/jinja/value.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
macOS/iOS:
- macOS Apple Silicon (arm64)
- macOS Apple Silicon (arm64, KleidiAI enabled)
- macOS Intel (x64)
- iOS XCFramework
Linux:
- Ubuntu x64 (CPU)
- Ubuntu arm64 (CPU)
- Ubuntu s390x (CPU)
- Ubuntu x64 (Vulkan)
- Ubuntu arm64 (Vulkan)
- Ubuntu x64 (ROCm 7.2)
- Ubuntu x64 (OpenVINO)
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: