✨ Gemma 4 is here! ✨
Read about the release in the blog post
3.19.0 (2026-06-30)
Features
- Gemma 4 support (#591) (5fe6e27) (documentation: Gemma 4)
- riscv64 prebuilt binaries (#615) (e8336a4)
- automatically enable flash attention when optimal
- improve inference performance when a grammar is active
- more precise resource usage estimation
- resource usage capping (documentation: Resource Capping)
- automatically enable or disable mmap depending on the environment
- support `Q1_0` quant
- improve stability on unified memory systems
- disable residency sets on macOS by default for better OS responsiveness
- default `progressLogs` to `"stderr"` to avoid polluting stdout with logs
- optimized prebuilt binaries for arm architectures
Bug Fixes
- `MXFP4_MOE` quant name
- Vulkan backend successful load detection even when no devices are available
- CLI: avoid redownloading existing models that consist of multiple parts from a URI
- optimize checkpoints management when using grammar
- improve stability when loading huge models
- reranking result range for Qwen 3 reranker
- adapt to breaking `llama.cpp` changes
Shipped with llama.cpp release b9842
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
