withcatai/node-llama-cpp v3.18.0


3.18.0 (2026-03-15)

Features

  • automatic checkpoints for models that need them (#573) (c641959)
  • QwenChatWrapper: Qwen 3.5 support (#573) (c641959)
  • inspect gpu command: detect and report missing prebuilt binary modules and custom npm registry (#573) (c641959)

Bug Fixes

  • resolveModelFile: deduplicate concurrent downloads (#570) (cc105b9)
  • correct Vulkan URL casing in documentation links (#568) (5a44506)
  • Qwen 3.5 memory estimation (#573) (c641959)
  • grammar use with HarmonyChatWrapper (#573) (c641959)
  • add Mistral think-segment detection (#573) (c641959)
  • compress excessively long segments from the current response on context shift instead of throwing an error (#573) (c641959)
  • default thinking budget to 75% of the context size to prevent low-quality responses (#573) (c641959)
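The `resolveModelFile` deduplication fix above follows a common async pattern: keep a map of in-flight downloads keyed by URI so that concurrent calls for the same model share a single promise instead of starting duplicate downloads. A minimal self-contained sketch of that technique (the `downloadOnce` helper and `fakeDownload` stand-in are illustrative, not node-llama-cpp's actual implementation):

```typescript
// Sketch of promise-based download deduplication: concurrent calls
// for the same URI await one shared in-flight promise.
const inFlight = new Map<string, Promise<string>>();
let downloadCount = 0;

// Illustrative stand-in for a real download; not node-llama-cpp's API.
async function fakeDownload(uri: string): Promise<string> {
    downloadCount++;
    await new Promise((resolve) => setTimeout(resolve, 10));
    return `/models/${uri.split("/").pop()}`;
}

async function downloadOnce(uri: string): Promise<string> {
    const existing = inFlight.get(uri);
    if (existing != null)
        return existing; // reuse the download already in progress

    const promise = fakeDownload(uri)
        .finally(() => inFlight.delete(uri)); // allow later re-downloads
    inFlight.set(uri, promise);
    return promise;
}

// Two concurrent calls for the same URI trigger only one download.
const [a, b] = await Promise.all([
    downloadOnce("https://example.com/model.gguf"),
    downloadOnce("https://example.com/model.gguf")
]);
console.log(a === b, downloadCount); // true 1
```

The `.finally` cleanup is what makes the map safe: once a download settles, its entry is removed, so a later call re-downloads rather than returning a stale promise.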

Shipped with llama.cpp release b8352

To use the latest llama.cpp release available, run `npx -n node-llama-cpp source download --release latest`.
