## 3.0.0-beta.15 (2024-04-04)
### Bug Fixes
- create a context with no parameters (#188) (6267778)
- improve chat wrappers tokenization (#182) (35e6f50)
- use the new `llama.cpp` CUDA flag (#182) (35e6f50)
- adapt to breaking `llama.cpp` changes (#183) (6b012a6)
### Features
- automatically adapt to current free VRAM state (#182) (35e6f50)
- `inspect gguf` command (#182) (35e6f50)
- `inspect measure` command (#182) (35e6f50)
- `readGgufFileInfo` function (#182) (35e6f50)
- GGUF file metadata info on `LlamaModel` (#182) (35e6f50)
- `JinjaTemplateChatWrapper` (#182) (35e6f50)
- use the `tokenizer.chat_template` header from the `gguf` file when available - use it to find a better specialized chat wrapper, or use `JinjaTemplateChatWrapper` with it as a fallback (#182) (35e6f50)
- simplify generation CLI commands: `chat`, `complete`, `infill` (#182) (35e6f50)
- Windows on Arm prebuilt binary (#181) (f3b7f81)
Shipped with `llama.cpp` release `b2608`

To use the latest `llama.cpp` release available, run `npx --no node-llama-cpp download --release latest`. (learn more)