## 3.0.0-beta.45 (2024-09-19)
### Bug Fixes
- improve performance of parallel evaluation from multiple contexts (#309) (4b3ad61) (example below)
- Llama 3.1 chat wrapper standard chat history (#309) (4b3ad61)
- adapt to `llama.cpp` sampling refactor (#309) (4b3ad61)
- Llama 3 Instruct function calling (#309) (4b3ad61)
- don't preload prompt in the `chat` command when using `--printTimings` or `--meter` (#309) (4b3ad61)
- more stable Jinja template matching (#309) (4b3ad61)
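The parallel-evaluation fix above concerns evaluating multiple contexts of the same model at once. A minimal sketch of that scenario, assuming the 3.0.0-beta API shape (`getLlama`, `loadModel`, `createContext`, `LlamaChatSession`); the model path is a placeholder:

```ts
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: "path/to/model.gguf" // placeholder path
});

// two independent contexts over the same model
const contextA = await model.createContext();
const contextB = await model.createContext();

const sessionA = new LlamaChatSession({contextSequence: contextA.getSequence()});
const sessionB = new LlamaChatSession({contextSequence: contextB.getSequence()});

// both prompts are evaluated in parallel, which this release speeds up
const [answerA, answerB] = await Promise.all([
    sessionA.prompt("Summarize GGUF in one sentence."),
    sessionB.prompt("What is a context sequence?")
]);
console.log(answerA, answerB);
```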
### Features
- `inspect estimate` command (#309) (4b3ad61)
- move `seed` option to the prompt level (#309) (4b3ad61) (example below)
- Functionary v3 support (#309) (4b3ad61)
- Mistral chat wrapper (#309) (4b3ad61)
- improve Llama 3.1 chat template detection (#309) (4b3ad61)
- change `autoDisposeSequence` default to `false` (#309) (4b3ad61) (example below)
- move `download`, `build` and `clear` commands to be subcommands of a `source` command (#309) (4b3ad61)
- simplify `TokenBias` (#309) (4b3ad61)
- better `threads` default value (#309) (4b3ad61)
- make `LlamaEmbedding` an object (#309) (4b3ad61) (example below)
- `HF_TOKEN` env var support for reading GGUF file metadata (#309) (4b3ad61) (example below)
- `TemplateChatWrapper`: custom history template for each message role (#309) (4b3ad61) (example below)
- more helpful `inspect gpu` command (#309) (4b3ad61)
- all tokenizer tokens iterator (#309) (4b3ad61)
- failed context creation automatic remedy (#309) (4b3ad61)
- abort generation support in CLI commands (#309) (4b3ad61)
- `--gpuLayers max` and `--contextSize max` flag support for the `inspect estimate` command (#309) (4b3ad61)
- extract all prebuilt binaries to external modules (#309) (4b3ad61)
- updated docs (#309) (4b3ad61)
- combine model downloaders (#309) (4b3ad61)
- feat(electron example template): update badge, scroll anchoring, table support (#309) (4b3ad61)
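A minimal sketch of the prompt-level `seed` option and the new `autoDisposeSequence` default, assuming both are the `LlamaChatSession` and `prompt()` options named in this release; the model path is a placeholder:

```ts
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"}); // placeholder path
const context = await model.createContext();

const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    autoDisposeSequence: false // now the default; pass true to restore the old behavior
});

// `seed` is now set per prompt rather than at the context level
const answer = await session.prompt("Write a haiku about llamas.", {seed: 1234});
console.log(answer);
```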
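A sketch of `LlamaEmbedding` as an object, assuming the beta's embedding API (`createEmbeddingContext`, `getEmbeddingFor`) and a `vector` property on the result; the model path is a placeholder:

```ts
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/embedding-model.gguf"}); // placeholder path
const embeddingContext = await model.createEmbeddingContext();

// getEmbeddingFor() now resolves to a LlamaEmbedding object instead of a plain array
const embedding = await embeddingContext.getEmbeddingFor("Hello world");
console.log(embedding.vector.length);
```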
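For the `HF_TOKEN` support, a sketch assuming the token is picked up from the environment when reading metadata of a gated Hugging Face model via `readGgufFileInfo`; the URL is a placeholder:

```ts
import {readGgufFileInfo} from "node-llama-cpp";

// with HF_TOKEN set in the environment, metadata of gated models can be read
const ggufInfo = await readGgufFileInfo(
    "https://huggingface.co/org/repo/resolve/main/model.gguf" // placeholder URL
);
console.log(ggufInfo.metadata);
```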
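A sketch of per-role history templates in `TemplateChatWrapper`, assuming `historyTemplate` now takes an object keyed by message role; the template strings here are illustrative:

```ts
import {TemplateChatWrapper} from "node-llama-cpp";

const chatWrapper = new TemplateChatWrapper({
    template: "{{systemPrompt}}\n{{history}}model: {{completion}}\nuser: ",
    // each message role now gets its own history template
    historyTemplate: {
        system: "system: {{message}}\n",
        user: "user: {{message}}\n",
        model: "model: {{message}}\n"
    }
});
```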
### Shipped with llama.cpp release `b3785`
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)