What's new in 0.5.4 (2023-10-20)
These are the changes in inference v0.5.4.
New features
- FEAT: wizardcoder python by @UranusSeven in #539
- FEAT: Support grammar-based sampling for ggml models by @aresnow1 in #525
- FEAT: speculative decoding by @UranusSeven in #509
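For readers unfamiliar with the speculative decoding feature listed above, the idea can be sketched in a few lines. This is a toy, greedy-only illustration and not the implementation from #509: `speculative_decode`, `target`, `draft`, and `k` are hypothetical names, the models are stand-in functions that map a token list to the next token, and the verification step is written as a loop, whereas a real system would score all drafted positions in one batched forward pass of the large model.

```python
from typing import Callable, List

Token = int
# Hypothetical stand-in for a model: given the context, return the greedy next token.
NextToken = Callable[[List[Token]], Token]


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[Token],
    max_new_tokens: int,
    k: int = 4,
) -> List[Token]:
    """Greedy speculative decoding: a cheap draft model guesses up to k tokens,
    and the expensive target model keeps the longest prefix it agrees with."""
    if k < 1:
        raise ValueError("k must be at least 1")
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        # 1. The draft model proposes up to k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(min(k, limit - len(tokens))):
            proposal.append(draft(tokens + proposal))

        # 2. The target model checks each proposed token in order.
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First disagreement: keep the target's token and stop.
                accepted.append(expected)
                break
        else:
            # Every draft token matched, so the target adds one more token
            # if the length budget still allows it.
            if len(tokens) + len(accepted) < limit:
                accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens
```

Under greedy decoding, the output is identical to what the target model would produce on its own; the draft model only affects how many target steps can be verified together. Sampling-based variants use a probabilistic acceptance rule instead of the exact-match check shown here.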
Enhancements
- ENH: Download embedding models from ModelScope by @ChengjieLi28 in #532
- ENH: lock transformers version by @UranusSeven in #549
- ENH: Support downloading code-llama family models from ModelScope by @ChengjieLi28 in #557
- ENH: Add gguf format of codellama-instruct by @aresnow1 in #567
Bug fixes
- BUG: Fix stream not compatible with openai by @codingl2k1 in #524
- BUG: set trust_remote_code to true by default by @richzw in #555
- BUG: add quantization to valid file name by @richzw in #562
- BUG: remove "generate" ability from Baichuan-2-chat json config by @Minamiyama in #556
Documentation
- DOC: update pot files by @UranusSeven in #538
- DOC: Add Client API reference by @codingl2k1 in #543
- DOC: Add client doc to the user guide by @codingl2k1 in #547
New Contributors
- @richzw made their first contribution in #555
- @Minamiyama made their first contribution in #556
Full Changelog: v0.5.3...v0.5.4