What's new in 0.12.0 (2024-06-07)
These are the changes in inference v0.12.0.
New features
- FEAT: new model: mini-cpm-llama3-v-2.5 by @Minamiyama in #1577
- FEAT: support glm4-chat & glm4-chat-1m by @qinxuye in #1584
- FEAT: add mistral-instruct-v0.3 by @qinxuye in #1576
- FEAT: add codestral-v0.1 by @qinxuye in #1575
- FEAT: Support ChatTTS by @codingl2k1 in #1578
- FEAT: Continuous batching for chat model on transformers backend by @ChengjieLi28 in #1548
- FEAT: support qwen2 by @qinxuye in #1597
- FEAT: support glm-4v-9b by @Minamiyama in #1591
Enhancements
- ENH: make CogVLM2 support stream output by @Minamiyama in #1572
- BLD: Clean up all Docker images after building on the self-hosted machine by @ChengjieLi28 in #1595
- BLD: Fix pip checking multiple versions of some packages during installation by @ChengjieLi28 in #1603
Bug fixes
- BUG: Fix typo for cogvlm2 by @Minamiyama in #1573
Documentation
- DOC: Add new models to README by @qinxuye in #1585
- DOC: Fix audio doc by @codingl2k1 in #1593
- DOC: Usage of cal-model-memory by @wxiwnd in #1589
- DOC: Fix audio doc by @codingl2k1 in #1599
- DOC: Continuous batching by @ChengjieLi28 in #1602
- DOC: Add new models to README by @qinxuye in #1604
Full Changelog: v0.11.3...v0.12.0