What's new in 0.1.0 (2023-07-28)
These are the changes in Xinference v0.1.0.
New features
- FEAT: support fp4 and int8 quantization for PyTorch models by @pangyoki in #238
- FEAT: support llama-2-chat-70b ggml by @UranusSeven in #257
Enhancements
- ENH: skip 4-bit quantization for non-Linux or non-CUDA local deployment by @UranusSeven in #264
- ENH: handle legacy cache by @UranusSeven in #266
- REF: model family by @UranusSeven in #251
Bug fixes
- BUG: fix RESTful stop parameters by @RayJi01 in #241
- BUG: hot fix for download integrity by @RayJi01 in #242
- BUG: disable baichuan-chat and baichuan-base on macOS by @pangyoki in #250
- BUG: delete tqdm_class in snapshot_download by @pangyoki in #258
- BUG: fix ChatGLM parameter switch by @Bojun-Feng in #262
- BUG: refresh related fields when format changes by @UranusSeven in #265
- BUG: show download progress in Gradio by @aresnow1 in #267
- BUG: fix LLM JSON not being included by @UranusSeven in #268
Tests
- TST: update ChatGLM tests by @Bojun-Feng in #259
Documentation
- DOC: update installation section in README by @aresnow1 in #253
- DOC: update README for PyTorch models by @pangyoki in #207
Full Changelog: v0.0.6...v0.1.0