Features
Quantization can now be enabled at serving time:

```bash
openllm start stablelm --quantize int8
```

This loads the model in 8-bit mode with bitsandbytes.
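For context, 8-bit loading here is powered by bitsandbytes through Hugging Face transformers. A minimal standalone sketch of that mechanism (the checkpoint name is an example, and this is not OpenLLM's internal code):

```python
# Sketch of 8-bit loading with transformers + bitsandbytes.
# Illustrates the underlying mechanism only; the checkpoint name is an example.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-tuned-alpha-3b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,   # quantize weights to int8 via bitsandbytes
    device_map="auto",   # place layers on the available GPU(s)
)
```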
Running on a CPU machine? Don't worry, you can use `--bettertransformer` instead:

```bash
openllm start stablelm --bettertransformer
```
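For reference, `--bettertransformer` corresponds to Hugging Face Optimum's BetterTransformer API. A minimal sketch of that transform (illustrative only, not OpenLLM's internal code; the checkpoint name is an example):

```python
# Sketch of the BetterTransformer transform via Hugging Face Optimum.
from optimum.bettertransformer import BetterTransformer
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("stabilityai/stablelm-tuned-alpha-3b")
model = BetterTransformer.transform(model)  # swap in fused attention kernels
```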
Roadmap

- GPTQ support is under development and will be included soon.
Installation
```bash
pip install openllm==0.1.6
```

To upgrade from a previous version, use the following command:

```bash
pip install --upgrade openllm==0.1.6
```
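To confirm which version is installed, you can check the package metadata from the standard library:

```python
# Print the installed OpenLLM version; should show 0.1.6 after upgrading.
import importlib.metadata

print(importlib.metadata.version("openllm"))
```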
Usage

To list all available models: `python -m openllm.models`
To start an LLM: `python -m openllm start dolly-v2`
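Once a model is serving, you can send requests over HTTP. A minimal sketch, assuming the default BentoML address `localhost:3000` and a `/v1/generate` endpoint (both are assumptions; check the server's startup logs for the actual address and routes):

```python
# Sketch of querying a running OpenLLM server over HTTP.
# The address and endpoint path are assumptions; consult the server logs.
import requests

resp = requests.post(
    "http://localhost:3000/v1/generate",
    json={"prompt": "What is the meaning of life?"},
)
resp.raise_for_status()
print(resp.json())
```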
Find more information about this release in CHANGELOG.md.
What's Changed
- refactor: toplevel CLI by @aarnphm in #26
- docs: add LangChain and BentoML Examples by @parano in #25
- feat: fine-tuning [part 1] by @aarnphm in #23
- feat: quantization by @aarnphm in #27
- perf: build quantization and better transformer behaviour by @aarnphm in #28
Full Changelog: v0.1.5...v0.1.6