Features
Quantization can now be enabled at serving time:

```bash
openllm start stablelm --quantize int8
```

This loads the model in 8-bit mode with bitsandbytes.
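For context, 8-bit loading here is powered by bitsandbytes through Hugging Face transformers. A minimal standalone sketch of that mechanism (the checkpoint name is an example, and this is not OpenLLM's internal code):

```python
# Sketch of 8-bit loading with transformers + bitsandbytes.
# Illustrates the underlying mechanism only; the checkpoint name is an example.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-tuned-alpha-3b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,   # quantize weights to int8 via bitsandbytes
    device_map="auto",   # place layers on the available GPU(s)
)
```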
Running on a CPU machine? Don't worry, you can use `--bettertransformer` instead:

```bash
openllm start stablelm --bettertransformer
```
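For reference, `--bettertransformer` corresponds to Hugging Face Optimum's BetterTransformer API. A minimal sketch of that transform (illustrative only, not OpenLLM's internal code; the checkpoint name is an example):

```python
# Sketch of the BetterTransformer transform via Hugging Face Optimum.
from optimum.bettertransformer import BetterTransformer
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("stabilityai/stablelm-tuned-alpha-3b")
model = BetterTransformer.transform(model)  # swap in fused attention kernels
```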
Roadmap

- GPTQ support is under development and will be included soon.
Installation
```bash
pip install openllm==0.1.6
```

To upgrade from a previous version, use the following command:

```bash
pip install --upgrade openllm==0.1.6
```
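To confirm which version is installed, you can check the package metadata from the standard library:

```python
# Print the installed OpenLLM version; should show 0.1.6 after upgrading.
import importlib.metadata

print(importlib.metadata.version("openllm"))
```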
Usage

To list all available models: `python -m openllm.models`
To start an LLM: `python -m openllm start dolly-v2`
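Once a model is serving, you can send requests over HTTP. A minimal sketch, assuming the default BentoML address `localhost:3000` and a `/v1/generate` endpoint (both are assumptions; check the server's startup logs for the actual address and routes):

```python
# Sketch of querying a running OpenLLM server over HTTP.
# The address and endpoint path are assumptions; consult the server logs.
import requests

resp = requests.post(
    "http://localhost:3000/v1/generate",
    json={"prompt": "What is the meaning of life?"},
)
resp.raise_for_status()
print(resp.json())
```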
Find more information about this release in CHANGELOG.md.
What's Changed
- refactor: toplevel CLI by @aarnphm in #26
- docs: add LangChain and BentoML Examples by @parano in #25
- feat: fine-tuning [part 1] by @aarnphm in #23
- feat: quantization by @aarnphm in #27
- perf: build quantization and better transformer behaviour by @aarnphm in #28
Full Changelog: v0.1.5...v0.1.6