Release Highlights
OpenLLM 0.4.0 brings a few revamped features.
Unified API
0.4.0 brings a revamped API for OpenLLM. Users can now run LLMs with two new APIs:

```python
await llm.generate(prompt, stop, **kwargs)
await llm.generate_iterator(prompt, stop, **kwargs)
```
llm.generate is the one-shot generation API for any given prompt, whereas llm.generate_iterator is the streaming variant.
```python
import openllm, asyncio

llm = openllm.LLM("HuggingFaceH4/zephyr-7b-beta")

async def infer(prompt, **kwargs):
    return await llm.generate(prompt, **kwargs)

asyncio.run(infer("Time is a definition of"))
```
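llm.generate_iterator can also be used on its own to stream output as it is produced. A minimal sketch, assuming the same Zephyr model and treating each yielded generation as a streamed chunk (mirroring the BentoML example below):

```python
import openllm, asyncio

llm = openllm.LLM("HuggingFaceH4/zephyr-7b-beta")

async def stream(prompt, **kwargs):
    # Print each streamed piece as it arrives.
    async for generation in llm.generate_iterator(prompt, **kwargs):
        print(generation.outputs[0].text, end="", flush=True)

asyncio.run(stream("Time is a definition of"))
```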
For using within a BentoML Service, one can do the following:

```python
import bentoml, openllm

llm = openllm.LLM("HuggingFaceH4/zephyr-7b-beta")

svc = bentoml.Service(name='zephyr-instruct', runners=[llm.runner])

@svc.api(input=bentoml.io.Text(), output=bentoml.io.Text(media_type='text/event-stream'))
async def prompt(input_text: str) -> str:
    async for generation in llm.generate_iterator(input_text):
        yield f"data: {generation.outputs[0].text}\n\n"
```
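Once the Service is running, the streaming endpoint can be consumed over HTTP. A minimal sketch using httpx (not an OpenLLM dependency), assuming BentoML's default port 3000 and the /prompt route derived from the API function name above:

```python
import httpx

# Stream the text/event-stream response chunk by chunk.
with httpx.stream(
    "POST",
    "http://localhost:3000/prompt",
    content="Time is a definition of",
    headers={"Content-Type": "text/plain"},
) as response:
    for chunk in response.iter_text():
        print(chunk, end="", flush=True)
```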
Mistral support
Mistral is now supported with OpenLLM. Simply run openllm start mistral to start a Mistral server.
AWQ support
AWQ is now supported with both the vLLM and PyTorch backends. Simply pass --quantize awq to use AWQ.
Important
To use AWQ, it is crucial that the model weights are already quantized with AWQ. Please look on the HuggingFace Hub for the AWQ variant of the model you want to use.
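A minimal sketch of loading an AWQ checkpoint from Python, assuming the Python API exposes the CLI's --quantize option as a keyword argument (the model ID is purely illustrative):

```python
import openllm

# Assumption: `quantize` mirrors the CLI's --quantize flag, and the model ID
# points to an already AWQ-quantized checkpoint on the HuggingFace Hub.
llm = openllm.LLM("TheBloke/zephyr-7B-beta-AWQ", quantize="awq")
```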
General bug fixes
Fixes a bug with regard to tag generation. Standalone Bentos that use this new API should work as expected if the model already exists in the model store.
For consistency, make sure to run openllm prune -y --include-bentos.
Installation
```bash
pip install openllm==0.4.0
```

To upgrade from a previous version, use the following command:

```bash
pip install --upgrade openllm==0.4.0
```

Usage
All available models: openllm models
To start an LLM: python -m openllm start opt
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P ghcr.io/bentoml/openllm:0.4.0 start opt
To run OpenLLM Clojure UI (community-maintained): docker run -p 8420:80 ghcr.io/bentoml/openllm-ui-clojure:0.4.0
Find more information about this release in the CHANGELOG.md
What's Changed
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #563
- chore(deps): bump aquasecurity/trivy-action from 0.13.0 to 0.13.1 by @dependabot in #562
- chore(deps): bump taiki-e/install-action from 2.21.3 to 2.21.7 by @dependabot in #561
- chore(deps-dev): bump eslint from 8.47.0 to 8.53.0 by @dependabot in #558
- chore(deps): bump @vercel/og from 0.5.18 to 0.5.20 by @dependabot in #556
- chore(deps-dev): bump @types/react from 18.2.20 to 18.2.35 by @dependabot in #559
- chore(deps-dev): bump @typescript-eslint/eslint-plugin from 6.9.0 to 6.10.0 by @dependabot in #564
- fix : updated client to toggle tls verification by @ABHISHEK03312 in #532
- perf: unify LLM interface by @aarnphm in #518
- fix(stop): stop is not available in config by @aarnphm in #566
- infra: update docs on serving fine-tuning layers by @aarnphm in #567
- fix: update build dependencies and format chat prompt by @aarnphm in #569
- chore(examples): update openai client by @aarnphm in #568
- fix(client): one-shot generation construction by @aarnphm in #570
- feat: Mistral support by @aarnphm in #571
New Contributors
- @ABHISHEK03312 made their first contribution in #532
Full Changelog: v0.3.14...v0.4.0