github bentoml/OpenLLM v0.4.0


Release Highlights

OpenLLM 0.4.0 brings a few revamped features.

Unified API

0.4.0 brings a revamped API for OpenLLM. Users can now run LLMs with two new APIs:

  • await llm.generate_iterator(prompt, stop, **kwargs)
  • await llm.generate(prompt, stop, **kwargs)

llm.generate performs one-shot generation for a given prompt, whereas llm.generate_iterator is its streaming variant.

import openllm, asyncio

llm = openllm.LLM("HuggingFaceH4/zephyr-7b-beta")

async def infer(prompt, **kwargs):
  return await llm.generate(prompt, **kwargs)

asyncio.run(infer("Time is a definition of"))
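
The streaming variant can be consumed with an async for loop. Below is a minimal sketch that prints text as it arrives; it reuses the generation.outputs[0].text access pattern from the service example further down, and the prompt is only an example.

import openllm, asyncio

llm = openllm.LLM("HuggingFaceH4/zephyr-7b-beta")

async def stream(prompt, **kwargs):
  # Each yielded item is a partial generation; print its text as it arrives.
  async for generation in llm.generate_iterator(prompt, **kwargs):
    print(generation.outputs[0].text, end="", flush=True)

asyncio.run(stream("Time is a definition of"))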

To use it within a BentoML Service, one can do the following:

import bentoml, openllm

llm = openllm.LLM("HuggingFaceH4/zephyr-7b-beta")

svc = bentoml.Service(name='zephyr-instruct', runners=[llm.runner])

@svc.api(input=bentoml.io.Text(), output=bentoml.io.Text(media_type='text/event-stream'))
async def prompt(input_text: str):
  async for generation in llm.generate_iterator(input_text):
    yield f"data: {generation.outputs[0].text}\n\n"

Mistral support

Mistral is now supported in OpenLLM. Simply run openllm start mistral to start a Mistral server.
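
For the Python API, a minimal sketch might look like the following; the model id mistralai/Mistral-7B-Instruct-v0.1 is only an assumed example of a Mistral checkpoint on the HuggingFace hub.

import openllm, asyncio

# Assumed example model id; substitute any Mistral checkpoint you want to use.
llm = openllm.LLM("mistralai/Mistral-7B-Instruct-v0.1")

async def main():
  result = await llm.generate("Explain the difference between a list and a tuple in Python.")
  print(result)

asyncio.run(main())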

AWQ support

AWQ is now supported with both the vLLM and PyTorch backends. Simply pass --quantize awq to use AWQ.

Important

To use AWQ, it is crucial that the model weights are already quantized with AWQ. Please look on the HuggingFace hub for an AWQ variant of the model you want to use.
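
As a rough Python-API counterpart to the --quantize awq flag (for example openllm start mistral --quantize awq on the CLI), a minimal sketch could look like the following; the checkpoint name and the quantize/backend keyword arguments are assumptions here, mirroring the CLI flags.

import openllm

# Assumed AWQ-quantized checkpoint; look for the AWQ variant of your model on the HuggingFace hub.
# The quantize/backend keyword arguments are assumed to mirror --quantize/--backend on the CLI.
llm = openllm.LLM("TheBloke/zephyr-7B-beta-AWQ", quantize="awq", backend="vllm")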

General bug fixes

Fixes a bug with tag generation. Standalone Bentos that use this new API should work as expected if the model already exists in the model store.

For consistency, make sure to run openllm prune -y --include-bentos

Installation

pip install openllm==0.4.0

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.0

Usage

All available models: openllm models

To start an LLM: python -m openllm start opt

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P ghcr.io/bentoml/openllm:0.4.0 start opt

To run OpenLLM Clojure UI (community-maintained): docker run -p 8420:80 ghcr.io/bentoml/openllm-ui-clojure:0.4.0

Find more information about this release in the CHANGELOG.md

What's Changed

  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #563
  • chore(deps): bump aquasecurity/trivy-action from 0.13.0 to 0.13.1 by @dependabot in #562
  • chore(deps): bump taiki-e/install-action from 2.21.3 to 2.21.7 by @dependabot in #561
  • chore(deps-dev): bump eslint from 8.47.0 to 8.53.0 by @dependabot in #558
  • chore(deps): bump @vercel/og from 0.5.18 to 0.5.20 by @dependabot in #556
  • chore(deps-dev): bump @types/react from 18.2.20 to 18.2.35 by @dependabot in #559
  • chore(deps-dev): bump @typescript-eslint/eslint-plugin from 6.9.0 to 6.10.0 by @dependabot in #564
  • fix : updated client to toggle tls verification by @ABHISHEK03312 in #532
  • perf: unify LLM interface by @aarnphm in #518
  • fix(stop): stop is not available in config by @aarnphm in #566
  • infra: update docs on serving fine-tuning layers by @aarnphm in #567
  • fix: update build dependencies and format chat prompt by @aarnphm in #569
  • chore(examples): update openai client by @aarnphm in #568
  • fix(client): one-shot generation construction by @aarnphm in #570
  • feat: Mistral support by @aarnphm in #571

New Contributors

Full Changelog: v0.3.14...v0.4.0
