BentoML - v1.0.22


🍱 The BentoML v1.0.22 release brings a list of highly anticipated updates.

  • Added support for Pydantic 2 for better validation performance.
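
    For example, a Pydantic model can back JSON IO validation. A minimal sketch (the InputSchema model and service name below are illustrative):

    import bentoml
    from pydantic import BaseModel

    class InputSchema(BaseModel):
      # Fields are validated by Pydantic before the API function is called
      name: str
      threshold: float = 0.5

    svc = bentoml.Service("validation_example")

    @svc.api(
      input=bentoml.io.JSON(pydantic_model=InputSchema),
      output=bentoml.io.JSON(),
    )
    def classify(parsed: InputSchema) -> dict:
      # parsed arrives as a validated InputSchema instance
      return {"name": parsed.name, "threshold": parsed.threshold}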

  • Added support for CUDA 12 versions in builds and containerization.
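
    For example, a CUDA 12 base image can be requested through the docker section of bentofile.yaml. A sketch, assuming "12.1.1" is among the supported CUDA 12 version strings (check the docs for the exact list):

    # bentofile.yaml (illustrative)
    service: "service:svc"
    docker:
      cuda_version: "12.1.1"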

  • Introduced service lifecycle events, allowing custom logic to be added at on_deployment, on_startup, and on_shutdown. State can be managed through the context variable ctx during the on_startup and on_shutdown events, as well as during request serving in the API.

    import bentoml

    svc = bentoml.Service("lifecycle_example")  # illustrative service definition

    @svc.on_deployment
    def on_deployment():
      # Runs once per deployment, before any worker process starts
      pass

    @svc.on_startup
    def on_startup(ctx: bentoml.Context):
      # Runs in each worker; initialize shared state on the context
      ctx.state["object_key"] = create_object()

    @svc.on_shutdown
    def on_shutdown(ctx: bentoml.Context):
      # Runs in each worker on shutdown; release resources held in state
      cleanup_state(ctx.state["object_key"])

    # IO descriptors below are illustrative; the point is the ctx parameter
    @svc.api(input=bentoml.io.JSON(), output=bentoml.io.JSON())
    def predict(input_data, ctx: bentoml.Context):
      # State created during on_startup is available while serving requests
      obj = ctx.state["object_key"]
      ...
  • Added support for traffic control for both the API Server and Runners. Timeout and maximum concurrency can now be set through the BentoML configuration, as shown below.

    api_server:
      traffic:
        timeout: 10 # API Server request timeout in seconds
        max_concurrency: 32 # Maximum number of concurrent requests in the API Server
    
    runners:
      iris:
        traffic:
          timeout: 10 # Runner request timeout in seconds
          max_concurrency: 32 # Maximum number of concurrent requests in the Runner
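
    The configuration can then be applied at serve time, for example through the BENTOML_CONFIG environment variable (the file name and service path below are illustrative):

    # point BentoML at the configuration file and start the server
    BENTOML_CONFIG=./bentoml_configuration.yaml bentoml serve service:svc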
  • Improved the performance of bentoml push for large Bentos.

🚀 One more thing: the team is delighted to unveil our latest endeavor, OpenLLM. This innovative project lets you effortlessly build with state-of-the-art open-source or fine-tuned Large Language Models.

  • Supports all variants of Flan-T5, Dolly V2, StarCoder, Falcon, StableLM, and ChatGLM out of the box. Fully customizable with model-specific arguments.

    openllm start [falcon | flan_t5 | dolly_v2 | chatglm | stablelm | starcoder]
  • Exposes the familiar BentoML APIs and transforms LLMs seamlessly into Runners.

    llm_runner = openllm.Runner("dolly-v2")
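
    The runner can then be wired into a regular BentoML Service like any other runner. A minimal sketch, assuming generation is exposed as a generate runnable method (the service name and IO types are illustrative):

    import bentoml
    import openllm

    llm_runner = openllm.Runner("dolly-v2")
    svc = bentoml.Service("llm-service", runners=[llm_runner])

    @svc.api(input=bentoml.io.Text(), output=bentoml.io.Text())
    async def prompt(input_text: str) -> str:
      # Assumption: the runner exposes a "generate" runnable method
      answer = await llm_runner.generate.async_run(input_text)
      return str(answer)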
  • Builds LLM applications into the Bento format, ready to be deployed to BentoCloud or containerized into OCI images, as sketched below.

    openllm build [falcon | flan_t5 | dolly_v2 | chatglm | stablelm | starcoder]
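
    For the containerization path, the Bento produced by openllm build can be turned into an OCI image with the standard BentoML CLI (the <bento_tag> below is a placeholder for the tag printed by the build):

    openllm build dolly_v2
    bentoml containerize <bento_tag>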

Our dedicated team is working hard to pioneer more integrations of advanced models for upcoming releases of OpenLLM. Stay tuned for further developments.
