🍱 To better support LLM serving through response streaming, we are proud to introduce experimental server-sent events (SSE) streaming support in this release of BentoML v1.1.4 and OpenLLM v0.2.27. See an example service definition for SSE streaming with Llama2.
- Added response streaming through SSE to the `bentoml.io.Text` IO descriptor type.
- Added async generator support to both API Server and Runner to `yield` incremental text responses (see the sketch after this list).
- Added native SSE streaming support to ☁️ BentoCloud.
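The following is a minimal, hypothetical sketch of such a streaming service definition, assuming BentoML v1.1.4: the service name and the word-splitting loop are illustrative stand-ins for a real model runner. The key point is that an async generator API function streams each yielded chunk to the client over SSE.

```python
import asyncio

import bentoml
from bentoml.io import Text

svc = bentoml.Service("sse_stream_demo")  # hypothetical service name


@svc.api(input=Text(), output=Text())
async def generate(prompt: str):
    # Yielding from an async generator API function sends each chunk
    # to the client as a server-sent event instead of buffering the
    # full response body.
    for word in prompt.split():  # stand-in for incremental token generation
        yield word + " "
        await asyncio.sleep(0.05)
```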
🦾 OpenLLM added token streaming capabilities to support streaming responses from LLMs.
- Added `/v1/generate_stream` endpoint for streaming responses from LLMs.

```bash
curl -N -X 'POST' 'http://0.0.0.0:3000/v1/generate_stream' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt": "### Instruction:\n What is the definition of time (200 words essay)?\n\n### Response:",
  "llm_config": {
    "use_llama2_prompt": false,
    "max_new_tokens": 4096,
    "early_stopping": false,
    "num_beams": 1,
    "num_beam_groups": 1,
    "use_cache": true,
    "temperature": 0.89,
    "top_k": 50,
    "top_p": 0.76,
    "typical_p": 1,
    "epsilon_cutoff": 0,
    "eta_cutoff": 0,
    "diversity_penalty": 0,
    "repetition_penalty": 1,
    "encoder_repetition_penalty": 1,
    "length_penalty": 1,
    "no_repeat_ngram_size": 0,
    "renormalize_logits": false,
    "remove_invalid_values": false,
    "num_return_sequences": 1,
    "output_attentions": false,
    "output_hidden_states": false,
    "output_scores": false,
    "encoder_no_repeat_ngram_size": 0,
    "n": 1,
    "best_of": 1,
    "presence_penalty": 0.5,
    "frequency_penalty": 0,
    "use_beam_search": false,
    "ignore_eos": false
  },
  "adapter_name": null
}'
```
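To consume the stream programmatically, a client can read the response incrementally instead of waiting for the full body. Below is a minimal client sketch using `requests`; the prompt and the trimmed-down `llm_config` are illustrative, assuming omitted fields fall back to the server's defaults.

```python
import requests

# Hypothetical payload; the `llm_config` keys mirror the curl example above.
payload = {
    "prompt": "### Instruction:\nWhat is the definition of time?\n\n### Response:",
    "llm_config": {"max_new_tokens": 256, "temperature": 0.89},
    "adapter_name": None,
}

# stream=True keeps the connection open so chunks can be printed as
# the server produces them, mirroring curl's -N (no-buffer) flag.
with requests.post(
    "http://0.0.0.0:3000/v1/generate_stream",
    json=payload,
    stream=True,
) as resp:
    resp.raise_for_status()
    for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)
```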
What's Changed
- docs: Update the models doc by @Sherlock113 in #4145
- docs: Add more workflows to the GitHub Actions doc by @Sherlock113 in #4146
- docs: Add text embedding example to readme by @Sherlock113 in #4151
- fix: bento build cache miss by @xianml in #4153
- fix(buildx): parsing attestation on docker desktop by @aarnphm in #4155
Full Changelog: v1.1.3...v1.1.4