bentoml/BentoML v1.0.16 on GitHub

🍱 BentoML v1.0.16 release is here featuring the introduction of the bentoml.triton framework. With this integration, BentoML now supports running NVIDIA Triton Inference Server as a Runner. See Triton Inference Server documentation to learn more!

Triton Inference Server can be configured as a Runner in BentoML with its model repository and CLI arguments specified as parameters.

import bentoml

triton_runner = bentoml.triton.Runner(
	"triton_runner",
	model_repository="s3://bucket/path/to/model_repository",
	cli_args=["--load-model=torchscrip_yolov5s", "--model-control-mode=explicit"],
)

Models served by the Triton Inference Server Runner can be called as a method on the runner handle both synchronously and asynchronously.

@svc.api(
    input=bentoml.io.Image.from_sample("./data/0.png"), output=bentoml.io.NumpyNdarray()
)
async def bentoml_torchscript_mnist_infer(im: Image) -> NDArray[t.Any]:
    arr = np.array(im) / 255.0
    arr = np.expand_dims(arr, (0, 1)).astype("float32")
    InferResult = await triton_runner.torchscript_mnist.async_run(arr)
    return InferResult.as_numpy("OUTPUT__0")

Build bentos and containerize images with Triton Runners by specifying nvcr.io/nvidia/tritonserver base image in bentofile.yaml.

service: service:svc
include:
  - /model_repository
  - /data/*.png
  - /*.py
exclude:
  - /__pycache__
  - /venv
  - /train.py
  - /build_bento.py
  - /containerize_bento.py
python:
  packages:
    - bentoml[triton]
docker:
  base_image: nvcr.io/nvidia/tritonserver:22.12-py3

💡 If you are an existing Triton user, the integration provides simpler ways to add custom logics in Python, deploy distributed multi-model inference graph, unify model management across different ML frameworks and workflows, and standardize model packaging format with versioning and collaboration features. If you are an existing BentoML user, the integration improves the runner efficiency and throughput under high load thanks to Triton’s efficient C++ runtime.

What's Changed

fix(container): podman virtual machine healthcheck (#3575) by @timc in #3576
chore(aiohttp): remove deprecated verify_ssl to ssl by @aarnphm in #3574
feat(triton): support HTTP client by @aarnphm in #3502
fix(grpc): handle backward protocol version by @aarnphm in #3332
chore(deps): bump ruff from 0.0.246 to 0.0.247 by @dependabot in #3579
chore(test): using container API for testing by @aarnphm in #3582
fix(serve-cli): Make sure to use BENTOML_CONFIG value by @aarnphm in #3597
docs: Update documentation with an examples link by @ssheng in #3599
chore: lock starlette version by @sauyon in #3600
feature(diffusers): support enable_attention_slicing by @larme in #3598
chore(cli): figlet to show on CLI only by @aarnphm in #3603
chore(cli): using default background as color by @aarnphm in #3608
feat: Flax by @aarnphm in #3123
feat(gRPC): client implementation by @aarnphm in #3280
fix: invalid option dtype=True for pd.read_csv by @parano in #3601
chore(deps): bump coverage[toml] from 7.1.0 to 7.2.0 by @dependabot in #3616
chore(deps): bump ruff from 0.0.247 to 0.0.252 by @dependabot in #3617
docs: containerisation API by @aarnphm in #3518
chore(deps): bump coverage[toml] from 7.2.0 to 7.2.1 by @dependabot in #3621
chore(deps): bump imageio from 2.25.1 to 2.26.0 by @dependabot in #3620
fix(docs): missing space bug causes table not to render by @aarnphm in #3622
chore(deps): bump ruff from 0.0.252 to 0.0.253 by @dependabot in #3624
feat: enable cork for non-batched workloads by @sauyon in #3602
docs: Fix typo in concepts/service by @FelixSchuSi in #3627
chore(deps): bump tritonclient[all] from 2.30.0 to 2.31.0 by @dependabot in #3628
fix(docs): broken inline docstring by @aarnphm in #3538
fix: use a semaphore to limit runner connections by @sauyon in #3607
fix: make inference_api handle None type by @aarnphm in #3611
fix: make sure not to override user set values for from_sample by @aarnphm in #3610
docs: add exceptions API section by @aarnphm in #3609
revert(pyproject): add back pytest plugins by @aarnphm in #3633
fix(configuration): CORS docs, allow_origins and allow_headers by @larme in #3643
chore(deps): bump ruff from 0.0.253 to 0.0.254 by @dependabot in #3641
chore(deps): bump pytest from 7.2.1 to 7.2.2 by @dependabot in #3642
chore: http client healthcheck by @denyszhak in #3636
docs: typo in configuration.rst by @davkime in #3644
docs: correct links to configuration source code by @davkime in #3645
example: add fraud detection and benchmark examples by @parano in #3647
fix(containerize): remove autoconfig for buildctl by @aarnphm in #3484
feat: name in bentofile.yaml by @aarnphm in #3604
chore: ensure all labels are dict[str,str] by @aarnphm in #3605
fix(triton): enable runtime options by @aarnphm in #3649
docs: Triton Inference Server by @aarnphm in #3519
example: Triton Inference Server by @aarnphm in #3471
chore(deps): bump pytest from 7.2.1 to 7.2.2 in /requirements by @dependabot in #3639
chore(deps): bump bufbuild/buf-setup-action from 1.14.0 to 1.15.0 by @dependabot in #3638
fix: some missing logics for triton examples by @aarnphm in #3650
fix: use async implementation by @characat0 in #3654
feat: add ray deploy support by @parano in #3632
chore(deps): bump pytest-xdist[psutil] from 3.2.0 to 3.2.1 by @dependabot in #3659
chore(deps): bump bufbuild/buf-setup-action from 1.15.0 to 1.15.1 by @dependabot in #3655
fix: update scheme logic using ssl.enabled by @aarnphm in #3660
feat: from_sample docstring by @aarnphm in #3318
fix(ci): locking starlette for container tests by @aarnphm in #3666
chore: better exception for numpy by @sauyon in #3665
feat: make file io descriptor allow any mime type by default by @sauyon in #3626
fix(docs): broken link by @aarnphm in #3537
chore(stubs): remove unused by @aarnphm in #3612
docs: Update Triton documentation and examples by @ssheng in #3668
chore(deps): bump ruff from 0.0.254 to 0.0.255 by @dependabot in #3671
docs: Update integration docs by @ssheng in #3672

New Contributors

@FelixSchuSi made their first contribution in #3627
@denyszhak made their first contribution in #3636
@davkime made their first contribution in #3644

Full Changelog: v1.0.15...v1.0.16

bentoml/BentoML v1.0.16 BentoML - v1.0.16 on GitHub

What's Changed

New Contributors

bentoml/BentoML v1.0.16
BentoML - v1.0.16

on GitHub