🍱 BentoML v1.0.16 release is here, featuring the introduction of the `bentoml.triton` framework. With this integration, BentoML now supports running NVIDIA Triton Inference Server as a Runner. See the Triton Inference Server documentation to learn more!
- Triton Inference Server can be configured as a Runner in BentoML, with its model repository and CLI arguments specified as parameters.

```python
import bentoml

triton_runner = bentoml.triton.Runner(
    "triton_runner",
    model_repository="s3://bucket/path/to/model_repository",
    cli_args=["--load-model=torchscript_yolov5s", "--model-control-mode=explicit"],
)
```
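- Models served by the Triton Inference Server Runner can be called as methods on the runner handle, either synchronously or asynchronously.

```python
import typing as t

import numpy as np
from numpy.typing import NDArray
from PIL.Image import Image

import bentoml

# Assumes `svc` is a bentoml.Service created with runners=[triton_runner]
@svc.api(
    input=bentoml.io.Image.from_sample("./data/0.png"),
    output=bentoml.io.NumpyNdarray(),
)
async def bentoml_torchscript_mnist_infer(im: Image) -> NDArray[t.Any]:
    arr = np.array(im) / 255.0
    arr = np.expand_dims(arr, (0, 1)).astype("float32")
    # Each model in the repository is exposed as a method on the runner
    infer_result = await triton_runner.torchscript_mnist.async_run(arr)
    return infer_result.as_numpy("OUTPUT__0")
```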
- Build bentos and containerize images with Triton Runners by specifying the `nvcr.io/nvidia/tritonserver` base image in `bentofile.yaml`.

```yaml
service: service:svc
include:
  - /model_repository
  - /data/*.png
  - /*.py
exclude:
  - /__pycache__
  - /venv
  - /train.py
  - /build_bento.py
  - /containerize_bento.py
python:
  packages:
    - bentoml[triton]
docker:
  base_image: nvcr.io/nvidia/tritonserver:22.12-py3
```
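The first highlight passes Triton flags through `cli_args`. With `--model-control-mode=explicit`, Triton loads only the models named by `--load-model` flags at startup, and each name must match a model directory in the repository. The helper below, `explicit_load_args`, is hypothetical (not part of BentoML): a minimal sketch that builds that flag list for several models and, assuming a local model repository rather than an S3 URI, checks that each model directory exists.

```python
from __future__ import annotations

from pathlib import Path


def explicit_load_args(model_repository: str, model_names: list[str]) -> list[str]:
    """Hypothetical helper: build Triton CLI args that load only `model_names`.

    Assumes a local model repository laid out as <repo>/<model-name>/...;
    S3 URIs are not supported by this check.
    """
    repo = Path(model_repository)
    # In explicit mode, Triton loads only the --load-model targets at startup
    args = ["--model-control-mode=explicit"]
    for name in model_names:
        # Each model name must match a directory directly under the repository
        if not (repo / name).is_dir():
            raise FileNotFoundError(f"model {name!r} not found under {repo}")
        args.append(f"--load-model={name}")
    return args
```

The returned list can be passed as `cli_args=` to `bentoml.triton.Runner`, together with the same `model_repository` path.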
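The second highlight's handler runs two preprocessing steps before calling the runner. Pulling them into a hypothetical `preprocess_mnist` helper, a minimal sketch assuming a 28×28 grayscale input as with MNIST, makes the resulting shape easy to check. The pixels are scaled into [0, 1], and `np.expand_dims` with the tuple `(0, 1)` adds a leading batch axis and a channel axis in one call.

```python
import numpy as np


def preprocess_mnist(pixels: np.ndarray) -> np.ndarray:
    """Hypothetical helper mirroring the handler's preprocessing.

    Turns a (28, 28) grayscale image into a (1, 1, 28, 28) float32 array.
    """
    arr = pixels / 255.0  # scale 0..255 pixel values into [0, 1]
    # Axis 0 becomes the batch dimension, axis 1 the channel dimension
    return np.expand_dims(arr, (0, 1)).astype("float32")
```

Inside the API function, this becomes `arr = preprocess_mnist(np.array(im))`. For a synchronous endpoint, declare the handler with `def` instead of `async def` and call `triton_runner.torchscript_mnist.run(arr)` rather than awaiting `async_run`.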
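The third highlight's build options can also be passed to `bentoml.bentos.build` from Python, for example in a build script. This is a minimal sketch assuming that `service.py` in the build context defines `svc`. The names `BUILD_OPTIONS` and `build_triton_bento` are illustrative and are not part of BentoML.

```python
# Illustrative names: BUILD_OPTIONS and build_triton_bento are not part of BentoML.
BUILD_OPTIONS = {
    "include": ["/model_repository", "/data/*.png", "/*.py"],
    "exclude": [
        "/__pycache__",
        "/venv",
        "/train.py",
        "/build_bento.py",
        "/containerize_bento.py",
    ],
    "python": {"packages": ["bentoml[triton]"]},
    # The NVIDIA image ships the tritonserver binary the Triton Runner needs
    "docker": {"base_image": "nvcr.io/nvidia/tritonserver:22.12-py3"},
}


def build_triton_bento(service: str = "service:svc"):
    """Build a Bento with the same options as the bentofile.yaml above."""
    # Imported lazily so the options can be inspected without BentoML installed
    import bentoml

    return bentoml.bentos.build(service, **BUILD_OPTIONS)
```

Calling `build_triton_bento()` returns the built Bento. It can then be containerized with `bentoml containerize <bento tag>`, as with any other Bento.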
💡 If you are an existing Triton user, the integration provides simpler ways to add custom logic in Python, deploy distributed multi-model inference graphs, unify model management across different ML frameworks and workflows, and standardize the model packaging format with versioning and collaboration features. If you are an existing BentoML user, the integration improves runner efficiency and throughput under high load, thanks to Triton's efficient C++ runtime.
What's Changed
- fix(container): podman virtual machine healthcheck (#3575) by @timc in #3576
- chore(aiohttp): remove deprecated verify_ssl to ssl by @aarnphm in #3574
- feat(triton): support HTTP client by @aarnphm in #3502
- fix(grpc): handle backward protocol version by @aarnphm in #3332
- chore(deps): bump ruff from 0.0.246 to 0.0.247 by @dependabot in #3579
- chore(test): using container API for testing by @aarnphm in #3582
- fix(serve-cli): Make sure to use BENTOML_CONFIG value by @aarnphm in #3597
- docs: Update documentation with an examples link by @ssheng in #3599
- chore: lock starlette version by @sauyon in #3600
- feature(diffusers): support `enable_attention_slicing` by @larme in #3598
- chore(cli): figlet to show on CLI only by @aarnphm in #3603
- chore(cli): using default background as color by @aarnphm in #3608
- feat: Flax by @aarnphm in #3123
- feat(gRPC): client implementation by @aarnphm in #3280
- fix: invalid option dtype=True for pd.read_csv by @parano in #3601
- chore(deps): bump coverage[toml] from 7.1.0 to 7.2.0 by @dependabot in #3616
- chore(deps): bump ruff from 0.0.247 to 0.0.252 by @dependabot in #3617
- docs: containerisation API by @aarnphm in #3518
- chore(deps): bump coverage[toml] from 7.2.0 to 7.2.1 by @dependabot in #3621
- chore(deps): bump imageio from 2.25.1 to 2.26.0 by @dependabot in #3620
- fix(docs): missing space bug causes table not to render by @aarnphm in #3622
- chore(deps): bump ruff from 0.0.252 to 0.0.253 by @dependabot in #3624
- feat: enable cork for non-batched workloads by @sauyon in #3602
- docs: Fix typo in concepts/service by @FelixSchuSi in #3627
- chore(deps): bump tritonclient[all] from 2.30.0 to 2.31.0 by @dependabot in #3628
- fix(docs): broken inline docstring by @aarnphm in #3538
- fix: use a semaphore to limit runner connections by @sauyon in #3607
- fix: make inference_api handle None type by @aarnphm in #3611
- fix: make sure not to override user set values for from_sample by @aarnphm in #3610
- docs: add exceptions API section by @aarnphm in #3609
- revert(pyproject): add back pytest plugins by @aarnphm in #3633
- fix(configuration): CORS docs, `allow_origins` and `allow_headers` by @larme in #3643
- chore(deps): bump ruff from 0.0.253 to 0.0.254 by @dependabot in #3641
- chore(deps): bump pytest from 7.2.1 to 7.2.2 by @dependabot in #3642
- chore: http client healthcheck by @denyszhak in #3636
- docs: typo in configuration.rst by @davkime in #3644
- docs: correct links to configuration source code by @davkime in #3645
- example: add fraud detection and benchmark examples by @parano in #3647
- fix(containerize): remove autoconfig for buildctl by @aarnphm in #3484
- feat: name in bentofile.yaml by @aarnphm in #3604
- chore: ensure all labels are dict[str,str] by @aarnphm in #3605
- fix(triton): enable runtime options by @aarnphm in #3649
- docs: Triton Inference Server by @aarnphm in #3519
- example: Triton Inference Server by @aarnphm in #3471
- chore(deps): bump pytest from 7.2.1 to 7.2.2 in /requirements by @dependabot in #3639
- chore(deps): bump bufbuild/buf-setup-action from 1.14.0 to 1.15.0 by @dependabot in #3638
- fix: some missing logics for triton examples by @aarnphm in #3650
- fix: use async implementation by @characat0 in #3654
- feat: add ray deploy support by @parano in #3632
- chore(deps): bump pytest-xdist[psutil] from 3.2.0 to 3.2.1 by @dependabot in #3659
- chore(deps): bump bufbuild/buf-setup-action from 1.15.0 to 1.15.1 by @dependabot in #3655
- fix: update scheme logic using ssl.enabled by @aarnphm in #3660
- feat: `from_sample` docstring by @aarnphm in #3318
- fix(ci): locking starlette for container tests by @aarnphm in #3666
- chore: better exception for numpy by @sauyon in #3665
- feat: make file io descriptor allow any mime type by default by @sauyon in #3626
- fix(docs): broken link by @aarnphm in #3537
- chore(stubs): remove unused by @aarnphm in #3612
- docs: Update Triton documentation and examples by @ssheng in #3668
- chore(deps): bump ruff from 0.0.254 to 0.0.255 by @dependabot in #3671
- docs: Update integration docs by @ssheng in #3672
New Contributors
- @FelixSchuSi made their first contribution in #3627
- @denyszhak made their first contribution in #3636
- @davkime made their first contribution in #3644
Full Changelog: v1.0.15...v1.0.16