BentoML - v1.0.16


🍱 BentoML v1.0.16 is here, featuring the introduction of the bentoml.triton framework. With this integration, BentoML now supports running the NVIDIA Triton Inference Server as a Runner. See the Triton Inference Server documentation to learn more!

  • Triton Inference Server can be configured as a Runner in BentoML with its model repository and CLI arguments specified as parameters.

    import bentoml

    # Point the Runner at a Triton model repository; cli_args are passed
    # through to the tritonserver CLI.
    triton_runner = bentoml.triton.Runner(
        "triton_runner",
        model_repository="s3://bucket/path/to/model_repository",
        cli_args=["--load-model=torchscript_yolov5s", "--model-control-mode=explicit"],
    )
  • Models served by the Triton Inference Server Runner can be called as methods on the runner handle, both synchronously and asynchronously; a synchronous sketch follows after this list.

    import typing as t
    import numpy as np
    from numpy.typing import NDArray
    from PIL.Image import Image

    svc = bentoml.Service("mnist-triton", runners=[triton_runner])  # service name is illustrative

    @svc.api(
        input=bentoml.io.Image.from_sample("./data/0.png"), output=bentoml.io.NumpyNdarray()
    )
    async def bentoml_torchscript_mnist_infer(im: Image) -> NDArray[t.Any]:
        # Scale pixels to [0, 1] and add batch and channel dims: (1, 1, H, W)
        arr = np.expand_dims(np.array(im) / 255.0, (0, 1)).astype("float32")
        # Each model in the Triton model repository is exposed as a method on the runner handle
        InferResult = await triton_runner.torchscript_mnist.async_run(arr)
        return InferResult.as_numpy("OUTPUT__0")
  • Build bentos and containerize images with Triton Runners by specifying the nvcr.io/nvidia/tritonserver base image in bentofile.yaml.

    service: service:svc
    include:
      - /model_repository
      - /data/*.png
      - /*.py
    exclude:
      - /__pycache__
      - /venv
      - /train.py
      - /build_bento.py
      - /containerize_bento.py
    python:
      packages:
        - bentoml[triton]
    docker:
      base_image: nvcr.io/nvidia/tritonserver:22.12-py3
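
As mentioned in the second bullet above, runner methods can also be called synchronously. A minimal sketch of a synchronous counterpart to the async API above, reusing the same illustrative model name and output key from that example (the function name here is arbitrary):

    @svc.api(
        input=bentoml.io.Image.from_sample("./data/0.png"), output=bentoml.io.NumpyNdarray()
    )
    def bentoml_torchscript_mnist_infer_sync(im: Image) -> NDArray[t.Any]:
        arr = np.expand_dims(np.array(im) / 255.0, (0, 1)).astype("float32")
        # .run blocks until Triton returns, unlike the awaitable .async_run
        infer_result = triton_runner.torchscript_mnist.run(arr)
        return infer_result.as_numpy("OUTPUT__0")

With the bentofile.yaml above in place, the standard CLI flow applies: `bentoml build` packages the service and model repository into a bento, and `bentoml containerize` produces a Docker image based on the specified Triton base image.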

💡 If you are an existing Triton user, the integration provides simpler ways to add custom logic in Python, deploy distributed multi-model inference graphs, unify model management across different ML frameworks and workflows, and standardize the model packaging format with versioning and collaboration features. If you are an existing BentoML user, the integration improves runner efficiency and throughput under high load thanks to Triton's efficient C++ runtime.

What's Changed

Full Changelog: v1.0.15...v1.0.16
