NVIDIA Triton Inference Server

The NVIDIA Triton Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server. For edge deployments, Triton Server is also available as a shared library with an API that allows the full functionality of the server to be included directly in an application.
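As a rough illustration of what a remote client sends to the HTTP endpoint, the sketch below builds (but does not send) an inference request using only the Python standard library. It assumes the KFServing-style V2 HTTP/REST path layout (`/v2/models/<model>/infer`) and a server listening on `localhost:8000`. The model name `my_model` and the tensor names `INPUT0` and `OUTPUT0` are hypothetical placeholders, and `build_infer_request` is an illustrative helper, not part of any Triton client library.

```python
import json
import urllib.request


def build_infer_request(base_url, model_name, inputs, output_names=None):
    """Build (but do not send) an HTTP inference request.

    inputs: list of (name, shape, datatype, flat_data) tuples.
    output_names: optional list of output tensor names to request explicitly.
    """
    body = {
        "inputs": [
            {
                "name": name,
                "shape": list(shape),
                "datatype": datatype,
                "data": list(data),
            }
            for name, shape, datatype, data in inputs
        ]
    }
    # Listing outputs is optional; when omitted, the request carries no "outputs" key.
    if output_names:
        body["outputs"] = [{"name": name} for name in output_names]

    return urllib.request.Request(
        url=f"{base_url}/v2/models/{model_name}/infer",  # assumed V2 path layout
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical model and tensor names, for illustration only.
request = build_infer_request(
    "http://localhost:8000",
    "my_model",
    inputs=[("INPUT0", (1, 4), "FP32", [1.0, 2.0, 3.0, 4.0])],
    output_names=["OUTPUT0"],
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it to a running server. In practice the provided client libraries are the more convenient way to issue requests.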

What's New In 2.3.0

- The Python client library is now a pip package available from the NVIDIA PyPI index. See the Python client documentation for more information.
- The custom backend API (custom.h) and the associated custom backend SDK are no longer provided as part of the Triton release. Existing custom backends will continue to work with Triton, and older releases of the SDK can still be used to create "legacy" custom backends. However, all users are strongly encouraged to move to the new Triton backend API.
- Fix a performance issue with the HTTP/REST protocol and the Python client library that reduced performance when outputs were not requested explicitly in an inference request.
- Fix some bugs in the reporting of statistics for ensemble models.
- GRPC updated to version 1.25.0.

Known Issues

- The KFServing HTTP/REST and GRPC protocols and corresponding V2 experimental Python and C++ clients are beta quality and are likely to change. Specifically:
  - The data returned by the statistics API will be changing to include additional information.
  - The data returned by the repository index API will be changing to include additional information.
- The new C API specified in tritonserver.h is beta quality and is likely to change.
- TensorRT reformat-free I/O is not supported.
- Some versions of Google Kubernetes Engine (GKE) contain a regression in the handling of LD_LIBRARY_PATH that prevents the inference server container from running correctly (see issue 141255952). Use a GKE 1.13 or earlier version or a GKE 1.14.6 or later version to avoid this issue.
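
The GKE version guidance above can be expressed as a small check. This is a minimal sketch that reads the note as "versions from 1.14.0 up to, but not including, 1.14.6 are affected". That reading is an interpretation of the note, not an official compatibility matrix, and `gke_version_affected` is a hypothetical helper name.

```python
import re


def gke_version_affected(version):
    """Return True if a GKE version string falls in the range the note says to avoid.

    Assumption: "1.13 or earlier" and "1.14.6 or later" are safe, so the
    affected range is taken to be 1.14.0 <= version < 1.14.6.
    """
    match = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # A missing patch number is treated as 0, which errs on the side of "affected".
    patch = int(match.group(3) or 0)
    return (1, 14, 0) <= (major, minor, patch) < (1, 14, 6)


for candidate in ["1.13.11-gke.14", "1.14.3-gke.11", "1.14.6-gke.1"]:
    print(candidate, gke_version_affected(candidate))
```

Suffixes such as `-gke.14` are ignored because only the major, minor, and patch numbers matter for this comparison.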

Client Libraries and Examples

Ubuntu 18.04 builds of the client libraries and examples are included in this release in the attached v2.3.0_ubuntu1804.clients.tar.gz file. See the documentation section 'Building the Client Libraries and Examples' for more information on using these files. The client SDK is also available as an NGC container.

Jetson JetPack Support

An experimental release of Triton for the Developer Preview of JetPack 4.4 is available as part of the 20.06 release. See the 20.06 release for more information.

triton-inference-server/server v2.3.0 Release 2.3.0 corresponding to NGC container 20.09 on GitHub
