NVIDIA TensorRT Inference Server

The NVIDIA TensorRT Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.

What's New In 1.11.0

The TensorRT backend now delivers significantly better performance. Improvements include reduced thread contention, pinned memory for faster CPU<->GPU transfers, and greater overlap of compute and memory copies on GPUs.
Memory usage of TensorRT models is reduced in many cases by sharing weights across multiple model instances.
Boolean data-type and shape tensors are now supported for TensorRT models.
A new model configuration option allows the dynamic batcher to create “ragged” batches for custom backend models. A ragged batch is a batch where one or more of the input/output tensors have different shapes in different batch entries.
Local S3 storage endpoints are now supported for model repositories. A local S3 endpoint is specified as 's3://host:port/path/to/repository'.
The Helm chart showing an example Kubernetes deployment is updated to include Prometheus and Grafana support so that inference server metrics can be collected and visualized.
The inference server container no longer sets LD_LIBRARY_PATH; instead, the server uses RUNPATH to locate its shared libraries.
Python 2 has reached end-of-life, so all support for it has been removed. Python 3 is still supported.
Ubuntu 18.04 with January 2020 updates
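
The definition of a ragged batch given above can be illustrated with a small sketch. It is not the server's batching code and does not use any inference server API. The function name `is_ragged` and the example shapes are hypothetical, chosen only to show what "different shapes in different batch entries" means for a single input tensor.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def is_ragged(entry_shapes: Sequence[Shape]) -> bool:
    """Return True if the batch entries do not all share one shape.

    entry_shapes holds the shape of the same input tensor for each
    entry (request) in the batch. This is an illustrative helper,
    not part of the inference server.
    """
    return len(set(entry_shapes)) > 1


# Hypothetical per-entry shapes for one input tensor.
uniform_batch = [(16,), (16,), (16,)]   # every entry has 16 elements
ragged_batch = [(16,), (9,), (23,)]     # entries differ in length

print(is_ragged(uniform_batch))
print(is_ragged(ragged_batch))
```

A uniform batch can be stacked into one dense tensor of shape [batch, 16]. The ragged one cannot, which is why a custom backend that accepts ragged batches has to handle the per-entry shapes itself.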

Known Issues

TensorRT reformat-free I/O is not supported.
Some versions of Google Kubernetes Engine (GKE) contain a regression in the handling of LD_LIBRARY_PATH that prevents the inference server container from running correctly (see issue 141255952). To avoid this issue, use GKE 1.13 or earlier, or GKE 1.14.6 or later.
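
The GKE version guidance above can be turned into a simple pre-deployment check. The sketch below is only one reading of that guidance: it assumes "1.13 or earlier" covers every 1.13.x patch release, so the affected range is taken to be 1.14.0 through 1.14.5. The function names and the example version strings are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a string such as '1.14.6-gke.2' into (1, 14, 6).

    Any suffix after the first '-' is ignored, and missing
    components are treated as 0.
    """
    core = version.lstrip("v").split("-", 1)[0]
    parts = [int(p) for p in core.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


def gke_version_avoids_issue(version: str) -> bool:
    """Return True if the version is outside the assumed affected range.

    Assumption: affected versions are 1.14.0 up to, but not
    including, 1.14.6.
    """
    v = parse_version(version)
    return v < (1, 14, 0) or v >= (1, 14, 6)


for candidate in ["1.13.11-gke.14", "1.14.3-gke.11", "1.14.6-gke.1"]:
    print(candidate, gke_version_avoids_issue(candidate))
```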

Client Libraries and Examples

Ubuntu 16.04 and Ubuntu 18.04 builds of the client libraries and examples are included in this release in the attached v1.11.0_ubuntu1604.clients.tar.gz and v1.11.0_ubuntu1804.clients.tar.gz files. See the documentation section 'Building the Client Libraries and Examples' for more information on using these files. The client SDK is also available as an NGC container.

Custom Backend SDK

Ubuntu 16.04 and Ubuntu 18.04 builds of the custom backend SDK are included in this release in the attached v1.11.0_ubuntu1604.custombackend.tar.gz and v1.11.0_ubuntu1804.custombackend.tar.gz files. See the documentation section 'Building a Custom Backend' for more information on using these files.

triton-inference-server/server v1.11.0 Release 1.11.0 corresponding to NGC container 20.02 on GitHub
