NVIDIA Triton Inference Server

The NVIDIA Triton Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server. For edge deployments, Triton Server is also available as a shared library with an API that allows the full functionality of the server to be included directly in an application.

What's New In 1.14.0

Support for the legacy V1 HTTP/REST and GRPC protocols and the corresponding client libraries is released on GitHub branch r20.06-v1 and as NGC container 20.06-v1-py3.

Known Issues

TensorRT reformat-free I/O is not supported.
Some versions of Google Kubernetes Engine (GKE) contain a regression in the handling of LD_LIBRARY_PATH that prevents the inference server container from running correctly (see issue 141255952). To avoid this issue, use GKE 1.13 or earlier, or GKE 1.14.6 or later.
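
The GKE guidance above can be turned into a small pre-deployment check. The following is a minimal sketch, not part of the release: the function names are hypothetical, and it assumes the affected range is exactly GKE 1.14.0 through 1.14.5, which is inferred from the wording of the note rather than stated in it. It also assumes version strings of the form "1.14.6-gke.1".

```python
import re

# Assumption inferred from the known-issue note: versions up to and including
# the 1.13 series are fine, and the problem is resolved from 1.14.6 onward.
LAST_UNAFFECTED_MINOR = (1, 13)
FIRST_FIXED_VERSION = (1, 14, 6)


def parse_gke_version(version):
    """Parse strings such as '1.14.6-gke.1' or 'v1.13' into (major, minor, patch)."""
    match = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized GKE version: {version!r}")
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0.
    return int(major), int(minor), int(patch or 0)


def is_gke_version_unaffected(version):
    """Return True if the version falls outside the assumed affected range."""
    major, minor, patch = parse_gke_version(version)
    if (major, minor) <= LAST_UNAFFECTED_MINOR:
        return True
    return (major, minor, patch) >= FIRST_FIXED_VERSION
```

For example, is_gke_version_unaffected("1.14.3-gke.2") returns False under these assumptions, while "1.13.11-gke.14" and "1.14.6-gke.1" both return True.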

Client Libraries and Examples

Ubuntu 18.04 builds of the client libraries and examples are included in this release in the attached v1.14.0_ubuntu1804.clients.tar.gz file. See the documentation section 'Building the Client Libraries and Examples' for more information on using these files. The client SDK is also available as an NGC container.

Custom Backend SDK

Ubuntu 18.04 builds of the custom backend SDK are included in this release in the attached v1.14.0_ubuntu1804.custombackend.tar.gz file. See the documentation section 'Building a Custom Backend' for more information on using these files.
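
Both attached files are gzip-compressed tar archives, so they can be unpacked with standard tools before following the documentation sections named above. As a minimal sketch, assuming the archive has already been downloaded to the working directory, the hypothetical helper below unpacks one with Python's standard tarfile module and rejects entries that would land outside the destination directory. The internal layout of the archives is not described in these notes, so the helper only reports the member names it finds.

```python
import os
import tarfile


def extract_release_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return its member names."""
    dest_root = os.path.realpath(dest_dir)
    os.makedirs(dest_root, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        members = archive.getmembers()
        for member in members:
            # Refuse entries whose resolved path escapes the destination.
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.name!r}")
        archive.extractall(dest_root, members=members)
    return [member.name for member in members]
```

A call such as extract_release_archive("v1.14.0_ubuntu1804.custombackend.tar.gz", "custombackend") would unpack the SDK into a directory named custombackend; that directory name is only an illustration.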

triton-inference-server/server v1.14.0: Release 1.14.0, corresponding to NGC container 20.06.
