Triton Inference Server

The Triton Inference Server provides a cloud inferencing solution optimized for both CPUs and GPUs. The server provides an inference service via an HTTP or gRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server. For edge deployments, Triton is also available as a shared library with an API that allows the full functionality of the server to be included directly in an application.
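As a rough sketch of what a remote client sends over the HTTP endpoint, the following uses only the Python standard library. It assumes Triton's default HTTP port (8000) and the v2 inference protocol paths /v2/health/ready and /v2/models/<name>/infer. The model name my_model, the input name INPUT0 and the helper function names are hypothetical placeholders, not part of Triton's client API.

```python
import json
import urllib.request

# Assumption: Triton is listening on its default HTTP port (8000).
DEFAULT_URL = "localhost:8000"


def ready_url(url: str = DEFAULT_URL) -> str:
    """Endpoint that returns HTTP 200 once the server is ready for inference."""
    return "http://{}/v2/health/ready".format(url)


def infer_url(model: str, url: str = DEFAULT_URL, version: str = "") -> str:
    """Inference endpoint for a model, optionally pinned to one model version."""
    path = "/v2/models/{}".format(model)
    if version:
        path += "/versions/{}".format(version)
    return "http://{}{}/infer".format(url, path)


def build_infer_body(name: str, shape: list, datatype: str, data: list) -> dict:
    """JSON body for a single-input request; data is flattened in row-major order."""
    return {"inputs": [{"name": name, "shape": shape, "datatype": datatype, "data": data}]}


def infer(model: str, body: dict, url: str = DEFAULT_URL) -> dict:
    """POST the request and decode the JSON response (needs a running server)."""
    req = urllib.request.Request(
        infer_url(model, url),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical model "my_model" with one FP32 input "INPUT0" of shape [1, 4].
    body = build_infer_body("INPUT0", [1, 4], "FP32", [1.0, 2.0, 3.0, 4.0])
    print(ready_url())
    print(infer_url("my_model"))
    print(json.dumps(body))
```

Against a running server, infer("my_model", body)["outputs"] would return the output tensors. The Python client library shipped with this release wraps the same protocol and is the more practical choice for real clients.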

What's New in 2.4.0

A new Python backend allows Python code to run as a model within Triton. See https://github.com/triton-inference-server/python_backend.
A new DALI backend allows running pre-processing and augmentation pipelines within Triton. See https://github.com/triton-inference-server/dali_backend.
The perf_client application has been renamed to perf_analyzer; its functionality is unchanged.
A new Model Analyzer project has been started with the goal of providing analysis and guidance on how best to optimize single or multiple models within Triton. The initial release analyzes GPU memory usage. See https://github.com/triton-inference-server/model_analyzer.
Triton documentation now resides on GitHub and is reachable from https://github.com/triton-inference-server/server/blob/master/README.md.
The build process for Triton has changed; see https://github.com/triton-inference-server/server/blob/master/docs/build.md.
Triton backends are moving to separate repositories. In this release the TensorFlow, ONNX Runtime, Python and DALI backends have been moved; see https://github.com/triton-inference-server/backend#where-can-i-find-all-the-backends-that-are-available-for-triton.
Refer to the 20.10 column of the Frameworks Support Matrix for the container image versions that the 20.10 inference server container is based on.
The container OS is Ubuntu 18.04 with September 2020 updates.

Known Issues

TensorRT reformat-free I/O is not supported.
Some versions of Google Kubernetes Engine (GKE) contain a regression in the handling of LD_LIBRARY_PATH that prevents the inference server container from running correctly (see issue 141255952). To avoid this issue, use GKE 1.13 or earlier, or GKE 1.14.6 or later.
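The GKE guidance above reduces to a simple version comparison. This is a minimal sketch, assuming GKE version strings of the form "1.14.10-gke.17"; the notes only say which versions to use, so the helper names parse_gke_version and is_recommended are hypothetical, and the sketch treats everything outside the recommended ranges (1.14.0 through 1.14.5) as potentially affected.

```python
import re


def parse_gke_version(version: str) -> tuple:
    """Turn a GKE version such as '1.14.10-gke.17' into (1, 14, 10)."""
    match = re.match(r"^(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized GKE version: {!r}".format(version))
    major, minor, patch = match.groups()
    # A missing patch component (e.g. '1.13') is treated as patch 0.
    return int(major), int(minor), int(patch or 0)


def is_recommended(version: str) -> bool:
    """True for GKE 1.13 or earlier, or GKE 1.14.6 or later."""
    v = parse_gke_version(version)
    # Tuple comparison orders versions component by component.
    return v[:2] <= (1, 13) or v >= (1, 14, 6)


if __name__ == "__main__":
    for candidate in ["1.13.11-gke.14", "1.14.3-gke.10", "1.14.6", "1.15.0-gke.1"]:
        print(candidate, is_recommended(candidate))
```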

Client Libraries and Examples

Ubuntu 18.04 builds of the client libraries and examples are included in this release in the attached v2.4.0_ubuntu1804.clients.tar.gz file. See the documentation section 'Building the Client Libraries and Examples' for more information on using these files. The client SDK is also available as an NGC container.

Jetson JetPack Support

A release of Triton for the Developer Preview of JetPack 4.4 (https://developer.nvidia.com/embedded/jetpack) is provided in the attached file: v2.4.0-jetpack4.4-1718105.tgz. This release supports the TensorFlow 2.3.1, TensorFlow 1.15.4, TensorRT 7.1, and Custom backends as well as ensembles. GPU metrics, GCS storage and S3 storage are not supported.

The tar file contains the Triton server executable and shared libraries, as well as the C++ and Python client libraries and examples.

Installation and Usage

The following dependencies must be installed before running Triton.

apt-get update && \
    apt-get install -y --no-install-recommends \
        software-properties-common \
        autoconf \
        automake \
        build-essential \
        cmake \
        git \
        libb64-dev \
        libre2-dev \
        libssl-dev \
        libtool \
        libboost-dev \
        libcurl4-openssl-dev \
        rapidjson-dev \
        patchelf \
        zlib1g-dev

To run the clients, the following dependencies must be installed:

apt-get install -y --no-install-recommends \
        curl \
        libopencv-dev=3.2.0+dfsg-4ubuntu0.1 \
        libopencv-core-dev=3.2.0+dfsg-4ubuntu0.1 \
        pkg-config \
        python3 \
        python3-pip \
        python3-dev

python3 -m pip install --upgrade wheel setuptools
python3 -m pip install --upgrade grpcio-tools numpy pillow

The Python wheel for the Python client library is included in the tar file and can be installed with the following command:

python3 -m pip install --upgrade clients/python/tritonclient-2.4.0-py3-none-linux_aarch64.whl[all]

On Jetson, the backend directory must be set explicitly with the --backend-directory flag. Triton also defaults to TensorFlow 1.x; to use TensorFlow 2.x, specify the version through the TensorFlow backend configuration (--backend-config=tensorflow,version=2).

tritonserver --model-repository=/path/to/model_repo --backend-directory=/path/to/tritonserver/backends \
    --backend-config=tensorflow,version=2
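When the server is started from a script, the same flags can be assembled programmatically. This is a minimal sketch, assuming tritonserver from the extracted tar file is on PATH; the helper names tritonserver_args and launch are hypothetical, and the paths are the same placeholders used in the command above. It avoids shlex.join because that function requires Python 3.8, newer than the Python shipped with Ubuntu 18.04.

```python
import shlex
import subprocess


def tritonserver_args(model_repo: str, backend_dir: str, tf_version: int = 2) -> list:
    """Build the Jetson command line; TensorFlow 1.x is the default unless version=2 is set."""
    args = [
        "tritonserver",
        "--model-repository={}".format(model_repo),
        # On Jetson the backend directory must be given explicitly.
        "--backend-directory={}".format(backend_dir),
    ]
    if tf_version == 2:
        args.append("--backend-config=tensorflow,version=2")
    return args


def launch(args: list) -> subprocess.Popen:
    """Start the server as a background process (assumes tritonserver is on PATH)."""
    return subprocess.Popen(args)


if __name__ == "__main__":
    # Placeholder paths; substitute the locations from the extracted tar file.
    cmd = tritonserver_args("/path/to/model_repo", "/path/to/tritonserver/backends")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the arguments as a list rather than a single shell string lets subprocess pass each flag through unchanged, so paths containing spaces need no extra quoting.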

Release 2.4.0 of triton-inference-server/server corresponds to NGC container 20.10.