NVIDIA TensorRT Inference Server
The NVIDIA TensorRT Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.
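For example, a remote client can query the server over HTTP to see which models it is currently managing. The sketch below is a minimal illustration, assuming a locally running server with the default HTTP port (8000) and the v1-style health and status paths; adjust the host, port, and paths to match your deployment.

```python
import requests

# Assumed defaults: the server's HTTP endpoint on localhost:8000 and the
# v1-style health/status paths. Verify these against your deployment.
SERVER_URL = "http://localhost:8000"

# Liveness and readiness checks (assumed health endpoints).
live = requests.get(f"{SERVER_URL}/api/health/live")
ready = requests.get(f"{SERVER_URL}/api/health/ready")
print("live:", live.status_code, "ready:", ready.status_code)

# Server status, which includes the models being managed by the server.
status = requests.get(f"{SERVER_URL}/api/status")
print(status.text)
```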
What's New In 1.1.0
- Client libraries and examples now build with a separate Makefile (a Dockerfile is also included for convenience).
- Input or output tensors with variable-size dimensions (indicated by -1 in the model configuration) can now represent tensors where the variable dimension has value 0 (zero); see the sketch after this list.
- Zero-sized input and output tensors are now supported for batching models. This enables the inference server to support models that require inputs and outputs that have shape [ batch-size ].
- TensorFlow custom operations (C++) can now be built into the inference server. An example and documentation are included in this release.
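To illustrate the tensor shapes referenced in the second and third items above, the following sketch uses NumPy only (no client library calls). The shapes and the interpretation of the model configuration are assumptions drawn from these notes, not a definitive specification.

```python
import numpy as np

batch_size = 4

# A hypothetical input declared with dims: [-1] in its model configuration
# has one variable-size dimension; for a batching model each request tensor
# then has shape [batch_size, variable_dim]. The variable dimension may now
# be 0, yielding a valid tensor with zero elements.
variable_input = np.empty((batch_size, 0), dtype=np.float32)
assert variable_input.shape == (batch_size, 0)
assert variable_input.size == 0

# A tensor whose only dimension is the batch dimension has shape
# [batch_size], i.e. one value per batch element and nothing more.
flat_input = np.zeros((batch_size,), dtype=np.float32)
assert flat_input.shape == (batch_size,)
```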
Client Libraries and Examples
An Ubuntu 16.04 build of the client libraries and examples is included in this release in the attached v1.1.0.clients.tar.gz. See the documentation section 'Building the Client Libraries and Examples' for more information on using this file.