github triton-inference-server/server v0.11.0
Release 0.11.0 beta, corresponding to NGC container 19.02

NVIDIA TensorRT Inference Server

The NVIDIA TensorRT Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or gRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.
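As a minimal sketch of what a remote client interaction might look like, the snippet below checks the server's HTTP liveness endpoint before issuing inference requests. The port number (8000) and path (/api/health/live) are assumptions based on the server's documented defaults, not something stated in these notes:

    # Minimal liveness check against the server's HTTP endpoint.
    # The port (8000) and path (/api/health/live) are assumptions for this
    # sketch; consult this release's documentation for the exact API it exposes.
    import requests

    SERVER_URL = "http://localhost:8000"   # hypothetical server address

    response = requests.get(f"{SERVER_URL}/api/health/live", timeout=5.0)
    if response.status_code == 200:
        print("Server is live and ready to accept inference requests.")
    else:
        print(f"Health check failed with HTTP status {response.status_code}.")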

What's New In 0.11.0 Beta

  • Variable-size input and output tensor support. Models with variable-size input tensors and variable-size output tensors are now supported. In the model configuration, use a dimension size of -1 for each dimension that can take on any size (see the configuration sketch after this list).

  • String datatype support. For TensorFlow models and custom backends, input and output tensors can contain strings (a string input also appears in the sketch after this list).

  • Improved support for non-GPU systems. The inference server will run correctly on systems that do not contain GPUs and that do not have nvidia-docker or CUDA installed.
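As a hedged illustration of the first two items above, the model configuration sketch below uses made-up model and tensor names; it marks variable-size dimensions with -1 and declares a string input with the TYPE_STRING data type. The exact platform name and field values for a real model should be taken from the model configuration documentation:

    # Hypothetical model configuration (config.pbtxt) illustrating the new features.
    name: "my_variable_model"
    platform: "tensorflow_savedmodel"
    max_batch_size: 8
    input [
      {
        name: "INPUT0"
        data_type: TYPE_FP32
        dims: [ -1 ]        # -1: this dimension can take on any size
      },
      {
        name: "INPUT1"
        data_type: TYPE_STRING
        dims: [ 1 ]         # string tensors for TensorFlow models and custom backends
      }
    ]
    output [
      {
        name: "OUTPUT0"
        data_type: TYPE_FP32
        dims: [ -1 ]        # variable-size output dimension
      }
    ]

Each -1 dimension takes its actual size from the individual inference request, as described in the first item above.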

Client Libraries and Examples

An Ubuntu 16.04 build of the client libraries and examples is included in this release in the attached v0.11.0.clients.tar.gz. See the 'Building the Client Libraries and Examples' section of the documentation for more information on using this file.
