NVIDIA TensorRT Inference Server
The NVIDIA TensorRT Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.
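As a rough illustration of the HTTP endpoint mentioned above, the following Python sketch polls the server's health probes. The paths `/api/health/live` and `/api/health/ready` and the default port 8000 are assumptions based on the 1.x HTTP API and should be checked against the API documentation for your release. The helper names `health_url` and `is_healthy` are hypothetical.

```python
import urllib.error
import urllib.request


def health_url(host: str, port: int = 8000, probe: str = "ready") -> str:
    """Build the URL for a liveness or readiness probe.

    The /api/health/<probe> path and port 8000 are assumptions about the
    1.x HTTP API, not values taken from these release notes.
    """
    if probe not in ("live", "ready"):
        raise ValueError("probe must be 'live' or 'ready'")
    return f"http://{host}:{port}/api/health/{probe}"


def is_healthy(host: str, port: int = 8000, probe: str = "ready",
               timeout: float = 2.0) -> bool:
    """Return True if the probe answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(health_url(host, port, probe),
                                    timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False
```

Calling `is_healthy("localhost")` against a running server should indicate whether it is ready to accept inference requests; any connection failure is reported as `False` rather than raised.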
What's New In 1.7.0
- A Client SDK container is now provided on NGC in addition to the inference server container. The client SDK container includes the client libraries and examples.
- TensorRT optimization may now be enabled for any TensorFlow model by enabling the feature in the optimization section of the model configuration.
- The ONNX Runtime backend now includes the TensorRT and OpenVINO execution providers. These providers are enabled in the optimization section of the model configuration.
- Automatic configuration generation (--strict-model-config=false) now works correctly for TensorRT models with variable-sized inputs and/or outputs.
- Multiple model repositories may now be specified on the command line. Optional command-line options can be used to explicitly load specific models from each repository.
- Ensemble models are now pruned dynamically so that only models needed to calculate the requested outputs are executed.
- The example clients now include a simple Go example that uses the GRPC API.
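The TensorRT and OpenVINO items above both refer to the optimization section of the model configuration. The Python sketch below prints what such a stanza might look like in a model's config.pbtxt. The field names (`execution_accelerators`, `gpu_execution_accelerator`, `cpu_execution_accelerator`) and the accelerator names `tensorrt` and `openvino` are assumptions recalled from the model configuration documentation, not taken from these notes, so verify them for your release. The helper `accelerator_stanza` is hypothetical.

```python
def accelerator_stanza(name: str, device: str = "gpu") -> str:
    """Render an optimization stanza for a model's config.pbtxt.

    Field and accelerator names are assumptions; check them against the
    model configuration documentation before use.
    """
    if device not in ("gpu", "cpu"):
        raise ValueError("device must be 'gpu' or 'cpu'")
    return (
        "optimization { execution_accelerators {\n"
        f'  {device}_execution_accelerator : [ {{ name : "{name}" }} ]\n'
        "}}\n"
    )


if __name__ == "__main__":
    # TensorRT optimization for a TensorFlow or ONNX Runtime model.
    print(accelerator_stanza("tensorrt", device="gpu"))
    # OpenVINO execution provider for an ONNX Runtime model.
    print(accelerator_stanza("openvino", device="cpu"))
```

The generated text would be appended to the model's existing configuration file; the rest of the configuration is unchanged.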
Known Issues
- In TensorRT 6.0.1, reformat-free I/O is not supported.
- Some versions of Google Kubernetes Engine (GKE) contain a regression in the handling of LD_LIBRARY_PATH that prevents the inference server container from running correctly (see issue 141255952). Use a GKE 1.13 or earlier version or a GKE 1.14.6 or later version to avoid this issue.
Client Libraries and Examples
Ubuntu 16.04 and Ubuntu 18.04 builds of the client libraries and examples are included in this release in the attached v1.7.0_ubuntu1604.clients.tar.gz and v1.7.0_ubuntu1804.clients.tar.gz files. See the documentation section 'Building the Client Libraries and Examples' for more information on using these files. The client SDK is also available as an NGC container.
Custom Backend SDK
Ubuntu 16.04 and Ubuntu 18.04 builds of the custom backend SDK are included in this release in the attached v1.7.0_ubuntu1604.custombackend.tar.gz and v1.7.0_ubuntu1804.custombackend.tar.gz files. See the documentation section 'Building a Custom Backend' for more information on using these files.