NVIDIA TensorRT Inference Server

The NVIDIA TensorRT Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.

What's New In 1.3.0

The ONNX Runtime (github.com/Microsoft/onnxruntime) is now integrated into the inference server. ONNX models can now be used directly in a model repository.
The HTTP health port may be specified independently of the inference and status HTTP port with the --http-health-port flag.
Fixed a bug in perf_client that caused high CPU usage, which could lower the measured inferences/sec in some cases.
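
As a rough illustration of the separate health port, the sketch below builds a health-check URL against a dedicated port and probes it. The /api/health/live and /api/health/ready paths are assumed from the server's v1 HTTP API, and the host, port 8080, and helper names (health_url, is_healthy) are hypothetical, so adjust them to match your deployment.

```python
import urllib.error
import urllib.request


def health_url(host: str, health_port: int, probe: str = "ready") -> str:
    """Build a health-check URL for the dedicated health port.

    Assumption: the health endpoints live at /api/health/live and
    /api/health/ready, as in the v1 HTTP API. Verify against your server.
    """
    if probe not in ("live", "ready"):
        raise ValueError(f"unknown probe: {probe!r}")
    return f"http://{host}:{health_port}/api/health/{probe}"


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False


if __name__ == "__main__":
    # Hypothetical setup: server started with --http-health-port=8080.
    url = health_url("localhost", 8080, "ready")
    print(url, "healthy" if is_healthy(url) else "not healthy")
```

Keeping the health probe on its own port means an orchestrator's liveness and readiness checks need not share a port with inference and status traffic.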

Known Issues

Google Cloud Storage (GCS) support is not available in the 19.06 release. Support for GCS is available on the master branch and will be re-enabled in the 19.07 release.

Client Libraries and Examples

Ubuntu 16.04 and Ubuntu 18.04 builds of the client libraries and examples are included in this release in the attached v1.3.0_ubuntu1604.clients.tar.gz and v1.3.0_ubuntu1804.clients.tar.gz files. See the documentation section 'Building the Client Libraries and Examples' for more information on using these files.

Release 1.3.0 of triton-inference-server/server on GitHub, corresponding to NGC container 19.06.
