NVIDIA TensorRT Inference Server

The NVIDIA TensorRT Inference Server (TRTIS) provides a cloud inferencing solution optimized for NVIDIA GPUs. The server exposes an inference service through an HTTP or gRPC endpoint, allowing remote clients to request inferencing for any model managed by the server.

What's New In 0.10.0 Beta

Custom backend support. TRTIS allows individual models to be implemented with custom backends rather than by a deep-learning framework. With a custom backend, a model can implement any logic desired while still benefiting from the GPU support, concurrent execution, dynamic batching, and other features provided by TRTIS.

Release 0.10.0 beta, corresponding to NGC container 19.01 (triton-inference-server/server v0.10.0 on GitHub).
