
AMD Launches Triton Inference Server with ONNX Runtime Backend

AMD released Triton Inference Server with ONNX Runtime backend for AMD GPUs, supporting dynamic batching and MIGraphX optimization.


AMD has released Triton Inference Server with an ONNX Runtime backend, enabling developers to deploy and optimize AI models on AMD GPUs. The platform supports dynamic batching, which groups multiple inference requests into a single batch to improve GPU utilization and throughput. Because batching can add a short queuing delay to individual requests, the capability matters most for real-time applications that have to balance throughput against latency.
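To make the batching idea concrete, here is a toy sketch in Python. It is not Triton's scheduler: it only models the general technique of closing a batch when it is full or when a delay window has elapsed since the batch's first request. The function name `form_batches` and both parameters are hypothetical names chosen for this illustration.

```python
from typing import List


def form_batches(
    arrivals_us: List[int],
    max_batch_size: int,
    max_queue_delay_us: int,
) -> List[List[int]]:
    """Group request arrival times (in microseconds) into batches.

    Toy model only: a batch is closed when it is full, or when the next
    request arrives after the delay window opened by the batch's first
    request. Real schedulers are considerably more involved.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    for t in sorted(arrivals_us):
        window_expired = bool(current) and t - current[0] > max_queue_delay_us
        batch_full = len(current) == max_batch_size
        if current and (batch_full or window_expired):
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Hypothetical arrival times: a small burst, a larger burst, a straggler.
    arrivals = [0, 40, 90, 300, 310, 320, 330, 340, 900]
    for batch in form_batches(arrivals, max_batch_size=4, max_queue_delay_us=100):
        print(len(batch), batch)
```

A larger delay window tends to produce fuller batches and higher throughput, at the cost of the first request in each batch waiting longer, which is the trade-off described above.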

The release includes integration with MIGraphX, AMD's graph inference engine, which compiles and optimizes model graphs for high-performance execution on AMD Instinct GPUs. Developers set up a model repository and write a configuration file for each model that defines the backend, the input and output tensors, and optimization settings. The release is part of a series of blogs detailing AI model deployment on AMD GPUs.
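As a rough illustration of the repository and configuration step, the Python sketch below writes a `config.pbtxt` into the directory layout Triton conventionally expects (`<repository>/<model>/config.pbtxt`, with the model file under a numbered version directory). The model name, tensor names, shapes, and batching values are all hypothetical, and the helpers `render_config` and `write_model_repository` are invented for this example. The MIGraphX-specific optimization settings are deliberately left out because the article does not give them.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class TensorSpec:
    name: str        # tensor name as exported in the ONNX model
    data_type: str   # Triton type string, e.g. "TYPE_FP32"
    dims: List[int]  # shape without the batch dimension


def _render_tensors(field: str, tensors: List[TensorSpec]) -> str:
    entries = []
    for t in tensors:
        dims = ", ".join(str(d) for d in t.dims)
        entries.append(
            "  {\n"
            f'    name: "{t.name}"\n'
            f"    data_type: {t.data_type}\n"
            f"    dims: [ {dims} ]\n"
            "  }"
        )
    return f"{field} [\n" + ",\n".join(entries) + "\n]\n"


def render_config(
    model_name: str,
    max_batch_size: int,
    inputs: List[TensorSpec],
    outputs: List[TensorSpec],
    preferred_batch_sizes: List[int],
    max_queue_delay_us: int,
) -> str:
    """Render a minimal config.pbtxt for the ONNX Runtime backend."""
    preferred = ", ".join(str(b) for b in preferred_batch_sizes)
    return (
        f'name: "{model_name}"\n'
        'backend: "onnxruntime"\n'
        f"max_batch_size: {max_batch_size}\n"
        + _render_tensors("input", inputs)
        + _render_tensors("output", outputs)
        + "dynamic_batching {\n"
        + f"  preferred_batch_size: [ {preferred} ]\n"
        + f"  max_queue_delay_microseconds: {max_queue_delay_us}\n"
        + "}\n"
    )


def write_model_repository(root: Path, model_name: str, config_text: str, version: int = 1) -> Path:
    """Create <root>/<model_name>/<version>/ and write config.pbtxt next to it."""
    model_dir = root / model_name
    # The exported model file (e.g. model.onnx) would be copied into this directory.
    (model_dir / str(version)).mkdir(parents=True, exist_ok=True)
    config_path = model_dir / "config.pbtxt"
    config_path.write_text(config_text)
    return config_path


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    config = render_config(
        model_name="demo_model",
        max_batch_size=8,
        inputs=[TensorSpec("input", "TYPE_FP32", [3, 224, 224])],
        outputs=[TensorSpec("output", "TYPE_FP32", [1000])],
        preferred_batch_sizes=[4, 8],
        max_queue_delay_us=100,
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = write_model_repository(Path(tmp), "demo_model", config)
        print(path.read_text())
```

Generating the file from code rather than by hand keeps tensor names and shapes in one place, though a hand-written `config.pbtxt` works just as well for a single model.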

Source: AMD

Key points

AMD released Triton Inference Server with ONNX Runtime backend for AMD GPUs.
Triton Inference Server supports dynamic batching to improve GPU utilization and throughput.
MIGraphX compiles and optimizes model graphs for high-performance execution on AMD Instinct GPUs.
Developers can configure the model repository and write configuration files for Triton Inference Server.
The release is part of a series of blogs detailing AI model deployment on AMD GPUs.


WRITTEN BY

Theo Almeida

AI Software & Developer Tools

Theo covers AI software, developer tools, frameworks, and the platforms builders use every day.
