
AMD Introduces AgentKernelArena for AI Coding Agent Benchmarking

AMD launches AgentKernelArena, an open-source benchmarking tool, to evaluate AI coding agents on AMD Instinct GPUs, with GEAKv3 leading performance in multiple categories.


AMD has introduced AgentKernelArena, an open-source benchmarking framework for evaluating AI coding agents on GPU kernel optimization tasks. Developers can use it to test agents such as Cursor Agent, Claude Code, and OpenAI Codex on real-world GPU optimization challenges. A unified automated scoring system evaluates each result for compilation, correctness, and speedup over a baseline. The framework supports a range of agents and models, enabling fair and reproducible comparisons across configurations. Source: AMD
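The three scoring criteria can be illustrated with a minimal Python sketch. This is not AgentKernelArena's actual scoring code: the `KernelResult` type, the function names, the rule that a failed compile or correctness check scores 0.0, and the use of an arithmetic mean are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class KernelResult:
    """Hypothetical record of one agent attempt at one task."""
    compiled: bool       # did the generated kernel build?
    correct: bool        # did its output match the reference?
    baseline_ms: float   # runtime of the original kernel
    optimized_ms: float  # runtime of the agent's kernel


def speedup(result: KernelResult) -> float:
    """Baseline/optimized runtime ratio, gated on compile and correctness.

    Assumption: a kernel that fails either gate scores 0.0.
    """
    if not (result.compiled and result.correct):
        return 0.0
    if result.optimized_ms <= 0:
        raise ValueError("optimized_ms must be positive")
    return result.baseline_ms / result.optimized_ms


def mean_speedup(results: Sequence[KernelResult]) -> float:
    """Arithmetic mean of per-task speedups (assumed aggregation)."""
    if not results:
        return 0.0
    return sum(speedup(r) for r in results) / len(results)
```

Under these assumptions, a kernel that runs in 2 ms against a 10 ms baseline scores 5.0, while a faster kernel that produces wrong output scores 0.0.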

AgentKernelArena includes 214 tasks across four categories, covering kernel optimization, translation, and repository-scale work. In an evaluation on a 44-task subset run on the MI300X GPU, GEAKv3, AMD’s in-house agent, achieved average speedups of 9.04× on HIP2HIP tasks, 2.75× on Triton2Triton tasks, and 1.20× on repository-scale tasks. Claude Code and Cursor also performed well, with speedups of 6.08× and 5.03× respectively on HIP2HIP tasks. Source: AMD

The framework aims for fair and reproducible benchmarking by isolating agents in controlled environments and using a centralized evaluator for consistent scoring. It supports timestamped runs and resumable evaluations, allowing users to track changes in agent performance over time. AgentKernelArena also enables A/B testing of different agent configurations, tools, and prompt variations. Source: AMD
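One way timestamped, resumable runs could work is sketched below in Python. The directory layout, the `results.json` file name, and the `new_run_dir` and `run_tasks` helpers are hypothetical and not taken from AgentKernelArena. The idea is simply that each run writes to its own timestamped folder and saves results after every task, so an interrupted run can skip what is already done.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Dict, Iterable


def new_run_dir(root: Path, agent: str) -> Path:
    """Create a run folder named '<agent>_<UTC timestamp>' (assumed layout)."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_dir = root / f"{agent}_{stamp}"
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir


def run_tasks(
    run_dir: Path,
    tasks: Iterable[str],
    evaluate: Callable[[str], dict],
) -> Dict[str, dict]:
    """Evaluate each task once, persisting results so the run can resume."""
    results_path = run_dir / "results.json"
    # Load results from an earlier, possibly interrupted, invocation.
    done = json.loads(results_path.read_text()) if results_path.exists() else {}
    for task in tasks:
        if task in done:
            continue  # already scored in a previous invocation
        done[task] = evaluate(task)
        # Write after every task so a crash loses at most one result.
        results_path.write_text(json.dumps(done, indent=2))
    return done
```

Calling `run_tasks` again with the same `run_dir` evaluates only the tasks that have no saved result, and comparing folders with different timestamps gives a simple view of how an agent's scores change over time.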

Key points

AMD launched AgentKernelArena, an open-source benchmarking tool for AI coding agents on AMD Instinct GPUs.
AgentKernelArena measures AI coding agents' performance through a unified automated scoring system that evaluates compilation, correctness, and speedup over a baseline.
GEAKv3, AMD’s in-house GPU kernel optimization agent, achieved average speedups of 9.04× on HIP2HIP tasks, 2.75× on Triton2Triton tasks, and 1.20× on repository-scale tasks.
AgentKernelArena includes 214 tasks across four categories, covering kernel optimization, translation, and repository-scale work.
The framework ensures fair and reproducible benchmarking by isolating agents in controlled environments and using a centralized evaluator for consistent scoring.
AMD’s AgentKernelArena supports timestamped runs and resumable evaluations, allowing users to track changes in agent performance over time.

Source: AMD

WRITTEN BY

Theo Almeida

AI Software & Developer Tools

Theo covers AI software, developer tools, frameworks, and the platforms builders use every day.
