Research

AMD Enables Speculative Speculative Decoding on MI300X GPUs

AMD has enabled speculative speculative decoding (SSD) on MI300X GPUs, improving throughput-latency trade-offs for large language models.

Image: AMD

AMD has introduced a new method for accelerating large language model (LLM) inference by enabling speculative speculative decoding (SSD) on its MI300X GPUs. SSD is an enhancement of speculative decoding (SD), which allows a smaller draft model to propose multiple tokens, verified in parallel by a larger target model. This approach reduces the sequential dependency between drafting and verification, allowing the draft model to precompute multiple speculative paths.

The implementation, part of AMD's ROCm ecosystem, demonstrates how SSD can hide draft latency behind target-side verification, improving performance for latency-sensitive applications. The work highlights the importance of asynchronous multi-device scheduling and tree-style speculative decoding for future inference workloads.

Source: amd

Key points

AMD enabled speculative speculative decoding (SSD) on MI300X GPUs to improve throughput-latency trade-offs for large language models.
SSD removes the sequential dependency between drafting and verification by allowing the draft model to precompute multiple speculative paths.
The implementation uses five MI300X GPUs for the 70B benchmark configuration, with four for the target model and one for the draft model.
Two correctness bugs in the FlashInfer HIP path were identified and fixed to ensure proper SSD functionality.
The attention stack was adapted for ROCm-compatible backends, removing NVIDIA-specific dependencies.
A dual tree-decode backend was introduced to support both high-performance and correctness-oriented execution paths.
A setup_rocm.sh script was added to automate the environment bring-up for SSD reproduction on MI300X.

Source: AMD Read the original →

WRITTEN BY

Maya Chen

AI Research & Breakthroughs

Maya breaks down the latest AI research papers, benchmarks, and technical breakthroughs into plain language.

AMD Enables Speculative Speculative Decoding on MI300X GPUs

Key points

Related articles

Meta and Stanford Test AI with Baby-Like Learning

ELIZA Source Code Reveals Chatbot's Multiple Personalities

Hugging Face Evaluates Open-Source AI Models for Swiss Legal Tasks

Anthropic Discovers New Internal Space in AI Models