AMD has introduced a way to accelerate large language model (LLM) inference by enabling speculative speculative decoding (SSD) on its MI300X GPUs. SSD extends speculative decoding (SD), in which a smaller draft model proposes several tokens that a larger target model then verifies in parallel. SSD reduces the sequential dependency between drafting and verification: rather than sitting idle until verification finishes, the draft model precomputes multiple speculative paths. The implementation, part of AMD's ROCm ecosystem, shows how SSD can hide draft latency behind target-side verification, which improves performance for latency-sensitive applications. The work also highlights the importance of asynchronous multi-device scheduling and tree-style speculative decoding for future inference workloads. *Source: [amd](https://rocm.blogs.amd.com/artificial-intelligence/ssd_mi300x/README.html)*
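
For readers new to the technique, the draft-then-verify loop of plain speculative decoding (the baseline that SSD builds on) can be sketched as follows. This is a minimal illustration, not AMD's implementation: it assumes greedy decoding, uses toy arithmetic functions in place of real models, and every name in it (`speculative_step`, `toy_draft`, `toy_target`, and so on) is hypothetical. It does not model the overlap of drafting and verification that SSD adds.

```python
from typing import Callable, List

# A "model" here is any function mapping a token context to the next token.
NextToken = Callable[[List[int]], int]


def toy_target(ctx: List[int]) -> int:
    """Hypothetical stand-in for the large target model (greedy next token)."""
    return (sum(ctx) + 1) % 7


def toy_draft(ctx: List[int]) -> int:
    """Hypothetical stand-in for the small draft model.

    It agrees with the target except when the context length is a
    multiple of 4, which forces occasional rejections.
    """
    guess = toy_target(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % 7


def speculative_step(prefix: List[int], draft: NextToken,
                     target: NextToken, k: int) -> List[int]:
    """Run one draft-then-verify round and return the tokens to append."""
    # 1. The draft model proposes k tokens autoregressively.
    ctx = list(prefix)
    proposal = []
    for _ in range(k):
        token = draft(ctx)
        proposal.append(token)
        ctx.append(token)

    # 2. The target checks every proposed position. A real system does
    #    this in one batched forward pass; the loop here is for clarity.
    ctx = list(prefix)
    accepted = []
    for token in proposal:
        expected = target(ctx)
        if expected != token:
            # First mismatch: keep the target's token, discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(token)
        ctx.append(token)

    # 3. Every proposal matched, so the target also yields one extra token.
    accepted.append(target(ctx))
    return accepted


def generate(prefix: List[int], draft: NextToken, target: NextToken,
             k: int, max_new: int) -> List[int]:
    """Generate max_new tokens using repeated speculative rounds."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[:len(prefix) + max_new]


def greedy_decode(prefix: List[int], model: NextToken,
                  max_new: int) -> List[int]:
    """Reference decoder: one token per model call, no speculation."""
    out = list(prefix)
    for _ in range(max_new):
        out.append(model(out))
    return out
```

Under greedy decoding, every token that `speculative_step` returns is the token the target would have chosen at that position, so `generate([1, 2], toy_draft, toy_target, 4, 10)` produces the same sequence as `greedy_decode([1, 2], toy_target, 10)` while needing fewer verification rounds. In this sketch the draft still waits for each round to finish before proposing again, which is the dependency that SSD is described as reducing.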