AMD has introduced the Ryzen AI Max+ processor, whose unified memory architecture (UMA) lets the CPU and GPU share a single physical memory pool, so memory can be allocated flexibly between them. Paired with Radeon 8060S integrated graphics, the processor provides up to 128 GB of shared memory, enough to run local large language model (LLM) inference on models exceeding 100 billion parameters, such as the 122B-parameter Qwen3.5, on a single system without multiple GPUs or cloud services. The setup covers model sizes from 9B to 122B parameters with varying degrees of GPU offloading: the 9B and 35B models run with 100% GPU offloading, while the 122B model exceeds the 64 GB GPU-accessible memory limit and therefore uses a mixed CPU/GPU loading configuration. The software stack AMD specifies is Ubuntu 24.04 LTS, AMD ROCm 7.2.1, and Ollama 0.20.x. *Source: [AMD](https://rocm.blogs.amd.com/artificial-intelligence/ryzen-uma-llm/README.html)*
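
The split between full GPU offloading and mixed CPU/GPU loading can be approximated with back-of-the-envelope arithmetic. The sketch below is illustrative only and is not taken from AMD's post: the 4.5 bits per weight (a rough figure for 4-bit quantization), the 10% overhead for KV cache and runtime buffers, and the use of decimal gigabytes are all assumptions, and the function names are hypothetical. Only the 64 GB GPU-accessible limit and the three model sizes come from the text above.

```python
GPU_LIMIT_GB = 64.0  # GPU-accessible memory limit quoted in the text


def estimate_footprint_gb(params_billion: float,
                          bits_per_weight: float = 4.5,  # assumed quantization level
                          overhead: float = 0.10) -> float:  # assumed KV cache/buffers
    """Rough memory footprint in decimal GB for a quantized model."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead) / 1e9


def placement(params_billion: float, gpu_limit_gb: float = GPU_LIMIT_GB) -> str:
    """Classify a model as fitting entirely in GPU-accessible memory or not."""
    if estimate_footprint_gb(params_billion) <= gpu_limit_gb:
        return "full GPU offload"
    return "CPU/GPU mixed"


if __name__ == "__main__":
    for size in (9, 35, 122):  # model sizes mentioned in the text, in billions
        gb = estimate_footprint_gb(size)
        print(f"{size}B: ~{gb:.1f} GB -> {placement(size)}")
```

Under these assumptions the 9B and 35B models land well below the 64 GB limit and the 122B model lands above it, which matches the behavior described. Actual footprints depend on the quantization format, context length, and runtime, so the numbers should be treated as estimates.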