Model Release

NVIDIA Nemotron 3 Ultra Now Available on Amazon SageMaker JumpStart

NVIDIA's Nemotron 3 Ultra, with 550 billion parameters, is now available on Amazon SageMaker JumpStart, offering 5x faster inference and up to 30% lower cost for agentic workloads.

Image: AWS Machine Learning

Amazon SageMaker JumpStart has launched NVIDIA's Nemotron 3 Ultra, an open large language model optimized for long-running autonomous agents. The model is available for deployment through a one-click experience, reducing the complexity of setting up infrastructure and serving frameworks. It is designed to enhance performance for tasks requiring sustained multi-step reasoning, such as agent orchestration and complex enterprise workflows. According to NVIDIA, the model delivers 5x faster inference and up to 30% lower cost for agentic workloads compared to previous models. The model uses the NVFP4 format, which improves both speed and cost-effectiveness for hosting. It is built on a hybrid Transformer-Mamba Mixture-of-Experts (MoE) architecture, allowing it to activate only 55 billion of its 550 billion parameters per forward pass, maintaining high throughput even at million-token context lengths. This makes it suitable for tasks that require extensive planning, tool calling, and self-correction across hundreds of turns. The model is optimized for use cases such as coding agents, deep research systems, and complex automation workflows. Deploying the model creates a SageMaker endpoint that incurs charges while running, and users are advised to delete it when no longer needed to avoid ongoing costs. The deployment process can be done through Amazon SageMaker Studio or the SageMaker Python SDK, with specific instance types and configuration settings required for optimal performance. Users must ensure they have an AWS account with appropriate permissions and sufficient GPU quotas before starting the deployment. The model is intended for developers and enterprises looking to integrate advanced reasoning capabilities into their workflows with reduced compute costs and improved efficiency. The release of Nemotron 3 Ultra marks a significant step in enabling scalable, cost-effective agentic workflows on AWS platforms.

NVIDIA's Nemotron 3 Ultra is an open large language model with 550 billion total parameters and 55 billion active parameters. It is built on a hybrid Transformer-Mamba Mixture-of-Experts (MoE) architecture, designed to deliver frontier intelligence at a fraction of the compute cost of dense models of equivalent quality. The model supports up to 1 million tokens in context length and uses the NVFP4 format, which enhances both speed and cost-effectiveness for hosting. The inference speed is 5x faster for long-running agent workflows, and the cost is up to 30% lower for complex agentic tasks. The model's MoE architecture activates only 55 billion of its 550 billion parameters per forward pass, maintaining high throughput even at million-token context lengths. This enables agents to sustain planning, tool calling, and self-correction loops across hundreds of turns while maintaining coherence and managing cost. The model is optimized for enterprise use cases that require sustained multi-step reasoning, including agent orchestrators, coding agents, deep research systems, and complex automation workflows. The model is available for deployment through Amazon SageMaker JumpStart with a one-click experience, eliminating the need to manage infrastructure or configure serving frameworks. Users must ensure they have an AWS account with appropriate permissions and sufficient GPU quotas before starting the deployment. The deployment process can be done through Amazon SageMaker Studio or the SageMaker Python SDK, with specific instance types and configuration settings required for optimal performance.

NVIDIA's Nemotron 3 Ultra is designed for agentic AI, which involves agents that plan, call tools, delegate work, and check results across hundreds of turns. These tasks require sustained multi-step reasoning, and the model addresses this by maintaining high throughput and coherence while managing cost. The model's hybrid Transformer-Mamba MoE architecture allows it to activate only 55 billion of its 550 billion parameters per forward pass, which is crucial for handling long context lengths and maintaining performance. The model is optimized for use cases such as agent orchestrators, coding agents, deep research systems, and complex enterprise workflows. It is available for deployment through Amazon SageMaker JumpStart with a one-click experience, reducing the complexity of setting up infrastructure and serving frameworks. The model's availability on AWS platforms enables developers and enterprises to integrate advanced reasoning capabilities into their workflows with reduced compute costs and improved efficiency.

Source: awsml

Key points

NVIDIA's Nemotron 3 Ultra is now available on Amazon SageMaker JumpStart.
Nemotron 3 Ultra delivers 5x faster inference and up to 30% lower cost for agentic workloads.
Nemotron 3 Ultra has 550 billion total parameters and 55 billion active parameters.
The model uses a hybrid Transformer-Mamba MoE architecture for efficient computation.
Nemotron 3 Ultra supports up to 1 million tokens in context length.
The model is optimized for long-running autonomous agents and complex enterprise workflows.
Deployment requires specific GPU instance types and user permissions.

Source: AWS Machine Learning Read the original →

WRITTEN BY

Alex Lindgren

LLMs & Frontier Models

Alex covers the large language models and their impact on society.

NVIDIA Nemotron 3 Ultra Now Available on Amazon SageMaker JumpStart

Key points

Related articles

Google Deepmind's GenCeption Uses Video Generators for Computer Vision Tasks

Alibaba's Qwen 3.8 Competes With Kimi K3, Claims Second to Fable 5

Aether-7B-5Attn: Korean Startup Releases Fully Open Foundation Model

Moonshot AI Launches Kimi K3, Open Source AI Model