Amazon SageMaker JumpStart has launched NVIDIA's Nemotron 3 Ultra, an open large language model optimized for long-running autonomous agents. The model is available for deployment through a one-click experience, reducing the complexity of setting up infrastructure and serving frameworks. It is designed to enhance performance for tasks requiring sustained multi-step reasoning, such as agent orchestration and complex enterprise workflows. According to NVIDIA, the model delivers 5x faster inference and up to 30% lower cost for agentic workloads compared to previous models. The model uses the NVFP4 format, which improves both speed and cost-effectiveness for hosting. It is built on a hybrid Transformer-Mamba Mixture-of-Experts (MoE) architecture, allowing it to activate only 55 billion of its 550 billion parameters per forward pass, maintaining high throughput even at million-token context lengths. This makes it suitable for tasks that require extensive planning, tool calling, and self-correction across hundreds of turns. The model is optimized for use cases such as coding agents, deep research systems, and complex automation workflows. Deploying the model creates a SageMaker endpoint that incurs charges while running, and users are advised to delete it when no longer needed to avoid ongoing costs. The deployment process can be done through Amazon SageMaker Studio or the SageMaker Python SDK, with specific instance types and configuration settings required for optimal performance. Users must ensure they have an AWS account with appropriate permissions and sufficient GPU quotas before starting the deployment. The model is intended for developers and enterprises looking to integrate advanced reasoning capabilities into their workflows with reduced compute costs and improved efficiency. The release of Nemotron 3 Ultra marks a significant step in enabling scalable, cost-effective agentic workflows on AWS platforms.
NVIDIA's Nemotron 3 Ultra is an open large language model with 550 billion total parameters and 55 billion active parameters. It is built on a hybrid Transformer-Mamba Mixture-of-Experts (MoE) architecture, designed to deliver frontier intelligence at a fraction of the compute cost of dense models of equivalent quality. The model supports up to 1 million tokens in context length and uses the NVFP4 format, which enhances both speed and cost-effectiveness for hosting. The inference speed is 5x faster for long-running agent workflows, and the cost is up to 30% lower for complex agentic tasks. The model's MoE architecture activates only 55 billion of its 550 billion parameters per forward pass, maintaining high throughput even at million-token context lengths. This enables agents to sustain planning, tool calling, and self-correction loops across hundreds of turns while maintaining coherence and managing cost. The model is optimized for enterprise use cases that require sustained multi-step reasoning, including agent orchestrators, coding agents, deep research systems, and complex automation workflows. The model is available for deployment through Amazon SageMaker JumpStart with a one-click experience, eliminating the need to manage infrastructure or configure serving frameworks. Users must ensure they have an AWS account with appropriate permissions and sufficient GPU quotas before starting the deployment. The deployment process can be done through Amazon SageMaker Studio or the SageMaker Python SDK, with specific instance types and configuration settings required for optimal performance.
NVIDIA's Nemotron 3 Ultra is designed for agentic AI, which involves agents that plan, call tools, delegate work, and check results across hundreds of turns. These tasks require sustained multi-step reasoning, and the model addresses this by maintaining high throughput and coherence while managing cost. The model's hybrid Transformer-Mamba MoE architecture allows it to activate only 55 billion of its 550 billion parameters per forward pass, which is crucial for handling long context lengths and maintaining performance. The model is optimized for use cases such as agent orchestrators, coding agents, deep research systems, and complex enterprise workflows. It is available for deployment through Amazon SageMaker JumpStart with a one-click experience, reducing the complexity of setting up infrastructure and serving frameworks. The model's availability on AWS platforms enables developers and enterprises to integrate advanced reasoning capabilities into their workflows with reduced compute costs and improved efficiency.
Source: awsml