
Loka Uses Amazon Nova 2 Sonic for Low-Latency Voice Agent

Loka's AWS-based voice agent, built on Amazon Nova 2 Sonic, achieved a speech reasoning score of 87.0 on Big Bench Audio, ahead of GPT Realtime and Gemini 2.5 Flash Native Audio.

Loka developed a conversational AI agent on Amazon Nova 2 Sonic that improves customer engagement through natural, responsive voice interactions. Running on AWS, the agent scored highly on Big Bench Audio speech reasoning while responding faster and costing less than traditional voice AI pipelines.

Traditional voice assistants are slowed by a three-step pipeline: speech-to-text, language model processing, and text-to-speech. Because each stage waits for the previous one to finish, the pipeline often introduces pauses of 3 to 5 seconds that disrupt the natural flow of conversation. A real-world example at an automotive dealership showed how such systems struggle with complex queries, losing crucial information during the speech-to-text step. These delays and misunderstandings hurt the customer experience and increase support costs.
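The latency problem follows from the stages running one after another: the caller hears nothing until speech-to-text, the language model, and text-to-speech have each done their part. The minimal Python sketch below models that additive effect. The stage names come from the paragraph above, but the per-stage delays are hypothetical placeholders, not measurements from Loka or AWS, and real systems that stream between stages can overlap some of this work.

```python
from dataclasses import dataclass


@dataclass
class StageTiming:
    """Wall-clock time, in seconds, one pipeline stage needs before handing off."""
    name: str
    seconds: float


def cascaded_response_delay(stages: list[StageTiming]) -> float:
    """In a strictly sequential STT -> LLM -> TTS cascade, each stage waits for
    the previous one, so the silence the caller hears is the sum of the delays."""
    return sum(stage.seconds for stage in stages)


# Hypothetical per-stage delays, chosen only to illustrate the additive effect.
pipeline = [
    StageTiming("speech-to-text", 1.0),
    StageTiming("language model", 1.5),
    StageTiming("text-to-speech", 1.0),
]

if __name__ == "__main__":
    total = cascaded_response_delay(pipeline)
    print(f"Caller hears silence for about {total:.1f} s")
```

With these placeholder values the script prints `Caller hears silence for about 3.5 s`, inside the 3 to 5 second range described above. A native speech-to-speech model attacks the problem by removing the hand-offs between separate stages rather than by shortening each one.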

Native speech-to-speech models such as Amazon Nova 2 Sonic process audio end to end, capturing tone, emotion, and subtle cues that text-only pipelines miss. In rigorous testing, the model achieved a speech reasoning score of 87.0 on Big Bench Audio, ahead of GPT Realtime at 83.0 and Gemini 2.5 Flash Native Audio (Live API) at 71.0. It also reached a Time to First Audio of 1.39 seconds, enabling natural 'barge-in' behavior during conversations.
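Time to First Audio is a latency metric: roughly, how long a caller waits from the moment a request goes out until the first audio comes back. The sketch below shows one way to measure it against any streaming client, under stated assumptions. `start_stream` is a hypothetical callable standing in for whatever opens a speech-to-speech session and yields audio chunks as bytes; it is not the Amazon Nova 2 Sonic or Amazon Bedrock API, and the article does not say how the 1.39-second figure was measured.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


def time_to_first_audio(
    start_stream: Callable[[], Iterable[bytes]],
    clock: Callable[[], float] = time.perf_counter,
) -> Optional[float]:
    """Return seconds from opening the stream until the first non-empty audio chunk.

    `start_stream` is a hypothetical stand-in for a call that opens a streaming
    speech-to-speech session and yields audio chunks. Returns None if the stream
    ends without producing any audio.
    """
    started = clock()
    for chunk in start_stream():
        if chunk:  # ignore empty keep-alive or metadata frames
            return clock() - started
    return None


def fake_stream() -> Iterator[bytes]:
    """Hypothetical stream: a short pause, an empty frame, then audio bytes."""
    time.sleep(0.05)
    yield b""
    yield b"\x00\x01\x02\x03"


if __name__ == "__main__":
    ttfa = time_to_first_audio(fake_stream)
    print(f"Time to First Audio: {ttfa:.2f} s")
```

Passing `clock` as a parameter makes the function easy to test with a fake timer. In a live agent, one option is to start the clock when the caller's turn ends; this sketch simplifies that to the moment the stream is opened.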

Source: AWS Machine Learning

Key points

Loka's AWS-based voice agent, built on Amazon Nova 2 Sonic, achieved a speech reasoning score of 87.0 on Big Bench Audio.
Amazon Nova 2 Sonic outscored GPT Realtime (83.0) and Gemini 2.5 Flash Native Audio (Live API, 71.0).
Nova 2 Sonic achieved a Time to First Audio of 1.39 seconds, enabling natural 'barge-in' behavior during conversations.
Nova 2 Sonic costs approximately $0.27 per hour of input audio, less than comparable real-time models and traditional pipelines.
Response Appropriateness improved from 2.5 to 2.9, Intent Understanding rose from 2.9 to 3.0, and Completeness jumped from 1.8 to 2.5.
Conversational Naturalness improved from 2.5 to 2.8, with the overall judge score increasing from 2.4 to 2.7.
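The cost and judge-score figures in the key points lend themselves to quick arithmetic. The Python sketch below scales the quoted price of about $0.27 per hour of input audio to other volumes and computes the change in each judge score. It covers input audio only, because that is the only price given; output audio and any other charges are not included. The function and constant names are illustrative, not part of any AWS API.

```python
# Figures as reported in the key points; the price covers input audio only.
USD_PER_INPUT_AUDIO_HOUR = 0.27

# Judge scores as reported, stored as (before, after) pairs.
JUDGE_SCORES = {
    "Response Appropriateness": (2.5, 2.9),
    "Intent Understanding": (2.9, 3.0),
    "Completeness": (1.8, 2.5),
    "Conversational Naturalness": (2.5, 2.8),
    "Overall judge score": (2.4, 2.7),
}


def input_audio_cost(minutes: float, usd_per_hour: float = USD_PER_INPUT_AUDIO_HOUR) -> float:
    """Estimated input-audio charge in USD for a given number of minutes of caller speech."""
    return minutes / 60 * usd_per_hour


def score_deltas(scores: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Change from 'before' to 'after', rounded to one decimal place to hide
    floating-point noise such as 0.3999999999999999."""
    return {name: round(after - before, 1) for name, (before, after) in scores.items()}


if __name__ == "__main__":
    print(f"1,000 hours of input audio: ${input_audio_cost(1_000 * 60):,.2f}")
    for name, delta in score_deltas(JUDGE_SCORES).items():
        print(f"{name}: {delta:+.1f}")
```

At the quoted rate, 1,000 hours of input audio comes to about $270. Among the judge scores, Completeness shows the largest gain (+0.7) and Intent Understanding the smallest (+0.1).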


WRITTEN BY

Theo Almeida

AI Software & Developer Tools

Theo covers AI software, developer tools, frameworks, and the platforms builders use every day.
