
Amazon Introduces Nova Sonic Voice Agent Testing Framework

Amazon announced a new open-source testing framework for voice agents, enabling scalable evaluation without microphones.


Amazon has released the Nova Sonic Test Harness, an open-source framework for testing voice agents that targets the difficulty of evaluating and iterating on voice-based customer service systems. The harness runs complete multi-turn conversations automatically, evaluates the results, and helps developers refine system prompts and tool configurations. It is meant to replace manual test calls, which are slow and inconsistent, with automated runs that teams can scale up without the overhead of manual QA.

Each conversation is scored by an LLM judge against predefined rubrics covering criteria such as goal achievement, response accuracy, and tool usage, so every run is held to the same standard. The harness works with Amazon Bedrock models and other AWS services, and it handles harder cases such as audio-text divergence (where the agent's spoken audio does not match its text transcript) and session reconnection, so long conversations can be tested reliably. It can also send synthetic audio through the full speech-to-speech pipeline, which more closely simulates real-world calls.

The release is part of Amazon's broader work on building and evaluating voice agents, which are increasingly used for tasks such as appointment booking, order inquiries, and account management. Test scenarios are defined in JSON files that specify details such as the user persona, the tool configuration, and the evaluation criteria.
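The article does not publish the harness's scenario schema. As a rough illustration only, a scenario file and a loader for it might look like the sketch below; every key name (`persona`, `goal`, `rubric`, `max_turns`, and so on), the `Scenario` class, and the `load_scenario` helper are hypothetical. Note that the file describes a goal and grading criteria, not an expected transcript.

```python
import json
from dataclasses import dataclass, field

# Hypothetical scenario layout; the harness's real schema may use different keys.
EXAMPLE_SCENARIO = """
{
  "name": "reschedule_appointment",
  "system_prompt": "You are a clinic scheduling assistant.",
  "persona": "A busy parent who answers briefly and wants a Friday slot.",
  "goal": "Move the existing Tuesday appointment to Friday afternoon.",
  "tools": ["lookup_appointment", "reschedule_appointment"],
  "max_turns": 8,
  "rubric": [
    "The agent confirms the caller's identity before changing anything.",
    "The agent calls reschedule_appointment exactly once.",
    "The agent reads the new date and time back to the caller."
  ]
}
"""

REQUIRED_KEYS = ("name", "system_prompt", "persona", "goal", "rubric")


@dataclass
class Scenario:
    name: str
    system_prompt: str
    persona: str
    goal: str
    rubric: list[str]
    tools: list[str] = field(default_factory=list)
    max_turns: int = 10


def load_scenario(text: str) -> Scenario:
    """Parse a JSON scenario and fail early if a required field is missing."""
    data = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"scenario is missing required keys: {missing}")
    if not data["rubric"]:
        raise ValueError("scenario needs at least one rubric criterion")
    return Scenario(
        name=data["name"],
        system_prompt=data["system_prompt"],
        persona=data["persona"],
        goal=data["goal"],
        rubric=list(data["rubric"]),
        tools=list(data.get("tools", [])),
        max_turns=int(data.get("max_turns", 10)),
    )


scenario = load_scenario(EXAMPLE_SCENARIO)
print(scenario.name, len(scenario.rubric), scenario.tools)
```

Validating the file up front means a typo in a scenario surfaces as a clear error before any conversation is run.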
Because each scenario specifies goals and evaluation standards rather than expected outputs, developers do not have to predict an agent's exact wording, which is crucial given that voice agent responses are non-deterministic. The framework also supports session continuation and history replay, so long conversations can be tested without interruption. Overall, the Nova Sonic Test Harness offers a scalable, automated alternative to manual QA that aims to cut the time and effort quality assurance requires.
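The article does not say how session continuation and history replay work internally. One plausible approach, sketched here purely as an assumption, is to keep a running transcript and, when a streaming session has to be reopened, replay the most recent turns that fit within a size budget into the new session. The `Turn` class, the `build_replay` function, and the character budget are hypothetical; a real implementation would more likely budget by tokens or by the service's own limits.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


def build_replay(history: list[Turn], char_budget: int) -> list[Turn]:
    """Pick the most recent turns whose combined text fits in char_budget.

    Hypothetical helper: when a streaming session has to be reopened, these
    turns would be sent to the new session as prior context so the
    conversation can continue where it left off.
    """
    replay: list[Turn] = []
    used = 0
    # Walk backwards so the newest context is kept when the budget is tight.
    for turn in reversed(history):
        cost = len(turn.text)
        if used + cost > char_budget:
            break
        replay.append(turn)
        used += cost
    replay.reverse()  # restore chronological order
    # Start the replay on a user turn so the roles still alternate cleanly.
    while replay and replay[0].role != "user":
        replay.pop(0)
    return replay


history = [
    Turn("user", "Hi, I need to move my appointment."),
    Turn("assistant", "Sure, can I have your name?"),
    Turn("user", "Dana Lee."),
    Turn("assistant", "Thanks Dana, I see Tuesday at 3 PM."),
]
for turn in build_replay(history, char_budget=60):
    print(f"{turn.role}: {turn.text}")
```

Trimming from the oldest end keeps the context the agent most needs to continue, at the cost of forgetting early details once the budget is exceeded.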

The harness supports a range of testing scenarios, including multi-turn conversations with specific user personas and tool configurations. Each scenario's JSON file records details such as the user's goal, the system prompt, and the evaluation rubrics. During a run, a user simulator plays the caller, Nova Sonic responds as the agent, and an LLM judge evaluates the finished conversation. Because the judge grades against predefined criteria rather than exact string matches, a correct response still passes when the agent words it differently from one run to the next, which is essential given that voice agent responses are non-deterministic. A model registry maps short aliases to full model IDs, so configurations stay consistent when model versions change.
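Rubric-based grading can be sketched as follows, assuming the judge is asked to return JSON with a pass/fail result per criterion. The prompt wording, the reply format, and the `build_judge_prompt` and `grade` helpers are hypothetical, and `call_judge` stands in for whatever client actually invokes the judge model on Amazon Bedrock; the example uses a fake judge so it runs offline.

```python
import json
from typing import Callable


def build_judge_prompt(goal: str, rubric: list[str], transcript: list[tuple[str, str]]) -> str:
    """Assemble a grading prompt from the scenario goal, rubric, and transcript."""
    criteria = "\n".join(f"{i}. {c}" for i, c in enumerate(rubric, start=1))
    dialogue = "\n".join(f"{role.upper()}: {text}" for role, text in transcript)
    return (
        "You are grading a voice agent conversation against a rubric.\n"
        f"User goal: {goal}\n"
        f"Criteria:\n{criteria}\n"
        f"Transcript:\n{dialogue}\n"
        "Reply with JSON only, in the form "
        '{"results": [{"criterion": 1, "pass": true, "reason": "..."}]}'
    )


def grade(
    goal: str,
    rubric: list[str],
    transcript: list[tuple[str, str]],
    call_judge: Callable[[str], str],
) -> dict:
    """Ask the judge to grade a transcript and summarize its per-criterion verdict."""
    reply = json.loads(call_judge(build_judge_prompt(goal, rubric, transcript)))
    results = {int(r["criterion"]): bool(r["pass"]) for r in reply["results"]}
    # A criterion the judge skipped counts as a failure, never as a pass.
    per_criterion = [results.get(i, False) for i in range(1, len(rubric) + 1)]
    return {
        "passed": all(per_criterion),
        "score": sum(per_criterion) / len(rubric),
        "per_criterion": per_criterion,
    }


def fake_judge(prompt: str) -> str:
    """Offline stand-in for a real judge model; it fails criterion 2."""
    return json.dumps({"results": [
        {"criterion": 1, "pass": True, "reason": "identity confirmed"},
        {"criterion": 2, "pass": False, "reason": "tool called twice"},
        {"criterion": 3, "pass": True, "reason": "new time read back"},
    ]})


verdict = grade(
    goal="Move the Tuesday appointment to Friday afternoon.",
    rubric=[
        "Confirms the caller's identity",
        "Calls the rescheduling tool once",
        "Reads back the new time",
    ],
    transcript=[
        ("user", "I need to move my Tuesday appointment to Friday."),
        ("assistant", "Done. You're booked for Friday at 2 PM."),
    ],
    call_judge=fake_judge,
)
print(verdict["passed"], verdict["per_criterion"])
```

Treating an unmentioned criterion as a failure keeps a partial or malformed judge reply from silently passing a test.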


Source: AWS Machine Learning

Key points

Amazon introduced a new open-source testing framework for voice agents called the Nova Sonic Test Harness.
The framework enables scalable evaluation of voice agents without requiring microphones.
The test harness runs complete multi-turn conversations with Nova Sonic automatically.
It evaluates results using LLM-as-judge techniques and can detect audio hallucinations.
The framework handles bidirectional streaming, non-deterministic responses, and audio-text divergence.
Test scenarios are defined using JSON files with user personas, tool configurations, and evaluation criteria.
The system includes a model registry to map short aliases to full Amazon Bedrock model IDs.
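A model registry of the kind described above can be as small as a dictionary lookup. In this sketch the alias names and model IDs are placeholders, not real Amazon Bedrock identifiers, and `resolve_model` is a hypothetical helper. Scenarios refer to aliases, so moving to a new model version means editing one registry entry instead of every scenario file.

```python
# Hypothetical alias table; the IDs are placeholders, not real Bedrock model IDs.
MODEL_REGISTRY: dict[str, str] = {
    "sonic": "example.voice-model-v1:0",
    "judge": "example.judge-model-v2:0",
    "simulator": "example.text-model-v1:0",
}


def resolve_model(name: str, registry: dict[str, str] = MODEL_REGISTRY) -> str:
    """Return the full model ID for an alias; known full IDs pass through unchanged."""
    if name in registry:
        return registry[name]
    if name in registry.values():
        return name
    known = ", ".join(sorted(registry))
    raise KeyError(f"unknown model alias {name!r}; known aliases: {known}")


# Scenario files reference aliases, so a version bump is a one-line registry change.
scenario_models = {"agent": "sonic", "judge": "judge"}
resolved = {role: resolve_model(alias) for role, alias in scenario_models.items()}
print(resolved)
```

Failing loudly on an unknown alias catches typos in scenario files before any model is called.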


WRITTEN BY

Theo Almeida

AI Software & Developer Tools

Theo covers AI software, developer tools, frameworks, and the platforms builders use every day.
