Software

Microsoft Unveils ASSERT Tool for AI Behavior Testing

Microsoft released ASSERT, an open-source framework, to help developers test AI systems for application-specific behaviors using natural language descriptions.

Image: TechCrunch

Microsoft has introduced ASSERT, an open-source framework designed to simplify the testing of AI systems for application-specific behaviors. The tool enables developers to create thorough, scored tests by converting high-level, natural-language descriptions of goals or policies into structured evaluations. ASSERT generates problem scenarios and test cases, runs them against the target system, and scores the results. It also records the paths the AI system takes, including intermediate actions and tool calls, allowing developers to inspect where failures occur. Developers can provide system context, tools, and constraints to further customize the evaluations. For instance, a developer could specify that a document research AI agent shouldn’t send emails to people outside the company, and it should limit confidential information to C-level executives and provide concise summaries with prior context in mind. ASSERT uses these rules to generate test cases that check whether the system follows those rules on an ongoing basis. According to Microsoft, ASSERT fills a gap that broader, more general evaluations cannot when AI models are intended to behave in a manner shaped by an application or product’s context, policies, and tools. Sarah Bird, chief product officer of Responsible AI at Microsoft, emphasized that evaluations are critical for making good decisions. She noted that without understanding the behavior of an AI system, it’s difficult to know if it meets an organization’s standards. Bird also highlighted that ASSERT can be used to evaluate systems during development, after deployment, and for continuous monitoring. This release comes amid a broader industry shift toward repeatable testing and regression checks as models become more capable. Researchers are focusing on benchmarks to measure how models behave under different conditions, with tools like Stanford’s HELM, MLCommons’ AILuminate, and evaluation groups like METR contributing to this effort.

Source: techcrunch

Key points

Microsoft released ASSERT, an open-source framework for testing AI systems for application-specific behaviors.
ASSERT converts natural-language descriptions of AI goals or policies into structured evaluations and test cases.
The tool records AI system paths, including intermediate actions and tool calls, for failure inspection.
Developers can customize evaluations by providing system context, tools, and constraints.
ASSERT checks if AI systems follow specified rules on an ongoing basis.
Microsoft claims ASSERT fills a gap that broader, more general evaluations cannot address.
ASSERT can be used to evaluate systems during development, after deployment, and for continuous monitoring.

Source: TechCrunch Read the original →

WRITTEN BY

Theo Almeida

AI Software & Developer Tools

Theo covers AI software, developer tools, frameworks, and the platforms builders use every day.

Microsoft Unveils ASSERT Tool for AI Behavior Testing

Key points

Related articles

Smartsheet Deploys Remote MCP Server on AWS

Amazon Introduces Mobile Layout for Quick Dashboards

Linus Torvalds Supports AI in Linux Kernel Development

Amazon Introduces Bedrock Managed Knowledge Base for Enterprise Search