Mistral AI announced the release of Voxtral TTS, its first text-to-speech model, designed for multilingual voice generation. At 4B parameters, the model is lightweight and cost-effective at scale. Voxtral TTS produces realistic, emotionally expressive speech in nine popular languages, with support for diverse dialects. Its low time-to-first-audio latency allows quick responses. The model adapts easily to new voices, so enterprises can customize their voice AI stacks. Available for testing in Mistral Studio, Voxtral TTS is positioned as an enterprise-grade text-to-speech model that supports critical voice agent workflows. According to Mistral, the model excels at contextual understanding and speaker modeling, capturing how a specific person naturally speaks. Its voice adaptation goes beyond traditional read speech by conveying a speaker's personality, including natural pauses, rhythm, intonation, and emotional range. *Source: [mistral](https://mistral.ai/news/voxtral-tts/)*