DeepMind has released Gemini 3.5 Live Translate, an audio model for near real-time speech-to-speech translation across more than 70 languages. The model detects the spoken language automatically and generates natural-sounding translations that preserve the speaker's intonation, pacing, and pitch. It is designed to produce fluid audio without awkward pauses while staying only a few seconds behind the speaker throughout a session. Source: DeepMind
Gemini 3.5 Live Translate is rolling out across Google products starting today: in Google Meet as a private preview for enterprise customers, and in Google Translate on Android and iOS for all users. The model processes speech as it is streamed, so multilingual conversations need no manual configuration. It is also robust to noise, which lets it work in loud or unpredictable environments. Developers can integrate the model through the Gemini Live API, which is supported by platforms such as Agora, Fishjam, LiveKit, Pipecat, and Vision Agents. These integrations handle real-time media streaming, leaving developers free to focus on the user experience. Source: DeepMind
The release builds on the company's 20 years of machine learning work in translation, which it says now translates over a trillion words for billions of users every month. The company described Gemini 3.5 Live Translate as the next step in that evolution, with support for more than 70 languages and conversations spanning more than 2,000 language combinations in a single meeting. Source: DeepMind
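As a rough check on the combinations figure, the number of language pairs can be counted directly. The sketch below assumes exactly 70 languages (the announcement says only that there are more than 70) and covers both readings of "combination": unordered pairs and directed source-to-target pairs. The function names are illustrative and do not come from the announcement.

```python
from math import comb, perm


def unordered_pairs(n_languages: int) -> int:
    """Count language pairs where A<->B is the same pair as B<->A."""
    return comb(n_languages, 2)


def directed_pairs(n_languages: int) -> int:
    """Count source-to-target pairs where A->B differs from B->A."""
    return perm(n_languages, 2)


if __name__ == "__main__":
    n = 70  # assumption: the lower bound implied by "more than 70 languages"
    print(unordered_pairs(n))  # 2415
    print(directed_pairs(n))   # 4830
```

Under either reading, 70 languages yield more than 2,000 pairs, which is consistent with the stated figure.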