Gemini 3.1 Flash TTS Covers 70 Languages With Tags

blog.google | ksl | Apr 17, 2026

Google released Gemini 3.1 Flash TTS, a text-to-speech model that supports over 70 languages and gives developers fine-grained delivery control through 200+ audio tags embedded directly in the input text. It handles multi-speaker dialogue natively, which opens up podcast generation and scripted conversational flows without stitching separate voice outputs together. The model tops the Artificial Analysis TTS leaderboard with an Elo score of 1,211, and all output carries SynthID watermarking for provenance tracking. ElevenLabs, OpenAI, and Amazon have all shipped competitive TTS models in recent months, but by bundling this level of style control into its existing Gemini API and Vertex AI infrastructure, Google gains a distribution advantage that standalone voice startups will struggle to match.
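To make "tags embedded directly in the input text" and "multi-speaker dialogue" concrete, here is a minimal sketch of assembling a two-speaker script with inline delivery tags before it is sent to the model. It assumes a bracketed `[tag]` syntax, a `Speaker: line` transcript layout, and the hypothetical helper names `Line` and `build_dialogue_prompt`; none of these come from the announcement, so check Google's Gemini API speech-generation documentation for the actual tag vocabulary and request format. The API call itself is deliberately omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Line:
    """One turn of dialogue. Hypothetical helper, not part of any Google SDK."""
    speaker: str
    text: str
    tag: Optional[str] = None  # assumed delivery tag, rendered as "[whispers]" etc.


def build_dialogue_prompt(
    lines: Sequence[Line],
    instruction: str = "TTS the following conversation",
) -> str:
    """Render turns as a 'Speaker: text' transcript with inline [tag] markers.

    The bracket syntax and transcript layout are illustrative assumptions.
    """
    speakers: List[str] = []
    rendered: List[str] = []
    for line in lines:
        if not line.speaker.strip():
            raise ValueError("every line needs a speaker name")
        if line.tag is not None and ("[" in line.tag or "]" in line.tag):
            raise ValueError(f"tag must not contain brackets: {line.tag!r}")
        if line.speaker not in speakers:
            speakers.append(line.speaker)  # keep first-appearance order
        prefix = f"[{line.tag}] " if line.tag else ""
        rendered.append(f"{line.speaker}: {prefix}{line.text}")
    if len(speakers) < 2:
        raise ValueError("a dialogue needs at least two distinct speakers")
    # "Joe and Jane" for two speakers, "A, B and C" for more.
    names = ", ".join(speakers[:-1]) + " and " + speakers[-1]
    header = f"{instruction} between {names}:"
    return "\n".join([header, *rendered])


if __name__ == "__main__":
    script = [
        Line("Joe", "Did you hear the news?"),
        Line("Jane", "Tell me everything.", tag="whispers"),
    ]
    print(build_dialogue_prompt(script))
```

Keeping the whole script as plain text mirrors the announcement's point that delivery control lives in the input itself; in a real request, the speaker names would presumably need to match whatever per-speaker voice configuration the API expects.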
