Mistral has introduced Voxtral TTS, a new open-source text-to-speech model designed for enterprise voice applications, positioning the company in direct competition with ElevenLabs, Deepgram, and OpenAI. The model supports nine languages (English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic) and is built for deployment across edge devices such as smartphones, laptops, and wearables.
Voxtral TTS enables rapid voice customization using minimal audio input while preserving accents, tone, and speech nuances, and can switch between languages without losing voice consistency.
Pierre Stock, Vice President of Science Operations at Mistral, indicated the model was developed in response to enterprise demand for efficient, high-performance speech systems. Built for real-time performance, the model delivers low latency and fast audio generation.
"We see audio as a big bet and as a critical and maybe the only future interface with all the AI models," Stock said. "This is something customers have been asking for."
The launch builds on Mistral’s broader strategy to develop a full multimodal AI platform spanning audio, text, and image processing.
Mistral’s state-of-the-art speech understanding models are available in two sizes: a 24B variant for production-scale applications and a 3B variant for local and edge deployments. Both versions are released under the Apache 2.0 license and are also available through Mistral’s API.