
Best Open-Source TTS Models for Self-Hosting in 2026
The best open-source text-to-speech models in 2026 - covering Kokoro, Chatterbox, Fish Speech, Dia, Voxtral, and Piper with real hardware requirements and licensing details.

A hands-on comparison of the top AI podcast creation tools in 2026 - covering recording, editing, voice cloning, and publishing for every budget.

We tested five AI voice agent platforms - ElevenLabs, Vapi, Retell AI, Bland AI, and Play.ai - comparing real latency, per-minute pricing, and which use cases each actually serves.

Alibaba's Qwen3.5-Omni takes text, images, audio, and video as input and streams both text and speech output in a single end-to-end model with a 256K context window.

Google's Gemini 3.1 Flash TTS ships in preview with 30 voices, 70-plus languages, 200-plus inline audio tags, and Elo 1,211 on the Artificial Analysis TTS Arena.

Normalized per-1M-character and per-hour TTS pricing across ElevenLabs, OpenAI, Google, Azure, Amazon Polly, Play.ht, Cartesia, Deepgram Aura, WellSaid, and more.

Google's new Gemini 3.1 Flash TTS hits Elo 1,211 on the Artificial Analysis leaderboard and introduces 200-plus audio tags for mid-sentence voice control, available in preview today via the Gemini API.

Mistral's first open-weights TTS model clones voices from 3 seconds of audio, beats ElevenLabs on price, and arrives with real limitations worth knowing.

Mistral releases Voxtral, a pair of open-weights models covering speech recognition and text-to-speech that undercut OpenAI and ElevenLabs on price.

ElevenLabs Scribe v2 leads speech-to-text at 2.3% WER while ElevenLabs Flash v2.5 sets the pace for TTS with 75 ms latency - but Google and Mistral are closing in fast.

Rankings of the best text-to-speech and speech-to-text AI models by naturalness, accuracy, latency, and pricing.

A data-driven comparison of the top AI voice generators and TTS tools in 2026, covering ElevenLabs, Fish Audio, OpenAI TTS, LMNT, Cartesia, and open-source alternatives.