
Gemini Flash Live Edges GPT-4 Realtime in Voice AI Race
Google's Gemini 3.1 Flash Live beats GPT-4 Realtime 1.5 on Scale AI's Audio MultiChallenge and takes Search Live to 200+ countries - but it doesn't lead every benchmark.

Cohere releases its first audio model - a 2B-parameter open-source ASR system beating Whisper Large v3 by 27% on the HuggingFace Open ASR Leaderboard.

Mistral's first open-weights TTS model clones voices from 3 seconds of audio, beats ElevenLabs on price, and arrives with real limitations worth knowing.

Mistral releases Voxtral, a pair of open-weights models covering speech recognition and text-to-speech that undercut OpenAI and ElevenLabs on price.

Tencent open-sources Covo-Audio, a 7B end-to-end audio language model with native full-duplex conversation that beats larger closed models on key benchmarks.

ElevenLabs Scribe v2 leads speech-to-text at 2.3% WER while ElevenLabs Flash v2.5 sets the pace for TTS with 75ms latency - but Google and Mistral are closing in fast.

A practical guide to building an AI voice agent using platforms like Vapi, Retell, and LiveKit - covering architecture, setup steps, and cost estimates.

Rankings of the best text-to-speech and speech-to-text AI models by naturalness, accuracy, latency, and pricing.

A developer ported NVIDIA's PersonaPlex 7B speech-to-speech model to native Swift using MLX, running full-duplex conversation on Apple Silicon with no cloud, no Python, and faster-than-real-time inference.

New research reveals no speech AI passes a Turing test, adaptive routing cuts LLM costs by 82%, and pseudocode planning transforms agent reliability.

NotebookLM went viral for turning documents into AI podcasts, but the real story is whether Google has built a genuinely useful research tool or just a clever party trick. We spent a month finding out.

A data-driven comparison of the top AI voice generators and TTS tools in 2026, covering ElevenLabs, Fish Audio, OpenAI TTS, LMNT, Cartesia, and open-source alternatives.