Articles Tagged "Voice AI"

GPT-Realtime-2

OpenAI's second-generation real-time audio model with GPT-5-class reasoning, 128K context, five reasoning levels, and parallel tool calling - now generally available in the Realtime API.

OpenAI's Realtime API Goes GA with Three New Models

OpenAI's Realtime API exits beta with GPT-Realtime-2, Translate, and Whisper - three specialized voice models splitting reasoning, translation, and transcription into distinct endpoints.

Apple Opens iOS 27 to Claude, Gemini, ChatGPT

Apple's iOS 27 'Extensions' feature lets users swap Claude, Gemini, or ChatGPT into Siri, Writing Tools, and Image Playground - the first time rival AI models can power Apple Intelligence natively.

OpenAI Rebuilt Its Voice AI Stack for 900M Users

OpenAI has published how it rearchitected its WebRTC stack to serve 900M weekly voice users on Kubernetes using a split relay and transceiver model.

Gemini in 4M Cars - GM Bets the Dashboard on Google

Gemini arrives in 4 million GM vehicles and 16 Volvo models via OTA update as GM phases out Apple CarPlay by 2028. The in-car AI platform war is now a real fight.

Best AI Podcast Creation Tools 2026

A hands-on comparison of the top AI podcast creation tools in 2026 - covering recording, editing, voice cloning, and publishing for every budget.

Best AI Language Learning Tools in 2026

Six AI language learning tools tested and compared by price, language coverage, speaking practice quality, and who each one actually suits.

Best AI Audio Editing Tools in 2026

A hands-on comparison of the top AI audio editing tools in 2026, covering noise removal, stem separation, mastering, and podcast production.

Best AI Phone Call Agents in 2026 - 5 Platforms

A hands-on comparison of Bland AI, Retell AI, Air AI, Vapi.ai, and Cal.com AI - five platforms for automated phone calls with verified pricing, latency numbers, and honest shortcomings.

Best AI Voice Agents in 2026 - 5 Platforms Tested

We tested five AI voice agent platforms - ElevenLabs, Vapi, Retell AI, Bland AI, and Play.ai - comparing real latency, per-minute pricing, and which use cases each actually serves.

Qwen3.5-Omni

Alibaba's Qwen3.5-Omni takes text, images, audio, and video as input and streams both text and speech output in a single end-to-end model with a 256K context window.

Gemini 3.1 Flash TTS

Google's Gemini 3.1 Flash TTS ships in preview with 30 voices, 70-plus languages, 200-plus inline audio tags, and Elo 1,211 on the Artificial Analysis TTS Arena.
