
Speech Turing Tests, Smart Routing, Pseudocode Agents
New research reveals no speech AI passes a Turing test, adaptive routing slashes LLM costs 82%, and pseudocode planning transforms agent reliability.

New research reveals no speech AI passes a Turing test, adaptive routing slashes LLM costs 82%, and pseudocode planning transforms agent reliability.

Rankings of AI models on competition mathematics benchmarks including AIME 2025, IMO, MathArena, and FrontierMath, measuring the cutting edge of mathematical reasoning.

Zhipu AI's 744B open-source model GLM-5 was trained entirely on Huawei Ascend chips and now competes with GPT-5.2 and Claude Opus on major benchmarks.

Rankings of the best AI models and agent frameworks on agentic benchmarks measuring real-world task completion, web navigation, function calling, and multi-turn tool use.

Arrow 1, a purpose-built SVG generation model from a16z-backed QuiverAI, reached the top of SVG Arena with an Elo of 1583 one day after launch - shattering Gemini 3.1 Pro's previous record of 1421 by 162 points.

Google's Gemini 3.1 Pro more than doubles its predecessor's reasoning scores and introduces adjustable thinking modes, but latency issues and preview-status quirks keep it from a clean sweep.