
Chatbot Arena Elo Rankings: Who Wins the Human Vote?
Updated July 2026 Chatbot Arena Elo rankings from Arena.ai: 7M+ votes across 368 models, Claude Opus 4.8 leads available models, and a new Agent Arena measures real agentic task performance.
They summarize our coverage. We write it.
Newsletters like this one rebroadcast our headlines - often without the full review, the source reading, or the analysis underneath. Our weekly briefing sends the work they paraphrase, straight from the desk, before they get to it.
Free, weekly, no spam. One email every Tuesday. Unsubscribe anytime.

Updated July 2026 Chatbot Arena Elo rankings from Arena.ai: 7M+ votes across 368 models, Claude Opus 4.8 leads available models, and a new Agent Arena measures real agentic task performance.

HappyHorse-1.0 from Alibaba-ATH leads the Artificial Analysis blind-vote rankings at Elo 1,290, but Seedance 2.0 is now globally available via fal.ai and still tops the with-audio leaderboard at 1,218.

June 2026 overall LLM rankings covering Claude Fable 5, Claude Opus 4.8, GPT-5.5, Gemini 3.1 Pro, and the open-weight models catching up fast.

Current rankings of the best AI image generation models, including GPT Image 2, Nano Banana 2, Recraft V4.1, HiDream-O1-Image, FLUX 2, Midjourney v8.1, and Ideogram 3.0, scored on human preference, text rendering, and photorealism.

Claude Opus 4.6 leads MRCR v2 8-needle at 78% across 1M tokens while Opus 4.7 regressed sharply - GPT-5.5 and DeepSeek V4 Pro are the key new entrants in May 2026.

Rankings of the best AI models and agent frameworks on the GAIA benchmark, which tests real-world multi-step tasks requiring web browsing, tool use, and multi-hop reasoning.

Rankings of AI models by cost efficiency in May 2026, comparing performance per dollar across frontier and budget models. Updated with DeepSeek V4, GPT-5.5, and Kimi K2.6.

April 2026 rankings of the top embedding models by MTEB score - Gemini Embedding 001, NV-Embed-v2, Qwen3-Embedding-8B, and the new Jina v4 multimodal release compared for RAG and search.

Gemini 3.1 Pro leads GPQA Diamond at 94.1% and HLE at 44.7% as AIME 2025 saturates; Claude Opus 4.7 and Kimi K2.6 join the top tier in April 2026.

Rankings of LLMs and dedicated MT systems across FLORES-200, WMT24/25, TICO-19, and MT-GenEval benchmarks with BLEU, COMET, and human evaluation scores.

Rankings of the best audio language models on MMAU, MMAU-Pro, and other benchmarks covering speech reasoning, music understanding, and environmental sound identification.

A data-driven comparison of the top AI-powered video editing tools in 2026, covering auto-captions, clip generation, dubbing, silence removal, and pricing across 15 tools.