Capabilities Articles

Best AI for Document Understanding - July 2026

Qwen3-VL-235B-A22B and Qwen2.5-VL-72B lead DocVQA above 96%, but the bigger story in July 2026 is that frontier labs have quietly stopped publishing comparable scores for their newest models.

Best AI for Data Analysis - July 2026

MiniMax M3 leads LiveSQLBench among general-purpose models at 40.17%, but purpose-built enterprise agent pipelines from C3 AI and Ant Group now beat every off-the-shelf LLM outright on raw SQL accuracy.

Best AI for Creative Writing - July 2026

Claude Fable 5 tops EQ-Bench Longform at Elo 2189 while GPT-5.5 leads the Mazur Writing Benchmark, reshaping the creative writing model rankings in July 2026.

Best AI for Web Browsing and Computer Use - July 2026

Claude Fable 5 leads OSWorld-Verified at 85% after its 19-day US suspension ended July 1, while open-source Holo3 at 82.6% and Claude Sonnet 5 at $2/M tokens reshape the value calculus.

Best AI Models for Video Generation - June 2026

HappyHorse-1.0 from Alibaba-ATH leads the Artificial Analysis blind-vote rankings at Elo 1,290, but Seedance 2.0 is now globally available via fal.ai and still tops the with-audio leaderboard at 1,218.

Best AI Models for Voice and Speech - June 2026

ElevenLabs Scribe v2 leads ASR at 2.2% WER after a price cut to $3.67 per 1,000 minutes, Microsoft MAI-Transcribe-1.5 debuted at #3, and Gemini 3.1 Flash TTS now tops the naturalness leaderboard.

Best AI Models for RAG - June 2026

Gemini 2.5 Flash still leads LIT-RAGBench English RAG accuracy at 87.0%, but the full benchmark data reveals two overlooked entries: GPT-4.1-mini at 84.1% and o4-mini at 83.9%.

Best AI Models for Text Summarization - June 2026

Gemini 2.5 Flash Lite still leads the Vectara hallucination leaderboard at 3.3%, while two new entries - Gemini 3.5 Flash and Mistral Large 3 at $0.50/M - have shifted the value picture considerably since March.

Best Models for Long-Context Retrieval - May 2026

Claude Opus 4.6 leads MRCR v2 8-needle at 78% across 1M tokens while Opus 4.7 regressed sharply; GPT-5.5 and DeepSeek V4 Pro are the key new entrants in May 2026.

Best AI Models for Language Translation - May 2026

Gemini 3.1 Pro leads verified 2026 benchmarks at $2 per million tokens, while GPT-5.5 and Claude Opus 4.7 postdate the available translation evaluations. Rankings, scores, and pricing for 10 models.

Best AI Models for Math Reasoning - April 2026

Gemini 3.1 Pro leads GPQA Diamond at 94.1% and HLE at 44.7% as AIME 2025 saturates; Claude Opus 4.7 and Kimi K2.6 join the top tier in April 2026.

Best AI Models for Image Generation - April 2026

GPT Image 1.5 leads Artificial Analysis at 1278 Elo while Nano Banana 2 tops Arena.ai - two leaderboards, two answers, and five new models that have reshaped the rankings since March.