Articles Tagged "Leaderboards"

Scientific Reasoning LLM Leaderboard 2026: GPQA Ranks

Rankings of AI models on STEM benchmarks, as of April 2026: GPQA Diamond, SciBench, OlympiadBench-Science, MMLU-STEM, ARC-Challenge, and ChemQA/Physics Olympiad.

Structured Output JSON Schema Leaderboard 2026

Rankings of LLMs and constrained decoding frameworks on JSON schema adherence benchmarks including JSONSchemaBench and BFCL v3, covering native APIs and open-source constraint engines.

Summarization LLM Leaderboard 2026: ROUGE and Faithfulness

Rankings of the top LLMs on summarization, scored by ROUGE-L, BERTScore, FActScore, and human preference across CNN/DailyMail, XSum, GovReport, QMSum, and BookSum, as of April 2026.

Text-to-SQL LLM Leaderboard 2026: Spider and BIRD Ranked

Rankings of the best LLMs and agent pipelines on BIRD, Spider 2.0, CoSQL, and SParC text-to-SQL benchmarks, with execution accuracy scores and analysis.
