Articles Tagged "Formal Verification"

Safety Evals Break Under Attack, Agents Work 87% Faster

Three papers: strategic attack timing exposes gaps in AI control evaluations, Perplexity's agents slash task time by 87%, and Lean 4 formal proofs make agent workflows more reliable.

Coding Grandmasters, Formal Proofs, and Agent Hazards

Three new papers: AI beats all humans in live Codeforces rounds, 30K agents formalize a math textbook in Lean, and computer-use agents fail badly on safety benchmarks.

Tao: Ideas Are Now Free - Math's Bottleneck Has Moved

Terence Tao argues AI has cut the cost of mathematical idea generation to near zero, but verification remains as hard as ever - and our existing academic infrastructure wasn't built for what comes next.

Leanstral Outperforms Claude Sonnet at Formal Code Proofs

Mistral's new open-source Lean 4 agent scores higher than Claude Sonnet on formal proofs at one-fifteenth the cost, raising the bar for trustworthy AI code generation.

Knuth Names Paper After Claude That Solved His Math Conjecture

In 31 guided explorations over roughly an hour, Claude Opus 4.6 solved a directed graph decomposition conjecture that Knuth had worked on for weeks. Knuth wrote the formal proof himself and titled the paper 'Claude's Cycles.'

Math Olympiad AI Leaderboard - March 2026 Rankings

Rankings of AI models on competition mathematics benchmarks including AIME 2025, IMO, MathArena, and FrontierMath, measuring the cutting edge of mathematical reasoning.