Competitive programming

Coding Grandmasters, Formal Proofs, and Agent Hazards

Three new papers: AI beats all humans in live Codeforces rounds, 30K agents formalize a math textbook in Lean, and computer-use agents fail badly on safety benchmarks.

