
ARC-AGI-3 Launches - AI Agents Must Learn, Not Memorize
ARC Prize Foundation launched ARC-AGI-3 today with a fully open-source agent toolkit. The best AI in the preview phase scored 12.58% against a human baseline of 100%.

ARC Prize Foundation launched ARC-AGI-3 today with a fully open-source agent toolkit. The best AI in the preview phase scored 12.58% against a human baseline of 100%.

WordPress.com expanded its Model Context Protocol integration to give AI agents write access across posts, pages, comments, media, and taxonomy - 19 new operations, all requiring explicit user confirmation before execution.

A data-driven comparison of DeepEval, Braintrust, Langfuse, LangSmith, Inspect AI, and RAGAS - the top LLM evaluation frameworks for teams building AI in production.

Cursor launches Composer 2, its first in-house coding model trained via RL on long-horizon tasks, scoring 73.7 on SWE-bench Multilingual at $0.50/M input tokens.

A hands-on comparison of the top AI browser automation tools in 2026, covering Browser Use, Stagehand, Playwright MCP, Skyvern, Browserbase, and Firecrawl - with pricing, benchmarks, and pick-by-use-case.

Seven AI and cloud companies pool $12.5M through OpenSSF and Alpha-Omega to build tools that help open-source maintainers cope with a flood of AI-generated vulnerability reports they can't triage.