Developer tools

ARC-AGI-3 Launches - AI Agents Must Learn, Not Memorize

ARC Prize Foundation launched ARC-AGI-3 today with a fully open-source agent toolkit. The best AI in the preview phase scored 12.58% against a human baseline of 100%.

WordPress.com Opens Write Access to AI Agents via MCP

WordPress.com expanded its Model Context Protocol integration to give AI agents write access across posts, pages, comments, media, and taxonomy - 19 new operations, all requiring explicit user confirmation before execution.

Best LLM Eval Tools in 2026: 6 Options Tested

A data-driven comparison of DeepEval, Braintrust, Langfuse, LangSmith, Inspect AI, and RAGAS - the top LLM evaluation frameworks for teams building AI in production.

Cursor Ships Composer 2 - Its First In-House Coding Model

Cursor launches Composer 2, its first in-house coding model trained via RL on long-horizon tasks, scoring 73.7 on SWE-bench Multilingual at $0.50/M input tokens.

AI Browser Automation in 2026: Top 6 Tools Compared

A hands-on comparison of the top AI browser automation tools in 2026, covering Browser Use, Stagehand, Playwright MCP, Skyvern, Browserbase, and Firecrawl - with pricing, benchmarks, and pick-by-use-case.

Linux Foundation Raises $12.5M Against AI Bug Slop

Seven AI and cloud companies pool $12.5M through OpenSSF and Alpha-Omega to build tools that help open-source maintainers cope with a flood of AI-generated vulnerability reports they can't triage.

← Previous