Articles Tagged "AI Agents"

Distillation Leaks, Weak Agents, and Research Sabotage

New papers show distillation silently transfers unsafe behaviors, weak agents bottleneck multi-agent pipelines, and frontier AI can't reliably audit sabotaged ML research.

Factory Raises $150M to Scale Enterprise AI Droids

Factory closed a $150M Series C at a $1.5B valuation to expand its Droids - autonomous agents that handle the full software development lifecycle, not just code generation.

OpenClaw Passes The Fake-Star Audit - Mostly

We ran our fake-star methodology against OpenClaw and 10 ecosystem variants, sampling 361,000-star profiles and fork ratios. The main repo looks clean. Most clones look clean. One repo with 6,532 claimed stars has vanished.

Agent Platform Pricing Compared 2026

True cost breakdown of commercial agent frameworks and platforms - LangGraph, CrewAI, AutoGen, E2B, Modal, Fly.io, and more at 1k, 100k, and 1M runs, including LLM passthrough costs.

Best AI Deep Research Tools 2026: Ranked for Accuracy

Compare the best AI deep research tools of 2026 - OpenAI, Claude, Perplexity, Gemini, Grok, Exa, Elicit, and more. Pricing, accuracy, and which to pick.

Search API Pricing Compared 2026

Per-query pricing for search APIs used in AI agents and RAG pipelines - Brave, Tavily, Exa, SerpAPI, Serper, Perplexity Sonar, You.com, Jina Reader, Firecrawl, and more compared at 10k, 100k, and 1M queries.

SWE-Bench Coding Agent Leaderboard 2026: Claude vs GPT

Rankings of the best LLM-powered software engineering agents on SWE-Bench Verified, with pass rates, pricing, scaffold notes, and methodology - updated April 2026.

World ID 4.0 Brings Human Verification to Tinder and Zoom

Sam Altman's World project launched World ID 4.0 at a San Francisco event on April 17, signing Tinder, Zoom, DocuSign, and Okta as partners while introducing Agent Kit to authorize AI agents.

Web Agent Benchmarks Leaderboard: Apr 2026

Rankings across WebArena, WebVoyager, BrowseComp, Mind2Web, WorkArena, and WebChoreArena - every verified score for browser-driving AI agents as of April 2026.

Best AI Customer Support Tools 2026: 12 Platforms

A data-driven comparison of 12 AI customer support platforms covering pricing models, resolution rates, channel coverage, and helpdesk integrations for 2026.

Physical Intelligence Launches π0.7 for Untrained Tasks

Physical Intelligence's π0.7 robot model can generalize to tasks it was never explicitly trained on, matching fine-tuned specialist models through compositional skill recombination.

Function Calling Benchmarks Leaderboard 2026

Rankings of top LLMs on function calling and tool use benchmarks including BFCL v3, tau-bench, ToolBench, and FinTrace as of April 2026.
