Articles Tagged "AI Agents"

Meta's $145B AI Bet Is Behind Schedule, Zuckerberg Admits

Zuckerberg told employees at a July 2 town hall that Meta's agentic AI trajectory hasn't accelerated as expected - but the data tells a more complicated story.

Science Agents, Jailbreak Defense, and Open-World Failures

Three papers from today's arXiv: graph-native RL generates traceable scientific hypotheses, HARC defeats jailbreaks by coupling internal safety directions, and ICML 2026's OpenAgent shows how distributional shift breaks tool-use agents.

Microsoft's Frontier Company Bets $2.5B on Enterprise AI

Microsoft launches Frontier Company with $2.5B and 6,000 engineers to embed AI inside enterprise clients, escalating the arms race against OpenAI, Anthropic, and Amazon.

Holo3-35B-A3B

H Company's open-weight sparse MoE vision-language model purpose-built for desktop computer use, scoring 82.6% on OSWorld-Verified with only 3B active parameters.

Best AI for Web Browsing and Computer Use - July 2026

Claude Fable 5 leads OSWorld-Verified at 85% after its 19-day US suspension ended July 1, while open-source Holo3 at 82.6% and Claude Sonnet 5 at $2/M tokens reshape the value calculus.

Gemini Spark Gains Mac File Access and MCP Support

Google's Gemini Spark agent is now in beta on macOS with local file system access, MCP server support, and real-time topic monitoring - but only for $99/month AI Ultra subscribers.

Agent Phase Collapse, Reasoning Exits, Preference Gaps

Three new arXiv papers map capability cliffs in agent world models, the narrow benefit of learned reasoning stops, and a 56% accuracy ceiling when agents help users build preferences.

Chatbot Arena Elo Rankings: Who Wins the Human Vote?

Updated July 2026 Chatbot Arena Elo rankings from Arena.ai: 7M+ votes across 368 models, Claude Opus 4.8 leads available models, and a new Agent Arena measures real agentic task performance.

AWS Bets $1B to Embed AI Engineers at Client Sites

Amazon's new Forward Deployed Engineering unit places AI specialists inside enterprise clients to build and ship agentic systems in weeks, following similar programs already launched by OpenAI and Anthropic.

Agent Languages, Sampling Ceilings, and Abstention

Three new papers cover agents that invent symbolic languages to cut reasoning tokens by 3-6x, sampling ceilings that waste inference compute, and context engineering to double agentic abstention rates.

Claude Sonnet 5 Is Anthropic's New Agentic Default

Anthropic's Claude Sonnet 5 becomes the default model across all plans, promising near-Opus agentic performance at a third less than Sonnet 4.6's standard price.

How to Use AI for Event Planning - A Beginner's Guide

A practical guide to using AI tools like ChatGPT and Claude for every stage of event planning, from concept to post-event follow-up.