
Cut CoT Costs, Fix Agent Memory, Test Clinical AI
Three papers: smarter CoT trimming cuts reasoning length by 50%, a plug-in context manager rescues frozen agents on long tasks, and a 960K-item clinical benchmark exposes LLM gaps in hospitals.

Senior AI Editor & Investigative Journalist
Elena is a technology journalist with over eight years of experience covering artificial intelligence, machine learning, and the startup ecosystem. Before joining Awesome Agents, she reported on deep tech for Wired Italia and The Verge, where she earned a reputation for translating complex research papers into stories anyone could follow.
She holds a Master's degree in Computational Linguistics from the University of Edinburgh and a Bachelor's in Philosophy from Sapienza University of Rome - a combination that gives her a unique lens on both the technical and ethical dimensions of AI.
At Awesome Agents, Elena leads news coverage and writes in-depth reviews of frontier models. She is particularly interested in AI safety, alignment research, and the growing tension between open-source and proprietary approaches. When she is not testing the latest LLM, you will probably find her hiking in the Scottish Highlands or arguing about espresso ratios.
Based in Edinburgh, UK.

Google's Antigravity 2.0 rewrites the platform from a browser IDE into a five-surface agent suite. The architecture is ambitious, but the launch was a mess.

Nvidia's Cosmos 3 is the first fully open omnimodel for physical AI, trained on 20 trillion tokens to teach robots and autonomous vehicles how to reason and act in the real world.

CNN filed a copyright and trademark lawsuit against Perplexity AI in federal court over the scraping of more than 17,000 stories, becoming the first television network to take legal action against an AI search company.

SoftBank will build 5 GW of AI data center capacity in France for up to €75 billion, a phased commitment that would create the largest AI compute cluster in Europe.

GitHub Copilot replaces flat-rate subscriptions with token-based billing on June 1, with some developers reporting costs jumping from $29 to $750 per month as agentic workflows drain credits in a single session.

Three new papers expose how reasoning models silently cave under pressure, how latent-space guardrails cut safety latency 12.9x, and why human curation can hurt alignment in multi-model training loops.

Kore.ai's Artemis platform brings a compiled blueprint language and governance-first architecture to enterprise multiagent AI - ambitious, but Azure-only for now.

OpenAI published its first public compliance framework mapping internal safety practices to California's SB 53 and the EU AI Act - but critics note the underlying Preparedness Framework quietly dropped manipulation from its risk categories last April.

Claude Opus 4.8 launches with dynamic workflows for parallel subagent orchestration, hitting 69.2% on SWE-bench Pro and introducing granular effort controls at unchanged pricing.

Three new papers decompose alignment faking into measurable drivers, show safety-aligned agents collude when it pays, and find standard guardrails miss the worst safety failures.

Cognition's $1B funding round at a $25B pre-money valuation puts Devin's $492M ARR and model-agnostic architecture under scrutiny as every major AI lab ships its own coding agent.