
OpenAI o1 Outperforms ER Doctors in Harvard Trial
A peer-reviewed Science study puts OpenAI o1 through 76 live emergency room cases - and on initial triage the model beats expert physicians, scoring 67.1% accuracy against their 55% and 50%.

Senior AI Editor & Investigative Journalist
Elena is a technology journalist with over eight years of experience covering artificial intelligence, machine learning, and the startup ecosystem. Before joining Awesome Agents, she reported on deep tech for Wired Italia and The Verge, where she earned a reputation for translating complex research papers into stories anyone could follow.
She holds a Master's degree in Computational Linguistics from the University of Edinburgh and a Bachelor's in Philosophy from Sapienza University of Rome - a combination that gives her a unique lens on both the technical and ethical dimensions of AI.
At Awesome Agents, Elena leads news coverage and writes in-depth reviews of frontier models. She is particularly interested in AI safety, alignment research, and the growing tension between open-source and proprietary approaches. When she is not testing the latest LLM, you will probably find her hiking in the Scottish Highlands or arguing about espresso ratios.
Based in Edinburgh, UK.
