
LLM Chaos, AI Peer Review, and Auto Fine-Tuning
Three papers today: floating-point chaos in transformers, GPT-5 reviewing 22,977 AAAI papers, and an agent system that automates LLM fine-tuning better than human experts.

Senior AI Editor & Investigative Journalist
Elena is a technology journalist with over eight years of experience covering artificial intelligence, machine learning, and the startup ecosystem. Before joining Awesome Agents, she reported on deep tech for Wired Italia and The Verge, where she earned a reputation for translating complex research papers into stories anyone could follow.
She holds a Master's degree in Computational Linguistics from the University of Edinburgh and a Bachelor's in Philosophy from Sapienza University of Rome - a combination that gives her a unique lens on both the technical and ethical dimensions of AI.
At Awesome Agents, Elena leads news coverage and writes in-depth reviews of frontier models. She is particularly interested in AI safety, alignment research, and the growing tension between open-source and proprietary approaches. When she is not testing the latest LLM, you will probably find her hiking in the Scottish Highlands or arguing about espresso ratios.
Based in Edinburgh, UK.

Snap cut 16% of its workforce on April 15, citing AI-generated code as the direct cause. The stock jumped. The workers left. Here is what the company's own numbers actually say.

The official @geminicli X account was compromised and used to promote a fake $CLI token on Pump.fun. Users quickly identified it as a scam.

Anthropic releases Claude Opus 4.7 with 3x higher resolution vision, a new xhigh effort level, task budgets for cost control, /ultrareview in Claude Code, and cyber safeguards that automatically block high-risk requests.

Cal.com moved its core codebase to a private repo after five years of open source, arguing AI tools make public code 5-10x easier to exploit. The community isn't buying it.

Google launched a 100% native Swift Gemini app for macOS on April 15, arriving after both Claude and ChatGPT already held the desktop.

Google's new Gemini 3.1 Flash TTS hits Elo 1,211 on the Artificial Analysis leaderboard and introduces 200-plus audio tags for mid-sentence voice control, available in preview today via the Gemini API.

Three papers from today's arXiv: a joint fix for KV cache bloat and attention cost, new evidence that fine-tuning belongs in the middle of a transformer, and why stronger reasoning hurts behavioral simulation.

Google DeepMind's Gemini Robotics-ER 1.6 hits 93% accuracy reading industrial gauges via agentic vision, a 70-point jump over ER 1.5, and launches inside Boston Dynamics' Spot today.

OpenAI's GPT-5.4-Cyber is a fine-tuned defensive cybersecurity model with binary reverse engineering, lowered refusal thresholds, and restricted access through the Trusted Access for Cyber program.

SoftBank, Sony, Honda, and NEC have formed Japan AI Foundation Model Development, backed by a $6.3 billion government commitment to build a trillion-parameter physical AI model on Japanese soil.

OpenAI's GPT-5.4-Cyber is a restricted model fine-tuned for defensive cybersecurity with binary reverse engineering and reduced refusal rates, available only through identity-verified access tiers - a direct response to Anthropic's Mythos Preview.