
9 of 428 LLM Routers Were Secretly Hijacking Agent Calls
UC Santa Barbara researchers found 9 of 428 third-party LLM routers actively injecting malicious tool calls, draining crypto, and stealing AWS credentials from AI agent sessions.
They summarize our coverage. We write it.
Newsletters like this one rebroadcast our headlines - often without the full review, the source reading, or the analysis underneath. Our weekly briefing sends the work they paraphrase, straight from the desk, before they get to it.
Free, weekly, no spam. One email every Tuesday. Unsubscribe anytime.

UC Santa Barbara researchers found 9 of 428 third-party LLM routers actively injecting malicious tool calls, draining crypto, and stealing AWS credentials from AI agent sessions.

Three papers this week challenge how we think about MoE expert routing, LLM context management, and the limits of activation steering.

Elephant Alpha is a free 100B parameter model on OpenRouter with 256K context, tool use, and structured output - but your prompts get logged, there are no benchmarks, and nobody knows who made it.

Stanford HAI's 2026 AI Index finds the US-China model gap has effectively closed, GenAI has hit 53% global adoption faster than any prior technology, and young software developers are the first casualties of the labor shift.

Arcee AI ships Trinity-Large-Thinking, a 398B sparse MoE reasoning model under Apache 2.0 that hits 91.9% on PinchBench for $0.85 per million output tokens on OpenRouter.

Alibaba's Qwen3.5-Omni handles audio, video, images, and text in a single model pass - and generates speech in real time. The Plus variant hits SOTA on 215 benchmarks and edges out Gemini 3.1 Pro on audio tasks.

Meta's first proprietary frontier model leads on HealthBench Hard and scientific reasoning but trails rivals in coding and agentic tasks - with no public API yet.

Rankings of AI models on IFEval and IFBench, the two main benchmarks for measuring how reliably LLMs follow precise formatting, length, and content constraints.

LG AI Research released EXAONE 4.5, a 33B open-weight vision-language model that posts higher STEM scores than GPT-5-mini and Claude 4.5 Sonnet - but a non-commercial license caps its real-world reach.

Three papers expose safety training's moral blind spot, two distinct failure modes inside reasoning models, and a 10x cheaper way to know when a reasoning model is guessing.

Meta's first closed-source frontier model scores 52 on the Artificial Analysis Intelligence Index, leads on HealthBench Hard, and ships free at meta.ai - but has no public API yet.

Meta Superintelligence Labs releases Muse Spark, its first AI model built from scratch in nine months, landing 4th on the Artificial Analysis Intelligence Index.