Articles Tagged "LLM"

9 of 428 LLM Routers Were Secretly Hijacking Agent Calls

UC Santa Barbara researchers found 9 of 428 third-party LLM routers actively injecting malicious tool calls, draining crypto, and stealing AWS credentials from AI agent sessions.

MoE Myths, Context Compression, and Steering Proofs

Three papers this week challenge how we think about MoE expert routing, LLM context management, and the limits of activation steering.

OpenRouter Drops a Free 100B Stealth Model With 256K Context

Elephant Alpha is a free 100B parameter model on OpenRouter with 256K context, tool use, and structured output - but your prompts get logged, there are no benchmarks, and nobody knows who made it.

Stanford's AI Index 2026 - US Edge Over China Is Gone

Stanford HAI's 2026 AI Index finds the US-China model gap has effectively closed, GenAI has reached 53% global adoption faster than any prior technology, and young software developers are the first casualties of the labor shift.

Arcee's Trinity-Large: 398B Open Reasoning at $0.90

Arcee AI ships Trinity-Large-Thinking, a 398B sparse MoE reasoning model under Apache 2.0 that hits 91.9% on PinchBench for $0.85 per million output tokens on OpenRouter.

Qwen3.5-Omni Does 10-Hour Audio and 4M Video Frames

Alibaba's Qwen3.5-Omni handles audio, video, images, and text in a single model pass - and generates speech in real time. The Plus variant hits SOTA on 215 benchmarks and edges out Gemini 3.1 Pro on audio tasks.

Muse Spark Review: Strong on Health, Weak on Code

Meta's first proprietary frontier model leads on HealthBench Hard and scientific reasoning but trails rivals in coding and agentic tasks - with no public API yet.

Instruction Following Leaderboard: IFEval Rankings 2026

Rankings of AI models on IFEval and IFBench, the two main benchmarks for measuring how reliably LLMs follow precise formatting, length, and content constraints.

EXAONE 4.5: LG's Open VLM Beats GPT-5-mini on STEM

LG AI Research released EXAONE 4.5, a 33B open-weight vision-language model that posts higher STEM scores than GPT-5-mini and Claude 4.5 Sonnet - but a non-commercial license caps its real-world reach.

Blind Refusal, Broken Steps, and Free Uncertainty

Three papers expose safety training's moral blind spot, two distinct failure modes inside reasoning models, and a 10x cheaper way to know when a reasoning model is guessing.

Muse Spark

Meta's first closed-source frontier model scores 52 on the Artificial Analysis Intelligence Index, leads on HealthBench Hard, and ships free at meta.ai - but has no public API yet.

Meta Muse Spark Launches, Ranks 4th Among Frontier Models

Meta Superintelligence Labs releases Muse Spark, its first AI model, built from scratch in nine months and landing 4th on the Artificial Analysis Intelligence Index.
