
NVIDIA Ships Nemotron 3 Ultra - 550B Open-Weight MoE
NVIDIA's 550B Nemotron 3 Ultra, released June 4, tops the US open-weight leaderboard with a hybrid Mamba-Transformer MoE architecture and 300-plus tokens per second throughput.
They summarize our coverage. We write it.
Newsletters like this one rebroadcast our headlines - often without the full review, the source reading, or the analysis underneath. Our weekly briefing sends the work they paraphrase, straight from the desk, before they get to it.
Free, weekly, no spam. One email every Tuesday. Unsubscribe anytime.

NVIDIA's 550B Nemotron 3 Ultra, released June 4, tops the US open-weight leaderboard with a hybrid Mamba-Transformer MoE architecture and 300-plus tokens per second throughput.

Three new arXiv papers expose how developers miss AI sabotage 94% of the time, why LLMs converge structurally in code evolution, and how ZK proofs could verify frontier AI training.

NVIDIA's Agent Toolkit lands 110+ verified skills on GitHub covering robotics, autonomous vehicles, vision AI, and industrial systems - turning complex physical AI pipelines into single agent calls.

Three new papers show that AI agents fail not by doing the wrong thing, but by doing things when they should have stopped.

A benchmark-driven comparison of Claude Code, Kiro, Devin, OpenAI Codex, Windsurf, and OpenHands - the six coding agents worth using in 2026.

Microsoft's open-source ASSERT framework turns natural language behavior specs into executable, auditable test suites for AI agents and LLM applications.

Three new papers expose how reasoning traces can be extracted from supposedly hidden model internals, where chain-of-thought hits an architectural ceiling, and how RL teaches models to know when to quit.

The Agent Control Standard defines open middleware hooks that let teams block, allow, or modify AI agent actions before they reach production systems.

Nvidia's RTX Spark packs 20 Arm CPU cores and a Blackwell 2.0 GPU with 6,144 CUDA cores into a 45-80W Windows laptop chip, targeting Apple Silicon head-on.

Three papers: smarter CoT trimming cuts reasoning length by 50%, a plug-in context manager rescues frozen agents on long tasks, and a 960K-item clinical benchmark exposes LLM gaps in hospitals.

Google's Antigravity 2.0 rewrites the platform from a browser IDE into a five-surface agent suite. The architecture is ambitious, the launch was a mess.

CVE-2026-48710 in Starlette lets a single malformed HTTP header bypass authentication on vLLM, LiteLLM, FastAPI, and every MCP server in production.