
llm-d Joins CNCF - Kubernetes Gets a Native LLM Inference Stack
IBM Research, Red Hat, and Google Cloud donated llm-d to the CNCF at KubeCon EU, giving Kubernetes a production-grade distributed LLM inference framework built on vLLM.

AI Infrastructure & Open Source Reporter
Sophie is a journalist and former systems engineer who covers AI infrastructure, open-source models, and the developer tooling ecosystem. She spent three years as a site reliability engineer at a cloud provider in Seattle before transitioning to tech journalism. That background gives her writing an unusual level of technical depth: she understands distributed systems, GPU clusters, and inference optimization from the inside.
She studied Computer Engineering at the University of British Columbia and later completed a science communication fellowship at MIT. Her engineering background means she can read a model card, spot a misleading benchmark, and explain why quantization matters - all in the same paragraph.
At Awesome Agents, Sophie covers AI infrastructure news: new model releases, open-source launches, developer tools, deployment trends, and the hardware that makes it all run. She has a soft spot for underdog open-source projects that punch above their weight and a sharp eye for when a "breakthrough" is really just better marketing.
Based in Seattle, WA.

GitHub Copilot inserts promotional tips for itself and Raycast into PR descriptions, with over 11,000 affected pull requests found across GitHub and GitLab.

HuggingFace's Transformers.js v4 rewrites its WebGPU runtime in C++, supports 200+ architectures, and delivers up to 4x faster inference in browsers and server-side JS runtimes.

Mistral AI secures $830M in debt financing from seven banks to build a 13,800-GPU Nvidia GB300 cluster near Paris, targeting 200MW of European compute by 2027.

NVIDIA's DSX Flex library and Emerald AI's Conductor platform let AI factories ramp GPU power up or down in seconds, unlocking faster grid connections and up to 100GW of new U.S. capacity.

Meta releases SAM 3.1 with Object Multiplex, processing all tracked objects in one shared pass for 7x faster inference at 128 objects and improvements on 6 of 7 VOS benchmarks.

Cohere releases its first audio model - a 2B-parameter open-source ASR system beating Whisper Large v3 by 27% on the HuggingFace Open ASR Leaderboard.

Microsoft signs a deal with Crusoe for a new 900 MW AI factory campus in Abilene, Texas, adjacent to the Stargate site Oracle and OpenAI walked away from three weeks ago.

OpenAI ships a plugin marketplace for Codex CLI v0.117.0, bundling skills, app integrations, and MCP server configs behind an enterprise governance layer.

Mistral releases Voxtral, a pair of open-weights models covering speech recognition and text-to-speech that undercut OpenAI and ElevenLabs on price.

ARC Prize Foundation launched ARC-AGI-3 today with a fully open-source agent toolkit. The best AI in the preview phase scored 12.58% against a human baseline of 100%.

Ai2's MolmoWeb is a fully open-source web agent that navigates browsers by screenshot alone, beating GPT-4o-based agents at the 8B scale with weights, training data, and code all released under Apache 2.0.