Sophie Zhang

AI Infrastructure & Open Source Reporter

Sophie is a journalist and former systems engineer who covers AI infrastructure, open-source models, and the developer tooling ecosystem. She spent three years as a site reliability engineer at a cloud provider in Seattle before moving into tech journalism. That background gives her writing unusual technical depth: she understands distributed systems, GPU clusters, and inference optimization from the inside.

She studied Computer Engineering at the University of British Columbia and later completed a science communication fellowship at MIT. Her engineering background means she can read a model card, spot a misleading benchmark, and explain why quantization matters, all in the same paragraph.

At Awesome Agents, Sophie covers AI infrastructure news: new model releases, open-source launches, developer tools, deployment trends, and the hardware that makes it all run. She has a soft spot for underdog open-source projects that punch above their weight and a sharp eye for when a "breakthrough" is really just better marketing.

Based in Seattle, WA.

Articles by Sophie Zhang

Grok Build Plugin Marketplace Launches With Six Tools

xAI ships an open plugin marketplace for Grok Build with six launch partners including MongoDB, Vercel, and Sentry, backed by SHA-pinned supply chain security and an open GitHub catalog.

AgentPerf - First Infrastructure Benchmark for Agents

Artificial Analysis released AgentPerf, the first agentic AI infrastructure benchmark, measuring concurrent agents per megawatt. NVIDIA Blackwell leads with 20x gains over Hopper.

US Export Order Forces Global Fable 5, Mythos 5 Shutdown

Commerce Secretary Lutnick ordered Anthropic to disable its two most powerful models worldwide. It is the first US export control directive ever issued against a commercial LLM.

Gemini 3.5 Live Translate Rolls Out With 70+ Languages

Google's new streaming audio model translates speech in real time across 70+ languages and is available now in Google Translate and via the Gemini Live API.

Niteshift Raises $7M to Be the Cloud for Coding Agents

Former Datadog engineers launch Niteshift, a $7M-backed cloud platform that runs AI coding agents in full-stack environments with model-agnostic routing.

Google DiffusionGemma: Parallel LLM Hits 1,100 t/s

Google DeepMind open-sources DiffusionGemma, a 26B MoE model that generates 256 tokens per denoising pass instead of one at a time, reaching 1,100 tokens per second on a single H100.

OpenCode Hits 8M Users, a Year from a Toronto Meetup

OpenCode reaches 8 million monthly users and 172K GitHub stars in one year, displacing Claude Code as the most-starred open-source coding agent.

Miasma Worm Compromises 73 Microsoft GitHub Repos

The Miasma worm planted config files that automatically run credential-stealing code when developers open Microsoft Azure repos in Claude Code, Gemini CLI, Cursor, or VS Code.

Orbital Plans 10,000 GPU Satellites for AI Inference

a16z-backed Orbital wants to run AI inference from low Earth orbit using NVIDIA Blackwell GPUs, targeting 10,000 satellites and 1 GW of compute at full scale.

Claude Opus 4.8 Leads SWE-Bench Pro, Adds Parallel Agents

Anthropic's Claude Opus 4.8 scores 69.2% on SWE-bench Pro and ships hundreds of parallel subagents in Claude Code, with pricing unchanged at $5 per million input tokens.

Apple's iOS 27 Beta Ships the Multi-Model Extensions API

iOS 27 Beta 1 is live for developers today, shipping Apple's new Extensions framework that lets Gemini, Claude, and ChatGPT plug into Siri, plus the NVIDIA B200 Confidential Computing architecture that keeps those cloud queries private.

MiniMax M3 Makes 1M Context Viable With Sparse Attention

MiniMax M3 uses sparse attention to cut long-context inference cost 20x, topping GPT-5.5 on coding benchmarks at a fraction of the price.