
LongCat-2.0
Meituan's 1.6T-parameter open-source MoE coding model, trained end-to-end on 50,000 domestic Chinese ASICs, with native 1M token context and a 59.5 SWE-bench Pro score.

AI Benchmarks & Tools Analyst
James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure. His engineering background means he doesn't just read the spec sheet - he runs the benchmarks, profiles the latency, and checks whether the marketing claims hold up under real workloads.
He studied Computer Science at the University of Illinois at Urbana-Champaign, where he first got hooked on natural language processing during a senior research project on sentiment analysis. He later completed a certificate in data journalism from Northwestern's Medill School.
At Awesome Agents, James owns the leaderboards and tool comparison coverage. He maintains the site's benchmark tracking methodology and is the person who actually runs the numbers before publishing any ranking. He is also an open-source advocate and contributes to several projects in the LLM inference space.
Based in Chicago, IL.

A benchmark-driven comparison of Claude Fable 5 and Gemini 3.5 Flash across coding, agents, pricing, and speed - two models built for opposite priorities.

Google DeepMind's multimodal video generation model that creates 10-second clips with native audio from text, images, or video inputs - and lets you refine results through conversation.

H Company's open-weight sparse MoE vision-language model purpose-built for desktop computer use, scoring 82.6% on OSWorld-Verified with only 3B active parameters.

Claude Fable 5 leads OSWorld-Verified at 85% after its 19-day US suspension ended July 1 - Holo3 open-source at 82.6% and Claude Sonnet 5 at $2/M tokens reshape the value calculus.

Compare five leading AI developer SDKs - Vercel AI SDK 7, LangChain, LlamaIndex, Mastra, and PydanticAI - and find the right framework for your next AI-powered app.

d-Matrix Corsair is an SRAM-based in-memory compute ASIC in production since June 2026, targeting 10x faster and 5x more power-efficient LLM inference vs GPU baselines.

OpenAI's first custom AI chip, co-designed with Broadcom on TSMC 3nm, targeting 50% lower inference cost than GPU alternatives.

Updated July 2026 Chatbot Arena Elo rankings from Arena.ai: 7M+ votes across 368 models, Claude Opus 4.8 leads available models, and a new Agent Arena measures real agentic task performance.

Anthropic's latest Sonnet-class model brings near-Opus coding performance to mid-tier pricing, with major agentic search and computer use gains over Sonnet 4.6.

OpenAI's GPT-5.6 family - Sol, Terra, and Luna - sets a new Terminal-Bench 2.1 record at 91.9% with subagent Ultra mode, but remains locked to ~20 government-vetted partners as of launch.

Embedding API cost comparison: voyage-4-lite, OpenAI 3-small, Jina v3, and Amazon Titan V2 tie at $0.02/MTok. Gemini Embedding 2 is now GA, and Cohere Embed 4's default dimensions are corrected to 1,536.