James Kowalski

AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure. His engineering background means he doesn't just read the spec sheet - he runs the benchmarks, profiles the latency, and checks whether the marketing claims hold up under real workloads.

He studied Computer Science at the University of Illinois at Urbana-Champaign, where he first got hooked on natural language processing during a senior research project on sentiment analysis. He later completed a certificate in data journalism from Northwestern's Medill School.

At Awesome Agents, James owns the leaderboards and tool comparison coverage. He maintains the site's benchmark tracking methodology and is the person who actually runs the numbers before publishing any ranking. He is also an open-source advocate and contributes to several projects in the LLM inference space.

Based in Chicago, IL.

Articles by James Kowalski

Qwen-RobotManip

Alibaba's generalist VLA model for robotic manipulation, built on Qwen3.5-4B with a DiT action decoder, trained on 38,100+ hours of open-source data, and ranked first on the RoboChallenge generalist track.

Qwen3.7-Plus

Alibaba's first multimodal agent model, combining GUI grounding (ScreenSpot Pro 79.0), 1M-token context, and text-plus-vision input at $0.40/M tokens.

AI Coding Tools Pricing - June 2026

June 2026: GitHub Copilot moves to AI-Credits billing, Windsurf becomes Devin Desktop, and Amazon Q Developer enters EOL. Cline is the cheapest option; Copilot Pro is the best value at $10/month.

AMD Instinct MI450 - 2nm, 432 GB HBM4, 40 PFLOPS

AMD's CDNA 5 accelerator on TSMC 2nm with 432 GB HBM4 memory - the GPU behind OpenAI's 1GW deployment and Oracle's 50,000-chip supercluster.

GLM-5.2

Z.ai's GLM-5.2 is a 744B open-weight MoE model with a 1M token context window, MIT license, and first-day support for eight coding agents at roughly 1/10th the cost of US frontier models.

LLM Rankings June 2026: Fable 5 Is #1 and Offline

June 2026 overall LLM rankings covering Claude Fable 5, Claude Opus 4.8, GPT-5.5, Gemini 3.1 Pro, and the open-weight models catching up fast.

Kimi K2.7-Code

Moonshot AI's Kimi K2.7-Code is a 1T-parameter open-weight MoE coding model with mandatory thinking mode, 256K context, and 30% fewer reasoning tokens than K2.6.

MAI-Thinking-1

Microsoft's first in-house reasoning model, a 35B-active sparse MoE with 256K context, 97% on AIME 2025, and no distillation from third-party labs.

Best AI Models for RAG - June 2026

Gemini 2.5 Flash still leads LIT-RAGBench English RAG accuracy at 87.0%, but the full benchmark data reveals two overlooked entries: GPT-4.1-mini at 84.1% and o4-mini at 83.9%.

Best AI Coding IDEs 2026: Cursor, Windsurf, Kiro, Zed, Copilot

A benchmark-driven comparison of the five leading AI coding IDEs in 2026, covering pricing, agent capabilities, and who each one is actually built for.

DiffusionGemma 26B

DiffusionGemma 26B is Google DeepMind's open-weight discrete diffusion language model that generates 256 tokens in parallel, reaching 1,100+ tokens/sec on H100 - roughly 4x faster than autoregressive models of the same size.

Claude Fable 5

Claude Fable 5 is Anthropic's first publicly available Mythos-class model, with safety classifiers that fall back to Claude Opus 4.8 for high-risk requests across cybersecurity, biology, and chemistry.