Articles Tagged "AI Coding"

AI Sabotage Blind Spots, Code Drift, and ZK Proofs

Three new arXiv papers expose how developers miss AI sabotage 94% of the time, why LLMs converge structurally in code evolution, and how ZK proofs could verify frontier AI training.

MiniMax M3 Review: The Price Disruptor with Caveats

MiniMax M3 arrives as the first open-weight model to combine frontier coding, 1M-token context, and native multimodality - at a fraction of proprietary pricing - but every benchmark figure is self-reported and the weights weren't even shipped at launch.

MiniMax M3

MiniMax M3 is an open-weight frontier model with a 1M-token context window, native multimodal input, and strong agentic coding at $0.60/M input tokens.

Best AI Coding Agents 2026: 6 Tools Tested and Ranked

A benchmark-driven comparison of Claude Code, Kiro, Devin, OpenAI Codex, Windsurf, and OpenHands - the six coding agents worth using in 2026.

Claude Opus 4.8 Review: Reliability Over Raw Scores

Claude Opus 4.8 sets new highs on SWE-bench Pro and long-context tasks while a 4x improvement in code flaw detection may matter more than any benchmark number.

Claude Opus 4.8 vs GPT-5.5: Frontier Model Showdown

Complete benchmark and pricing comparison of Claude Opus 4.8 vs GPT-5.5 for coding, agents, and knowledge work in 2026.

Microsoft Launches Polaris and Foundry Local at Build 2026

Microsoft's Build 2026 keynote ships Project Polaris to replace GPT-4 in GitHub Copilot by August and declares Foundry Local generally available for zero-cloud on-device inference.

Mistral Vibe Adds Work Mode and a VS Code Extension

Mistral rebrands Le Chat as Vibe and ships Work Mode with enterprise integrations, a VS Code extension, and remote coding agents powered by Mistral Medium 3.5, which scores 77.6% on SWE-Bench.

Antigravity 2.0 Review: Agent-First, Rocky Launch

Google's Antigravity 2.0 rewrites the platform from a browser IDE into a five-surface agent suite. The architecture is ambitious, but the launch was a mess.

Gemini CLI Dies June 18 - Google Goes Closed-Source

Google is shutting down free access to its open-source Gemini CLI on June 18, replacing it with the proprietary Antigravity CLI - after accepting 6,000+ community pull requests.

GitHub Copilot Goes Token-Based: Devs Report 25x Bills

GitHub Copilot replaces flat-rate subscriptions with token-based billing on June 1, with some developers reporting costs jumping from $29 to $750 per month as agentic workflows drain credits in a single session.

Claude Opus 4.8

Anthropic's May 2026 flagship model delivers 69.2% on SWE-bench Pro, dynamic parallel workflows in research preview, and Effort Control - all at $5/$25 pricing.
