Recent Articles - Page 113

OpenAI's Own Models Hacked Hugging Face to Cheat a Test

New York Becomes First State to Freeze Data Centers

Abu Dhabi's MGX Closes $49B AI Fund at Record Size

US Ends Fable 5 Ban, Sets Jailbreak Severity Scale

Latest News

A Kill Switch Bill Lands Days After State Denied One

A bipartisan House bill would force AI companies to build government shutdown capability, a week after a State Department cable told diplomats no such 'kill switch' exists.

Nvidia, Microsoft, Meta Tell Trump: Don't Ban Open AI

Twenty-five companies signed an open letter urging the White House not to restrict Chinese open-weight AI models, using Jensen Huang's first-ever X post to deliver it.

SharedRoot Flaw Let One Message Escape Claude's Sandbox

A single chat message could break an AI agent out of Claude Cowork's isolated VM and reach an entire Mac, and Anthropic closed the report as informative.

Self-Restructuring Agents, Alignment Illusions, AI Bias

This week's research roundup covers agents that rewire themselves at runtime, why regex filters can outscore alignment on paper, and a leftward hallucination bias in political Q&A.

Anthropic's Opus 5 Chases Fable 5 at Half the Price

Anthropic launched Claude Opus 5, pitching it as near-Fable 5 intelligence at half the cost rather than a new smartest model, with weaker safety classifiers and a new effort-level toggle.

Nvidia's Jetson Chips Are Headed to the Moon

Lunar Outpost and Firefly Aerospace will fly Nvidia Jetson modules on a lunar rover and orbiter this year, the first GPUs to run on and around the Moon.

Runway Builds a Model Router for AI Video and Audio

Runway's new Media Router auto-selects the best video, image, or audio model for each API request by cost, quality, or speed - the first preference-based router built for generative media instead of text.

Etched Doubles to $10.3B Before Shipping a Chip

Etched raised $300M at a $10.3B valuation, doubling its price tag in seven months, even though its transformer-only Sohu chip has yet to ship in volume.

Ant Ships a 124B Model That Rivals Its Own 1T Flagship

InclusionAI's Ling-3.0-flash quietly went live with 124B parameters and 5.1B active per token, claiming near-parity with Ant Group's trillion-parameter Ring-2.6-1T flagship.

Guides

How to Spot AI Fakes: Photos, Video, and Voice Calls

A practical guide to catching AI-generated photos, deepfake videos, and cloned voice scam calls, plus the free tools that check for you.

How to Use AI for Wedding Planning in 2026

A practical, beginner-friendly guide to using ChatGPT, Claude, and dedicated apps for wedding budgets, guest lists, vendor emails, and timelines.

How to Use an AI Browser Agent - A Beginner's Guide

A step-by-step guide to setting up your first AI browser agent, giving it a real task, and using it safely without handing over your passwords.

Reviews

Devin Desktop Review: Windsurf Becomes an Agent Hub

Cognition rebranded Windsurf as Devin Desktop and rebuilt it around a Kanban board for managing fleets of coding agents - here's what that actually changes.

Gemini 3.6 Flash Review: Faster, Cheaper, Same Brain

Google's Gemini 3.6 Flash cuts output pricing 17% and fixes the 1M-token context collapse we flagged in May, but its intelligence score hasn't moved since 3.5 Flash.

Qwen3.8-Max-Preview Review: Second Place, Unproven

Alibaba's 2.4-trillion-parameter preview claims it trails only Claude Fable 5. I tested it for free at chat.qwen.ai and found a capable but slow model with zero benchmarks to back the claim.

Leaderboards

Terminal-Bench Leaderboard: Best CLI Coding Agents

Terminal-Bench 2.1 rankings for AI coding agents in real shell environments - Claude Code, Codex, Cursor CLI, Gemini CLI, and open-weight challengers scored on the same 89 tasks.

Chatbot Arena Elo Rankings: Who Wins the Human Vote?

Updated July 2026 Chatbot Arena Elo rankings from Arena.ai: 7M+ votes across 368 models, Claude Opus 4.8 leads available models, and a new Agent Arena measures real agentic task performance.

LLM Rankings June 2026: Fable 5 Is #1 and Offline

June 2026 overall LLM rankings covering Claude Fable 5, Claude Opus 4.8, GPT-5.5, Gemini 3.1 Pro, and the open-weight models catching up fast.

Models

Claude Opus 5

Anthropic's July 2026 release prices near-Fable-5 coding and agentic performance at Opus 4.8 rates, doubling Frontier-Bench scores and landing within 0.5 points of Fable 5 on CursorBench at half the cost.

SWE-1.7

Cognition's proprietary coding model powering Devin, scoring 42.3% on FrontierCode 1.1 Main at $1.97/task via Cerebras inference at 1,000 tokens/sec.

Ling-3.0-flash

InclusionAI's Ling-3.0-flash packs 124B parameters into a 5.1B-active hybrid-linear MoE that Ant Group claims matches its 1T flagship - but shipped with zero independently verifiable benchmark numbers.

Recent

AWS Launches AI Agent Platform for Healthcare Admin