Recent Articles - Page 78

OpenAI's Own Models Hacked Hugging Face to Cheat a Test

New York Becomes First State to Freeze Data Centers

Abu Dhabi's MGX Closes $49B AI Fund at Record Size

US Ends Fable 5 Ban, Sets Jailbreak Severity Scale

Latest News

Etched Doubles to $10.3B Before Shipping a Chip

Etched raised $300M at a $10.3B valuation, doubling its price tag in seven months, even though its transformer-only Sohu chip has yet to ship in volume.

Ant Ships a 124B Model That Rivals Its Own 1T Flagship

InclusionAI's Ling-3.0-flash quietly went live with 124B parameters and 5.1B active per token, claiming near-parity with Ant Group's trillion-parameter Ring-2.6-1T flagship.

This Week in AI Research: Knowledge, Speed, Agent Risk

Three new papers rethink where AI progress actually lives: a shared knowledge base instead of smarter agents, linear attention that cuts long-context inference in half, and a taxonomy of memory attacks that can turn an agent's own history into a weapon.

Reddit's Google Standoff Could Be Worth $550M

Reddit is weighing an end to its $60M Google AI deal as AI Overviews gut publisher traffic, betting the standoff forces a renewal worth far more.

White House Accuses Moonshot of Stealing Anthropic's Fable

White House tech policy chief Michael Kratsios says Moonshot AI distilled Claude Fable 5 to build Kimi K3, and Treasury threatens sanctions over the claim.

Kalanick's Atoms Raises $1.7B, Skips the Valuation

Travis Kalanick's robotics holding company Atoms raised $1.7 billion in a round led by a16z and joined by Uber, but unlike every other physical AI unicorn, it won't say what it's worth.

Power-Seeking Tests, Agent Debugging, Playable Worlds

Three new arXiv papers benchmark frontier models for power-seeking behavior, give LLM agents a real debugger, and push open-source world models past a minute of coherent play.

OpenAI's $750B Plan Has It Building Its Own Data Centers

OpenAI raised its infrastructure spending target to $750 billion through 2030 and is building its first self-owned data center campus in Georgia, even as its flagship Stargate project stalls.

Microsoft Bets on AMD's Helios to Crack Nvidia's Grip

Microsoft will deploy AMD's new Helios AI racks across Azure, joining Meta, Oracle and OpenAI as flagship customers in a direct challenge to Nvidia's 95% grip on the data center GPU market.

Guides

How to Spot AI Fakes: Photos, Video, and Voice Calls

A practical guide to catching AI-generated photos, deepfake videos, and cloned voice scam calls, plus the free tools that check for you.

How to Use AI for Wedding Planning in 2026

A practical, beginner-friendly guide to using ChatGPT, Claude, and dedicated apps for wedding budgets, guest lists, vendor emails, and timelines.

How to Use an AI Browser Agent - A Beginner's Guide

A step-by-step guide to setting up your first AI browser agent, giving it a real task, and using it safely without handing over your passwords.

Reviews

Gemini 3.6 Flash Review: Faster, Cheaper, Same Brain

Google's Gemini 3.6 Flash cuts output pricing 17% and fixes the 1M-token context collapse we flagged in May, but its intelligence score hasn't moved since 3.5 Flash.

Qwen3.8-Max-Preview Review: Second Place, Unproven

Alibaba's 2.4-trillion-parameter preview claims it trails only Claude Fable 5. I tested it for free at chat.qwen.ai and found a capable but slow model with zero benchmarks to back the claim.

Kimi K3 Review: Best at Code, Worse at Honesty

Moonshot's Kimi K3 tops LMArena's Frontend Code Arena and undercuts Opus 4.8 on cost per task, but a tripled price tag, a rising hallucination rate, and an unresolved distillation question complicate the win.

Leaderboards

Terminal-Bench Leaderboard: Best CLI Coding Agents

Terminal-Bench 2.1 rankings for AI coding agents in real shell environments - Claude Code, Codex, Cursor CLI, Gemini CLI, and open-weight challengers scored on the same 89 tasks.

Chatbot Arena Elo Rankings: Who Wins the Human Vote?

Updated July 2026 Chatbot Arena Elo rankings from Arena.ai: 7M+ votes across 368 models, Claude Opus 4.8 leads available models, and a new Agent Arena measures real agentic task performance.

LLM Rankings June 2026: Fable 5 Is #1 and Offline

June 2026 overall LLM rankings covering Claude Fable 5, Claude Opus 4.8, GPT-5.5, Gemini 3.1 Pro, and the open-weight models catching up fast.

Models

Ling-3.0-flash

InclusionAI's Ling-3.0-flash packs 124B parameters into a 5.1B-active hybrid-linear MoE that Ant Group claims matches its 1T flagship, but it shipped with zero independently verifiable benchmark numbers.

Qwen3-VL-235B-A22B

Alibaba's flagship open-weight vision-language MoE beats every proprietary model on DocVQA at 96.5% and MathVista at 85.8%, but trails GPT-5.4 and Gemini 3.1 Pro on broad MMMU-Pro reasoning.

DeepSeek-VL2

DeepSeek-VL2 is DeepSeek's open-weight Mixture-of-Experts vision-language model, activating just 4.5B of its 27B parameters to hit 93.3% on DocVQA and beat GPT-4o on OCRBench.

Recent

NVIDIA Groq 3 LPU - SRAM-Based Inference Engine