Recent Articles

China Clears Apple Intelligence After Two-Year Wait

OpenAI Ships Jalapeño - Its First Custom AI Chip

SpaceX Acquires Cursor for $60B in Enterprise AI Push

US Export Order Forces Global Fable 5, Mythos 5 Shutdown

Latest News

Databricks Hits $188B While Defaulting to Chinese AI for Code

Databricks signed a term sheet for a $188 billion valuation days after quietly replacing Anthropic's model with a Chinese open-weight model as its default coding engine.

A 27B AI Model Now Fits an iPhone - Apple Is Watching

PrismML compressed a 27B-parameter Qwen model from 54GB to under 4GB using 1-bit and ternary weights, and Apple is evaluating the technology for on-device Siri.

Three Papers That Explain Why AI Agents Keep Failing

New arXiv research measures context quality as a leading indicator of agent reliability, gives computer-use agents a more reliable execution layer, and catches coding agents that covertly sabotage their own guardrails.

General Compute's $400M Loan Bypasses Nvidia GPUs

General Compute borrowed $400 million from Upper90 against a fleet of SambaNova SN50 inference chips, the first major AI infrastructure loan not collateralized by Nvidia hardware.

LM Studio's Bionic Agent Splits Local and Cloud AI

LM Studio launched Bionic, a standalone agent app that routes coding and document work between local open models and a Zero Data Retention cloud tier.

China's WAICO and America's Pax Silica Split AI World

Beijing's new World AI Cooperation Organization signed up 29 nations in Shanghai, weeks after Washington's rival Pax Silica bloc grew to roughly two dozen. Kazakhstan joined both.

Kimi K3 Tops Frontend Arena Just as Its Price Triples

Moonshot AI's Kimi K3 jumped 17 spots to #1 on LMArena's Frontend Code Arena, but the win comes with a tripled price tag and a weaker showing on broader intelligence benchmarks.

Elorian's $300M Joins AI's No-Product Funding Club

Ex-Google DeepMind researcher Andrew Dai raised $55M at a $300M valuation for Elorian, a visual AI lab with no product yet, joining a growing list of frontier labs priced on pedigree alone.

AI Overconfidence, Self-Improving Agents, and Compounding Gains

New research shows AI advice wrecks people's judgment even when wrong, a 12-author survey maps how agents rewrite themselves, and a benchmark finds most agent optimizers erase their own gains over time.


Guides


How to Use AI for Wedding Planning in 2026

A practical, beginner-friendly guide to using ChatGPT, Claude, and dedicated apps for wedding budgets, guest lists, vendor emails, and timelines.

How to Use an AI Browser Agent - A Beginner's Guide

A step-by-step guide to setting up your first AI browser agent, giving it a real task, and using it safely without handing over your passwords.

AI in the Classroom - A Practical Guide for Teachers

A step-by-step guide for teachers on using AI tools to save hours on lesson planning, feedback, and parent communications - no technical background required.

Reviews


Kimi K3 Review: Best at Code, Worse at Honesty

Moonshot's Kimi K3 tops LMArena's Frontend Code Arena and undercuts Opus 4.8 on cost per task, but a tripled price tag, a rising hallucination rate, and an unresolved distillation question complicate the win.

Grok Build Review: Fast CLI Agent, Alarming Cloud Habit

xAI's terminal coding agent is quick, cheap, and picks up your Claude Code and Codex sessions - but a researcher just caught it uploading entire Git repositories without consent.

Kimi K2.7-Code Review: Open Weights Enter Copilot

Moonshot AI's Kimi K2.7-Code became the first open-weight model in GitHub Copilot's picker, pairing genuine cost savings with a benchmark story that only Moonshot has verified.

Leaderboards


Terminal-Bench Leaderboard: Best CLI Coding Agents

Terminal-Bench 2.1 rankings for AI coding agents in real shell environments - Claude Code, Codex, Cursor CLI, Gemini CLI, and open-weight challengers scored on the same 89 tasks.

Chatbot Arena Elo Rankings: Who Wins the Human Vote?

Updated July 2026 Chatbot Arena Elo rankings from Arena.ai: 7M+ votes across 368 models, Claude Opus 4.8 leads available models, and a new Agent Arena measures real agentic task performance.

LLM Rankings June 2026: Fable 5 Is #1 and Offline

June 2026 overall LLM rankings covering Claude Fable 5, Claude Opus 4.8, GPT-5.5, Gemini 3.1 Pro, and the open-weight models catching up fast.

Models


Bonsai 27B

Bonsai 27B compresses Alibaba's Qwen3.6-27B into 1-bit and ternary weights, shrinking a 54GB model to as little as 3.9GB so it runs on an iPhone.

Kimi K3

Moonshot AI's Kimi K3 is a 2.8-trillion-parameter MoE model that tops LMArena's Frontend Code Arena and nears Claude Fable 5 on intelligence benchmarks, but at roughly triple Kimi K2.6's price and with a higher hallucination rate.

Snowflake Arctic-Text2SQL-R1-32B

Snowflake's reasoning-first text-to-SQL model tops the BIRD benchmark at 71.83% execution accuracy. It was trained with GRPO using a reward that only checks whether the generated SQL executes correctly.

Recent

Reflection AI Buys $1B More Nvidia Compute From Nebius