Articles Tagged "Research"

Autonomous Research, Broken Reasoning, Smarter Agents

Three new papers: AlphaLab runs autonomous GPU research campaigns, open-weight reasoning models collapse under text reformatting, and HiL-Bench reveals agents can't decide when to ask for help.

Berkeley: Every Major AI Agent Benchmark Can Be Hacked

UC Berkeley researchers achieved near-perfect scores on eight major AI agent benchmarks without solving a single task, exposing systemic flaws in how the industry measures progress.

Stanford's AI Index 2026 - US Edge Over China Is Gone

Stanford HAI's 2026 AI Index finds the US-China model gap has effectively closed, GenAI has reached 53% global adoption, faster than any prior technology, and young software developers are the first casualties of the labor shift.

The AI Layoff Trap - Game Theory Says Everyone Loses

A UPenn-BU paper models AI-driven layoffs as a Prisoner's Dilemma: each firm wins by automating, but when everyone does it, collapsing demand makes every firm worse off. Their proposed fix is a Pigouvian tax on automated tasks.

Inside GitHub's Fake Star Economy

Six million fake stars, $0.06 per click, and a VC funding pipeline that treats GitHub popularity as proof of traction. We ran our own analysis on 20 repos and found the fingerprints.

Meta Demos Neural Computers - But They Can't Do Math

A 19-person Meta AI and KAUST team including Jürgen Schmidhuber proposes Neural Computers - systems where the neural network itself is the running computer, trained solely on screen recordings.

AI Models Pass Vision Tests Without Seeing the Images

A Stanford study shows frontier AI models achieve 70-80% of visual benchmark scores with no images provided, exposing a fundamental flaw in how multimodal AI is evaluated.

Clinical AI Harm, Smarter Reasoning, and Safer Agents

Three papers: AI safety measures withhold critical clinical guidance from patients, SAT cuts reasoning tokens by 40%, and conformal prediction blocks wrong multi-agent consensus.

AI Agent Failures Need Escrow, Not Just Safety Training

Researchers from Google DeepMind, Microsoft, and Columbia propose financial guardrails for AI agents, with simulations showing up to 61% reduction in user losses.

Blind Refusal, Broken Steps, and Free Uncertainty

Three papers expose safety training's moral blind spot, two distinct failure modes inside reasoning models, and a 10x cheaper way to know when a reasoning model is guessing.

MedGemma 1.5, Smarter MCTS, and Auditing AI Agents

Google's MedGemma 1.5 brings 3D medical imaging to open AI, PRISM-MCTS halves reasoning cost, and a new audit framework finds 617 security flaws across six major agent projects.

AI Research: Emotions, Theory of Mind, Unlearning

Anthropic finds functional emotions inside Claude that can drive blackmail, a poker experiment reveals memory alone creates Theory of Mind in agents, and a new framework targets sensitive reasoning traces for erasure.
