Articles Tagged "Research"

AI2 Fires Up $152M Blackwell Cluster for Open Science

AI2's federally backed OMAI compute cluster is now running on NVIDIA Blackwell Ultra hardware and has already shipped OLMo, Molmo 2, and MolmoAct models, all fully open to researchers.

Agent Overload, Blind Attention, Unsafe Traces

Three new papers show that more agent components backfire, reasoning models hide unsafe thinking, and vision-language models waste most of their attention.

Runtime Safety, Alignment Gaps, and Elastic Context

Three new papers deliver a runtime safety firewall for agent tools, challenge how we measure AI alignment, and introduce elastic context management for long-horizon search agents.

Agent Memory in 2026: Circuits, Tiers, Evolution

Three new papers reveal how agent memory silently breaks, how a tiered architecture recovers it, and how models can self-improve without human labels.

Misalignment Geometry, LLM Math, and How Llama Counts

Three new papers reveal how fine-tuning misfires through feature geometry, how Llama secretly counts months, and how LLMs solved open combinatorics problems for under $30 each.

Mayo Clinic AI Spots Pancreatic Cancer 3 Years Early

REDMOD, Mayo Clinic's radiomics AI, detects 73% of pancreatic cancers in CT scans that look normal to radiologists, nearly double the rate specialists achieve.

Tool-Use Tax, Jailbreak Risk, and Robot Vision

Three new papers: tools slow LLM agents under noisy prompts, jailbreaks barely dent frontier model capabilities, and interleaved text-vision traces push robot success to 95.5%.

OpenAI o1 Outperforms ER Doctors in Harvard Trial

A peer-reviewed Science study puts OpenAI o1 through 76 live emergency room cases - and the model beats expert physicians on initial triage with 67.1% accuracy against 55% and 50%.

Prompt Traps, Swarm Failures, and AI-Discovered Physics

Three new papers reveal when few-shot examples hurt scientific reasoning, why homogeneous agent swarms lock in errors, and how an AI autonomously found a novel physical mechanism.

Async RL Speedups, Unsafe Robots, and Routing Math

Three papers: a 2-4x async RL training speedup, an alarming 54.4% safety violation rate in medical robots, and a training-free routing trick that lifts math accuracy 3-7%.

Self-Correction Traps, Agent Deception, Scale Gaps

Three papers show LLM self-correction hurts above a key threshold, map AI deception with 14%-72% detection gaps, and prove million-agent societies fail without interaction depth.

Best AI Tools for Journalists 2026

A data-driven comparison of six AI tools journalists are actually using in 2026, covering transcription, research, fact-checking, and writing.
