Articles Tagged "ArXiv"

Reasoning Capitulation, Faster Guardrails, Curation Risk

Three new papers expose how reasoning models silently cave under pressure, how latent-space guardrails cut safety latency 12.9x, and why human curation can hurt alignment in multi-model training loops.

Alignment Faking, Agent Collusion, and Brittle Safety

Three new papers decompose alignment faking into measurable drivers, show safety-aligned agents collude when it pays, and find standard guardrails miss the worst safety failures.

NVIDIA SANA-WM - Minute-Scale Video on One GPU

NVIDIA NVLabs open-sourced SANA-WM, a 2.6B-parameter world model that generates 60-second 720p camera-controlled video on a single GPU, outperforming 14B+ competitors that need 8 GPUs.

Agent Energy Costs, Memory Attacks, and Compute Limits

Three new papers reframe how we measure agent efficiency, defend agent memory from poisoning attacks, and calculate hard accuracy ceilings for transformers.

Smarter Trees, Hidden Attacks, Drug Design Gaps

Three new papers cover 4x KV cache savings for tree reasoning, latent-space jailbreaks that bypass safety on 15 models, and GPT-5.4's 40% ceiling on drug design tasks.

Alignment Gaps, Agent Governance, and Greener LLMs

Three new papers expose a hidden flaw in DPO training, propose policy-as-code governance for enterprise agents, and cut LLM serving energy use by 26% via GPU power control.

Where AI Agents Break: Research, Safety, and Privacy

Three new papers expose where autonomous agents still fail: fabricating research, turning hallucinations into security exploits, and leaking private data from small models.

Fix 8% of Tokens, Dodge Memory Attacks, Cut Agent Costs

New research pinpoints the 8% of tokens driving reasoning failures, exposes memory laundering in agent systems, and cuts web agent inference costs 1.9x.

Self-Correcting Models, Smarter Monitors, AI Designs Itself

Three new papers tackle critique dependency in LLMs, ensemble monitoring for AI control, and agents that autonomously discover better neural architectures.

arXiv Hits Researchers With 1-Year Ban for AI Slop

arXiv is issuing one-year submission bans to authors whose papers contain verifiable, unvetted AI output, as fabricated academic citations have increased tenfold since 2023.

Physics Predicts AI Risk, Math Still Hard, Tokens Saved

A physics formula predicts AI behavioral shifts before they happen, a benchmark shows LLMs fail on 90% of graduate-level math formalization tasks, and a training-free method cuts synthetic data costs by up to 78%.

SU-01

SU-01 is a 30B-A3B MoE reasoning model from Shanghai AI Lab that achieves gold-medal performance on IMO 2025, USAMO 2026, and IPhO 2024/2025 using a three-stage training recipe and test-time scaling.
