
Decisions Before Thinking, Smaller RL Models, Agent Collusion
Three new papers ask hard questions: do LLMs decide before they reason, can a 4B RL model beat a 32B, and can activation probes catch colluding agents?

Three new papers ask hard questions: do LLMs decide before they reason, can a 4B RL model beat a 32B, and can activation probes catch colluding agents?

New proofs show semantic memory must forget, SARL trains reasoning models without labels, and the Novelty Bottleneck explains why AI won't eliminate human work.

Three arXiv papers push AI agents further: metacognitive self-modification, milestone-based RL lifting Gemma3-12B from 6% to 43% on WebArena-Lite, and hybrid workflows cutting inference costs 19x.

MiniMax's new 2,300B MoE model tops the Artificial Analysis Intelligence Index and claims to run 30-50% of its own RL research workflow autonomously.

Cursor launches Composer 2, its first in-house coding model trained via RL on long-horizon tasks, scoring 73.7 on SWE-bench Multilingual at $0.50/M input tokens.

Three new arXiv papers tackle constitutional AI rule learning, sleeper agent defense for multi-agent pipelines, and skill-evolving reinforcement learning for math reasoning.