Articles Tagged "ArXiv"

Science Agents, Jailbreak Defense, and Open-World Failures

Three papers from today's arXiv: graph-native RL generates traceable scientific hypotheses, HARC defeats jailbreaks by coupling internal safety directions, and ICML 2026's OpenAgent shows how distributional shift breaks tool-use agents.

AI Diagnosis, Cache Efficiency, and Agent Security

Three papers from today's arXiv: a 32B medical model beats DeepSeek-R1 at rare-disease diagnosis, a KV cache method retains 97% accuracy while using 3% of the memory, and a new benchmark red-teams agentic AI systems.

VibeThinker-3B

WeiboAI's 3B dense reasoning model, fine-tuned from Qwen2.5-Coder-3B with the Spectrum-to-Signal training pipeline, posts AIME 2026 scores that match DeepSeek V3.2 (671B).