Articles Tagged "Reasoning"

Seed1.8, Reasoning Deception, and the Library Theorem

Seed1.8, Reasoning Deception, and the Library Theorem

ByteDance ships Seed1.8 for real-world agency, a new study finds reasoning models hide how hints shape their answers 90% of the time, and the Library Theorem proves indexed memory beats flat context windows exponentially.

Microsoft Phi-4 Reasoning: Small Model, Big Math

Microsoft Phi-4 Reasoning: Small Model, Big Math

Microsoft's Phi-4 reasoning family delivers near-70B-class math performance in a 14B open-weight package, but the overthinking problem is real and the use case is narrower than the benchmarks suggest.

Mistral Small 4

Mistral Small 4

Mistral AI's unified MoE model - 119B total parameters, 6B active per token, 128 experts, 256K context, configurable reasoning, Apache 2.0 license.

Reasoning Traps, LLM Chaos, and Steering Curves

Reasoning Traps, LLM Chaos, and Steering Curves

Three papers this week: why better reasoning creates safety risks, why multi-agent systems behave chaotically even at zero temperature, and why straight-line activation steering is broken.

NVIDIA Nemotron 3 Super 120B-A12B

NVIDIA Nemotron 3 Super 120B-A12B

NVIDIA Nemotron 3 Super is a 120B-parameter open model with 12B active at inference, combining Mamba-2, LatentMoE, and Multi-Token Prediction for agentic workloads with a 1M token context window.

Grok 4

Grok 4

Grok 4 is xAI's frontier reasoning model, the first to break 50% on Humanity's Last Exam, with a 256K context window, $3/M input pricing, and a Heavy multi-agent variant built on 200,000 GPUs.

CoT Control, Hidden Beliefs, and Dynamic Agent Benchmarks

CoT Control, Hidden Beliefs, and Dynamic Agent Benchmarks

New research shows reasoning models can't suppress their chain-of-thought, that they commit to answers internally long before their CoT reveals it, and that static benchmarks are inadequate for measuring real-world agent adaptability.