Articles Tagged "Reasoning"

GPT-5.6

OpenAI's GPT-5.6 family - Sol, Terra, and Luna - sets a new Terminal-Bench 2.1 record at 91.9% with subagent Ultra mode, but remains locked to ~20 government-vetted partners as of launch.

Grok 4.5

Grok 4.5 is xAI's 1.5-trillion-parameter V9 model in private beta at SpaceX and Tesla, with supplemental training on Cursor coding data and early evals claiming performance near Claude Opus 4.8.

Gemini 3.5 Pro

Google DeepMind's upcoming flagship model with a 2M-token context window and Deep Think reasoning, announced at Google I/O 2026 and expected in July.

Grok 4.3 Review: xAI Bets on Price Over Prestige

Grok 4.3 slashes prices by up to 83%, adds native video input and voice cloning, and carves out a credible position as the most cost-efficient frontier model - with real caveats on coding and latency.

VibeThinker-3B

WeiboAI's 3B dense reasoning model fine-tuned from Qwen2.5-Coder-3B, posting AIME 2026 scores that match DeepSeek V3.2 (671B) using the Spectrum-to-Signal training pipeline.

ERNIE 5.1

Baidu's ERNIE 5.1 is a text-focused, 800B-parameter MoE model, built at 6% of comparable training costs, that claims the top Chinese model slot on LMArena.