Articles Tagged "Benchmarks"

Qwen3.6-27B

Qwen3.6-27B is a 27B dense open-weight multimodal model from Alibaba that scores 77.2% on SWE-bench Verified, beating Alibaba's own 397B MoE, and is released under Apache 2.0.

ZAYA1-8B

Zyphra's ZAYA1-8B is an 8.4B-parameter MoE reasoning model with only 760M active parameters that matches DeepSeek-R1-0528 on math and coding benchmarks while running at a fraction of the compute cost.

GPT-Realtime-2

GPT-Realtime-2 is OpenAI's second-generation real-time audio model, with GPT-5-class reasoning, 128K context, five reasoning levels, and parallel tool calling. It is now generally available in the Realtime API.

MiniMax M2.7

MiniMax M2.7 is a 230B MoE coding agent that handles 30-50% of MiniMax's own RL research workflow, scoring 56.22% on SWE-Pro and 78% on SWE-bench Verified at $0.30/M input tokens.

GPT-5.5 Instant

OpenAI's new default ChatGPT model cuts hallucinations by 52.5% and adds Gmail-backed personalization while maintaining the low latency of its predecessor.