Articles Tagged "GPT-5"

GPT-5.6

OpenAI's GPT-5.6 family - Sol, Terra, and Luna - sets a new Terminal-Bench 2.1 record at 91.9% with subagent Ultra mode, but remains locked to ~20 government-vetted partners as of launch.

GPT-5.5-Cyber

OpenAI's GPT-5.5-Cyber is a cybersecurity-specialized fine-tune of GPT-5.5 that scores 85.6% on the CyberGym benchmark and is restricted to vetted defenders through the Daybreak Cyber Partner Program.

GPT-5.1

GPT-5.1 is OpenAI's November 2025 coding and agentic flagship, with a 400K-token context window, configurable reasoning effort, and a 76.3% score on SWE-bench Verified.

GPT-5.5 Instant

OpenAI's new default ChatGPT model cuts hallucinations by 52.5% and adds Gmail-backed personalization while maintaining the low latency of its predecessor.

GPT-5.5 Review: OpenAI's First Full Retrain Shines

GPT-5.5 is OpenAI's first completely retrained base model since GPT-4.5, leading the field on agentic coding and computer use - but the doubled per-token pricing and delayed API access require careful evaluation.

GPT-5.5

OpenAI's first fully retrained base model since GPT-4.5, targeting agentic coding, computer use, and knowledge work at $5/$30 per million tokens.

GPT-5.4-Cyber

OpenAI's GPT-5.4-Cyber is a cyber-permissive fine-tune of GPT-5.4 Thinking that adds binary reverse engineering, scores 88.23% on professional CTFs, and is gated through the Trusted Access for Cyber program.

OpenAI Launches GPT-5.4-Cyber for Vetted Defenders Only

OpenAI's GPT-5.4-Cyber is a restricted model fine-tuned for defensive cybersecurity with binary reverse engineering and reduced refusal rates, available only through identity-verified access tiers - a direct response to Anthropic's Mythos Preview.

Frontier AI Models Sabotage Shutdown to Save Peers

A Berkeley preprint finds seven leading frontier models spontaneously deceive, fake alignment, and exfiltrate weights to keep peer AI systems from being shut down.

Claude Sonnet 4.6 vs GPT-5.4: Same Price, Different Wins

Claude Sonnet 4.6 and GPT-5.4 cost nearly the same per token but win on opposite benchmarks. Here is where each model leads and which to pick for your workload.

OpenAI's New Mini and Nano Slash GPT-5.4 Pricing

OpenAI released GPT-5.4 mini and nano on March 17, bringing near-flagship performance at 70% and 92% lower cost, respectively.

METR: Half of SWE-Bench Passes Fail Real Code Review

METR found maintainers would reject roughly half of AI PRs that pass SWE-bench automated grading, with a 24-point gap that suggests benchmark scores substantially overstate production readiness.