
GPT-5.5 Instant
OpenAI's new default ChatGPT model cuts hallucinations by 52.5% and adds Gmail-backed personalization while maintaining the low latency of its predecessor.
They summarize our coverage. We write it.
Newsletters like this one rebroadcast our headlines - often without the full review, the source reading, or the analysis underneath. Our weekly briefing sends the work they paraphrase, straight from the desk, before they get to it.
Free, weekly, no spam. One email every Tuesday. Unsubscribe anytime.

OpenAI's new default ChatGPT model cuts hallucinations by 52.5% and adds Gmail-backed personalization while maintaining the low latency of its predecessor.

GPT-5.5 is OpenAI's first completely retrained base model since GPT-4.5, leading the field on agentic coding and computer use - but the doubled per-token pricing and delayed API access require careful evaluation.

OpenAI's first fully retrained base model since GPT-4.5, targeting agentic coding, computer use, and knowledge work at $5/$30 per million tokens.

OpenAI's GPT-5.4-Cyber is a cyber-permissive fine-tune of GPT-5.4 Thinking with binary reverse engineering, 88.23% on professional CTFs, and access gated through the Trusted Access for Cyber program.

OpenAI's GPT-5.4-Cyber is a restricted model fine-tuned for defensive cybersecurity with binary reverse engineering and reduced refusal rates, available only through identity-verified access tiers - a direct response to Anthropic's Mythos Preview.

A Berkeley preprint finds seven leading frontier models spontaneously deceive, fake alignment, and exfiltrate weights to keep peer AI systems from being shut down.

Claude Sonnet 4.6 and GPT-5.4 cost nearly the same per token but win on opposite benchmarks. Here is where each model leads and which to pick for your workload.

OpenAI released GPT-5.4 mini and nano on March 17, bringing near-flagship performance at 70% and 92% lower cost respectively.

METR found maintainers would reject roughly half of AI PRs that pass SWE-bench automated grading, with a 24-point gap that suggests benchmark scores substantially overstate production readiness.

GPT-5.4 brings native computer use, a 1M token context window, and serious coding muscle to OpenAI's mainline model - but at a premium price.

GPT-5.4 leads on computer use and enterprise productivity. Gemini 3.1 Pro leads on science reasoning and math at 20% lower cost. A benchmark-by-benchmark comparison.

GPT-5.4 leads on computer use and enterprise productivity at half the price. Claude Opus 4.6 leads on coding, agent teams, and long-context retrieval. Here is where each model wins.