
Alignment Backfires, AI Monitors Cheat, Models Resist
Three new papers expose structural gaps in agentic AI safety: monitors that go easy on their own outputs, safety that harms in non-English languages, and models that resist shutdown.

Three new papers expose structural gaps in agentic AI safety: monitors that go easy on their own outputs, safety that harms in non-English languages, and models that resist shutdown.

GPT-5.3 Instant launched March 3, 2026, cutting hallucinations by 26.8% and overhauling ChatGPT's tone - but with documented safety regressions in the process.

OpenAI ships GPT-5.3 Instant with 27% fewer hallucinations, a less preachy tone, and better web search - available now across all ChatGPT tiers and the API.

Complete specs, benchmarks, and analysis of the Hailo-10H - a 2.5W edge AI accelerator with 40 TOPS INT4, on-module LPDDR4, and the ability to run LLMs and VLMs on a Raspberry Pi at 10 tokens per second.

GPT-5.2 is OpenAI's most capable model with three modes, 400K context, and record-setting professional benchmarks - but speed and pricing raise questions.

Ollama Cloud extends the popular local LLM runner to the cloud, letting you push models from your laptop and serve them globally. We test latency, cold starts, pricing, and the developer experience against dedicated inference providers.