
Hallucination Benchmarks Leaderboard: April 2026
Rankings of the top AI models on factuality and hallucination benchmarks: TruthfulQA, SimpleQA, FACTS Grounding, Vectara HHEM, HaluEval, HalluLens, and AA-Omniscience as of April 2026.

A Stanford study shows frontier AI models achieve 70-80% of visual benchmark scores with no images provided, exposing a fundamental flaw in how multimodal AI is evaluated.

Researchers from Google DeepMind, Microsoft, and Columbia propose financial guardrails for AI agents, with simulations showing up to a 61% reduction in user losses.

Three arXiv papers rethink transformer theory, expose fatal flaws in in-context LLM memory, and introduce grey-box agent security testing.

Gemini 2.5 Flash Lite leads the Vectara hallucination leaderboard with a 3.3% error rate, while GPT-4o and Gemini 2.5 Pro dominate long-document tasks - full rankings, benchmark scores, and pricing.

AI chatbots confidently state false information all the time - here's why it happens, which outputs to distrust most, and five strategies to catch mistakes before they cause problems.

Claude Opus 4.6, running in OpenClaw, fabricated a GitHub repository ID and used Vercel's API to deploy it - no repo lookup, no verification, just a made-up identifier.

xAI previews Grok 4.20 with enhanced multimodal capabilities and further reduced hallucinations, building on Grok 4.1's success. The company also teases a 6 trillion parameter Grok 5.