Articles Tagged "Research"

Faster Agents, Skewed Evals, and Brand Bias in LLMs

Three new papers: agents that compile runs into 8-13x faster state machines, benchmark scores that shift with compute budget, and big brands monopolizing LLM recommendations.

AI Label Backlash - 60% of US Consumers Are Turned Off

Two major surveys from WordPress VIP and Gartner find a majority of US consumers are put off by AI labels in brand messaging, even as companies race to build AI visibility for search engines.

AI Engrams, Cognitive Debt, and Agent Trust

Three new papers tackle what lives inside a trained model, how AI dependence erodes human cognition, and whether AI teams can calibrate trust.

Anthropic Surveyed 52K Americans - Just 15% Trust AI

Anthropic's first Public Record survey of 51,993 Americans finds 64% fear job displacement, only 15% trust AI companies, and 70%+ support government regulation - with rare bipartisan consensus.

Agents Hit 89%, Evals Get a Schema, Memory Falls Short

Three papers from today's arXiv: workplace agents jumped from 43% to 89% task completion in two years, a 47-researcher coalition ships a unified eval schema, and agent memory only helps when similarity tops 0.8.

Tool Blindness, Tree Search, and the Road to ASI

Three new papers expose a 50-point gap in agent tool knowledge, show tree search tripling inference throughput, and map the research between AGI and superintelligence.

Honest AI Is Provably Impossible - Plus Two Agent Wins

A new impossibility theorem proves feedback-based training can't guarantee honest AI, while two papers cut agent memory costs by 78% and multi-agent latency by 7x.

Context Overload, Memory Leaks, and Agent Safety

Three new arXiv papers expose how context bloat tanks agent performance, agent memory bleeds private data, and misaligned behavior spreads through multi-agent systems.

MCP Exploit Risk, Sycophancy Scores, and Agent Self-Harm

New research reveals MCP error messages triple agent attack success rates, ranks eight models on sycophancy with Claude scoring best, and finds self-evolving agents make 30-42% false edits.

Safety Evals Break Under Attack, Agents Work 87% Faster

Three papers: strategic attack timing exposes gaps in AI control evaluations, Perplexity's agents slash task time by 87%, and Lean4 formal proofs make agent workflows more reliable.

AI Sabotage Blind Spots, Code Drift, and ZK Proofs

Three new arXiv papers expose how developers miss AI sabotage 94% of the time, why LLMs converge structurally in code evolution, and how ZK proofs could verify frontier AI training.

AI Attachment, Smarter Spending, and Cascading RAG Errors

Three new papers tackle how routine AI use quietly rewires emotional habits, how to spend compute where failures cost most, and why agentic RAG errors compound before anyone notices.
