Articles Tagged "AI Agents"

AI Sabotage Blind Spots, Code Drift, and ZK Proofs

Three new arXiv papers expose how developers miss AI sabotage 94% of the time, why LLMs converge structurally in code evolution, and how ZK proofs could verify frontier AI training.

NVIDIA Drops 110 Open-Source Skills for Physical AI Devs

NVIDIA's Agent Toolkit lands 110+ verified skills on GitHub covering robotics, autonomous vehicles, vision AI, and industrial systems - turning complex physical AI pipelines into single agent calls.

When to Stop - Overthinking, Handoffs, and Abstention

Three new papers show that AI agents fail not by doing the wrong thing, but by doing things when they should have stopped.

Best AI Coding Agents 2026: 6 Tools Tested and Ranked

A benchmark-driven comparison of Claude Code, Kiro, Devin, OpenAI Codex, Windsurf, and OpenHands - the six coding agents worth using in 2026.

Microsoft ASSERT Converts AI Policies Into Test Suites

Microsoft's open-source ASSERT framework turns natural language behavior specs into executable, auditable test suites for AI agents and LLM applications.

Reasoning Leaks, Hard Limits, and Self-Aware LLMs

Three new papers expose how reasoning traces can be extracted from supposedly hidden model internals, where chain-of-thought hits an architectural ceiling, and how RL teaches models to know when to quit.

New Open Standard Puts AI Agents Under Runtime Control

The Agent Control Standard defines open middleware hooks that let teams block, allow, or modify AI agent actions before they reach production systems.

Nvidia Enters the PC Market With RTX Spark Superchip

Nvidia's RTX Spark packs 20 Arm CPU cores and a Blackwell 2.0 GPU with 6,144 CUDA cores into a 45-80W Windows laptop chip, targeting Apple Silicon head-on.

Cut CoT Costs, Fix Agent Memory, Test Clinical AI

Three papers: smarter CoT trimming cuts reasoning length by 50%, a plug-in context manager rescues frozen agents on long tasks, and a 960K-item clinical benchmark exposes LLM gaps in hospitals.

Antigravity 2.0 Review: Agent-First, Rocky Launch

Google's Antigravity 2.0 rewrites the platform from a browser IDE into a five-surface agent suite. The architecture is ambitious, the launch was a mess.

BadHost: The Auth Bypass Lurking in 325M AI Systems

CVE-2026-48710 in Starlette lets a single malformed HTTP header bypass authentication on vLLM, LiteLLM, FastAPI, and every MCP server in production.

Kore.ai Artemis Review: Enterprise Agent Control Plane

Kore.ai's Artemis platform brings a compiled blueprint language and governance-first architecture to enterprise multiagent AI - ambitious, but Azure-only for now.

← Previous