Z.AI Will Ban Your Coding Plan For Non-Coding Use
Z.AI updated its GLM Coding Plan usage policy. Non-coding requests now trigger aggressive throttling, and three violations mean a permanent ban - which explains the wave of 1302 and 1303 rate-limit errors users have been hitting this week.

If you bought a Z.AI GLM Coding Plan to run SillyTavern roleplay sessions, a custom chatbot, or any other non-coding workflow because the $6/mo pricing was absurdly cheap, the discount has just become a liability. Z.AI updated its usage policy for the Coding Plan and is now actively detecting and punishing non-coding use. The first hit is "high-intensity throttling." The third is a permanent account ban.
The recent wave of error 1302 "Too Many Requests" reports on opencode, SillyTavern, and similar tools is the detection system firing.
TL;DR
- Z.AI's GLM Coding Plan usage policy now explicitly restricts the subscription to "Coding Scenarios" and enforces it automatically
- Detected non-coding use triggers "high-intensity throttling, account suspension, or permanent ban"
- Three or more violations result in account ban per the policy text
- Error codes 1302 / 1303 surfacing across opencode, openclaw, Letta-code and other MCP-compatible clients are the enforcement response
- This is the third tightening since January - after the sign-up caps (new sign-ups cut to 20% of prior levels) and the February 11 pricing hike
- If you're using a Coding Plan for anything other than coding, move the workload to pay-as-you-go API before your account gets flagged
What changed
The relevant sentences, pulled from the updated Usage Policy:
"The GLM Coding Plan is designed specifically for Coding Scenarios. If the system detects that the subscription is being used for requests clearly unrelated to coding scenarios, certain subscription benefits may be restricted."
And on escalation:
"If your account violates the Usage Rules, and triggers risk control rules, it may be subject to risk control measures, including high-intensity throttling, account suspension, or permanent ban. Violating the Usage Rules three or more times will result in an account ban."
Two things stand out. First, the detection is automatic - "the system detects" does the work, not a human reviewer. Second, the enforcement ladder has three rungs (throttle -> suspend -> ban) and a hard numeric trigger (three strikes) for the permanent tier.
The policy does not define "Coding Scenarios." It does not publish the detection criteria. It does not list prohibited use cases. It does publish an appeal path - a support email - but no SLA for review.
The error codes
What users are seeing when the policy fires:
| Error | Meaning | Behavior |
|---|---|---|
| 1302 | Too Many Requests | Retries succeed briefly, then fail again after 10-15 seconds |
| 1303 | Same class, different throttle bucket | Reported by SillyTavern and opencode users |
| 1113 | Insufficient balance | Surfaces when the Coding Plan's "usage conditions" are considered unmet |
The 1302 pattern is distinctive because it is not a standard HTTP 429. A standard 429 typically carries a Retry-After header, and a well-behaved client waits the prescribed interval. 1302 instead loops - the first retry passes, the next fails, and tools that aren't tuned for the pattern enter infinite-cooldown states. The openclaw team traced an infinite cooldown loop to ZAI provider "rate limit misclassification" that matches this profile.
In other words: the throttling is adversarial to downstream clients, not cooperative.
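The distinction matters in client code. A minimal sketch of how a client might bucket these codes, assuming an error payload shaped like `{"error": {"code": ...}}` (an assumption for illustration - Z.AI's actual response schema may differ):

```python
# Hypothetical payload shape: {"error": {"code": "1302", "message": "..."}}.
# Adjust the accessors to match the real Z.AI response format.
POLICY_CODES = {"1302", "1303"}  # policy-control throttles: retrying loops forever
BALANCE_CODE = "1113"            # plan "usage conditions" considered unmet

def classify_error(body: dict) -> str:
    """Bucket a Z.AI error payload into policy / balance / transient."""
    code = str(body.get("error", {}).get("code", ""))
    if code in POLICY_CODES:
        return "policy"     # stop; surface to the user instead of backing off
    if code == BALANCE_CODE:
        return "balance"
    return "transient"      # ordinary backoff-and-retry is safe here

def should_retry(body: dict, attempt: int, max_attempts: int = 3) -> bool:
    """Only transient errors are worth retrying, and only a bounded number of times."""
    return classify_error(body) == "transient" and attempt < max_attempts
```

The key design choice is that "policy" never maps to a retry: exponential backoff against a policy-control code is exactly the infinite-cooldown behavior reported in the openclaw issue.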
Who this affects
The Coding Plan was sold at $3 to $15 per month depending on tier. At those prices the plan was substantially cheaper than the Claude, GPT-5, or Gemini equivalents, and an entire cottage industry of non-coding deployments grew around it:
- SillyTavern and RP frontends. Creative-writing and roleplay communities were routing GLM-4.6 / 4.7 / 5.1 through the Coding Plan for long-form character sessions.
- Custom chatbot operators. Discord bots, Telegram bots, personal assistants - anything that needed cheap high-volume inference.
- Translation and summarization pipelines. Batch jobs that would be expensive on the metered API became viable at coding-plan flat rates.
- Agent harnesses not marked as "coding." Any OpenRouter-style setup that forwarded conversational traffic was vulnerable to misclassification even if the underlying intent was development-adjacent.
The SillyTavern community thread that surfaced the policy change is the most organized report, but the same behavior has been reproduced across opencode, Letta-code, and direct API callers whose traffic patterns don't match "IDE-mediated developer sends code."
The escalation pattern
Z.AI's tightening has been linear and telegraphed, which makes the current enforcement step less surprising than the community reaction suggests:
| Date | Action | Source |
|---|---|---|
| January 2026 | New sign-ups throttled to 20% of prior levels; Zhipu cites "malicious use" and compute constraints | IndexBox |
| February 11, 2026 | Coding Plan pricing increased; first-purchase discounts cut | @Zai_org |
| Early April 2026 | GLM-5-Turbo limits tripled in the Coding Plan to absorb non-peak demand, signaling a push to shift creative-adjacent workloads off GLM-4.7 | @Zai_org |
| Week of April 14, 2026 | Non-coding use detection + ban policy live; 1302 / 1303 errors spike across client communities | User reports |
The pattern reads as Zhipu trying to protect its coding-specific margin in the face of a subscriber base that was, on the numbers, substantially not coding. The $3-$15/mo flat rate is only economic if most of the inference is the short-burst, IDE-style traffic the pricing was modeled on. Long-context roleplay and chat sessions don't fit that profile.
What to do now
Three practical recommendations for anyone running a non-coding workload against a GLM plan:
For active users
- Stop routing non-coding traffic through the Coding Plan immediately. One strike is throttling. Three strikes is a permanent ban, and Z.AI's only published appeal path is a support email with no review SLA.
- Move to the metered API. Z.AI sells the same GLM-4.6 / 4.7 / 5.1 models pay-as-you-go through the standard developer API. It is more expensive per token but does not carry the coding-scenario restriction.
- Expect higher latency under load. The April GLM-5-Turbo limit-tripling was explicitly for non-peak hours; peak-hour throttling on the Coding Plan continues.
For tool builders
- Classify 1302 and 1303 as a policy-control error, not a rate limit. Standard exponential-backoff logic will cause an infinite-loop pattern against these codes.
- Surface the policy message to users rather than silently retrying. Users who don't know they're being flagged can't stop.
- If your integration advertises "GLM Coding Plan compatible," add a warning that the endpoint is coding-only per vendor TOS.
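Putting the first two recommendations together, a hedged sketch of a handler that fails loudly on policy-control codes instead of silently retrying (the payload field names are assumptions, not Z.AI's documented schema):

```python
class PolicyThrottleError(RuntimeError):
    """Z.AI returned a policy-control code (1302/1303); retrying will not help."""

def handle_zai_error(body: dict) -> None:
    """Inspect an error payload and raise a descriptive error on policy codes.

    Assumes a payload like {"error": {"code": ..., "message": ...}};
    the real schema may differ.
    """
    err = body.get("error") or {}
    code = str(err.get("code", ""))
    if code in ("1302", "1303"):
        raise PolicyThrottleError(
            f"Z.AI policy control fired (code {code}): "
            f"{err.get('message', 'no message')}. "
            "Coding Plan traffic may have been flagged as non-coding."
        )
    # Other codes fall through to the caller's normal error handling.
```

Raising here, rather than returning a retry hint, is what forces the policy message in front of the user - the failure mode to avoid is a silent loop the user never sees.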
For potential subscribers
Do not buy or renew a GLM Coding Plan for any use case that isn't IDE-style code assistance. The flat-rate arbitrage is over.
What it signals
This is the second Chinese frontier lab this year to take direct action against non-intended use of a price-subsidized subscription tier. The broader read is that the $3-$15/mo Chinese coding plans that were widely celebrated as a cheap alternative to Claude Code were being cross-subsidized in ways the providers can't sustain, and the subsidy is being withdrawn.
Zhipu has other constraints worth recalling. The underlying models were trained on Huawei Ascend chips with no NVIDIA silicon, giving the company a different cost profile than its Western counterparts - and less margin to absorb unintended workloads. GLM-5 shipped open-weight, so the base models are still available for self-hosters; the Coding Plan is the specific commercial product under pressure, not the model family.
For developers using these plans for actual coding: the product still works, and our GLM-5.1 SWE-Bench Pro coverage showed the underlying model is genuinely competitive with frontier Western models. The policy change is not about coding quality - it's about the subscription being used for workloads the pricing never assumed.
For everyone else: the coding plan is not a chatbot subscription. Treat it accordingly.
Sources:
- Z.AI Developer Documentation - Usage Policy Overview
- Z.AI Developer Documentation - FAQ
- Z.AI Coding Plan Subscription Page
- opencode Issue #14535 - GLM5 Too Many Requests - Error 1302
- openclaw Issue on ZAI Provider Rate Limit Misclassification
- Letta-code Issue #1394 - Z.ai coding-plan connections breaking GLM-5 subagents
- IndexBox - Zhipu AI Restricts New Sign-Ups for GLM Coding Plan (January 2026)
- @Zai_org - Coding Plan Pricing Adjustment, February 11 2026
- @Zai_org - GLM-5-Turbo Coding Plan Limit Tripling
- r/SillyTavernAI - Warning: Z.AI Coding Plan policy changes
- r/SillyTavernAI - GLM 5 and 5.1 Rate Limiting
