Gemini 3.1 Pro Is the Best Model You Can't Use: 99-Hour Lockouts, Phantom Quota Drain, and Broken Tool Calling
Four days after launch, Gemini 3.1 Pro's benchmark-topping performance is overshadowed by 90-hour lockouts for paying subscribers, quota that drains while idle, and tool-calling bugs that break LangChain4j, n8n, and RooCode. Developers are switching to Claude.

Gemini 3.1 Pro launched on February 19 with the strongest benchmark numbers Google has ever posted. Its 77.1% on ARC-AGI-2 more than doubled its predecessor's score and beat both Claude Opus 4.6 and GPT-5.3 in abstract reasoning. Its LiveCodeBench Elo of 2887 sits 448 points above GPT-5.2. At $2 per million input tokens, it is the cheapest frontier model available.
Four days later, the Google AI Developer Forum is full of paying subscribers who can't use it.
TL;DR
| Issue | Details |
|---|---|
| Lockout duration | 90-99 hours reported by paid Pro subscribers |
| Phantom quota drain | Quota drops from 100% to 0% with no activity |
| Quota consumption rate | ~2x faster than Gemini 3.0 on identical workloads |
| Launch-day latency | Up to 104 seconds for basic inputs |
| Tool-calling broken in | LangChain4j, n8n, RooCode, Cursor |
| Google's response | Acknowledged "phased rollout," no fix timeline |
90-hour lockouts for paying customers
The most severe reports come from Google AI Pro subscribers -- people paying for premium access.
After exhausting the model's 5-hour quota window three times within 24 hours, users found themselves locked out for 90 hours. Others report 99-hour lockouts after a single hour of use:
"Just updated to 3.1 Pro and I'm already locked out for 99 hours after only an hour of use."
A separate thread documents a 3-day lockout where the system applied a hard weekly cap -- typically reserved for Free Tier accounts -- to a paying Pro subscriber after the 3.1 migration. Multiple users in the thread confirmed similar experiences, with lockouts ranging from 3 to 7 days.
Quota draining while idle
Beyond the lockouts, the quota system itself appears bugged. Multiple users report quota dropping from 100% to 0% with no corresponding activity. Reset timers show 35-50 hours. From the forum:
"Mine was at 100% and then 80% for quite a while, maybe 30 minutes of working, and then I sent one more prompt and it bumped to 0% all of a sudden."
Even when the model is reachable, the 5-hour quota is consumed roughly 2x faster with 3.1 Pro compared to 3.0 on identical workloads. A second thread confirms the pattern: the new model simply eats quota faster, with no explanation from Google.
Launch-day latency
Early reviewers documented the model taking up to 104 seconds for rudimentary inputs on launch day, with frequent timeout errors and "This model is currently experiencing high demand" messages. The API returned MODEL_CAPACITY_EXHAUSTED errors throughout the first 48 hours. In Cursor, the agent would get stuck on "Planning next moves" or "Taking longer than expected," frequently showing "Reconnecting" for minutes at a time.
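Until capacity stabilizes, the standard workaround for transient capacity errors is jittered exponential backoff. A minimal, client-agnostic sketch (the `send_request` callable and the exception type are placeholders; the error string is the one reported above):

```python
import random
import time

def call_with_backoff(send_request, max_attempts=5, base_delay=1.0):
    """Retry a request on capacity errors with jittered exponential backoff.

    send_request: zero-arg callable that raises RuntimeError on failure
    (stand-in for whatever client/exception your SDK actually uses).
    """
    for attempt in range(max_attempts):
        try:
            return send_request()
        except RuntimeError as err:
            if "MODEL_CAPACITY_EXHAUSTED" not in str(err):
                raise  # unrelated error: don't retry
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the capacity error
            # 1s, 2s, 4s, ... plus jitter to avoid synchronized retries
            time.sleep(base_delay * (2 ** attempt) + random.random() * base_delay)

# Example: a fake client that fails twice with a capacity error, then succeeds
calls = {"n": 0}
def fake_send():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("MODEL_CAPACITY_EXHAUSTED")
    return "ok"
```

This does not fix quota accounting, of course; it only smooths over transient capacity rejections.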
Tool calling: broken across frameworks
For a model positioned as an agentic reasoning engine, the tool-calling bugs are the most damaging issue.
The root cause: when Gemini 3.1 Pro triggers a tool call, the API returns a thoughtSignature field that must be passed back exactly in the next request. If the signature is missing or misplaced, the API responds with a 400 error: "Function call is missing a thought_signature in functionCall parts." This breaks multi-turn tool use in every major framework that has attempted integration:
- LangChain4j: Multi-turn conversations impossible after any tool call
- n8n: AI Agent node crashes immediately on tool use
- RooCode: Tool calls fail with signature errors
- RooCode via OpenRouter: API handler strips the content array but preserves reasoning_details, causing every follow-up to fail
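The fix, for clients talking to the REST API directly, is to echo the model's functionCall part back verbatim, signature included, when appending the tool turn to the conversation history. A minimal sketch of the request shape (the tool name, arguments, and signature value are illustrative; field names follow the Gemini REST API):

```python
# Sketch: round-tripping thoughtSignature across a tool-call turn.
# The model's functionCall part arrives carrying a thoughtSignature;
# the next request must include that part unmodified, followed by the
# tool result, or the API rejects it with a 400.

def append_tool_turn(history, model_part, tool_result):
    """Append the model's function call (signature intact) plus our response."""
    # Pass the functionCall part through exactly as returned --
    # do not strip or rebuild it, or the thoughtSignature is lost.
    history.append({"role": "model", "parts": [model_part]})
    history.append({
        "role": "user",
        "parts": [{
            "functionResponse": {
                "name": model_part["functionCall"]["name"],
                "response": tool_result,
            }
        }],
    })
    return history

# Hypothetical part as returned by the model:
part = {
    "functionCall": {"name": "get_weather", "args": {"city": "Berlin"}},
    "thoughtSignature": "opaque-signature-from-model",  # must survive verbatim
}
history = append_tool_turn([], part, {"temp_c": 7})
```

The framework bugs above are all variants of the same mistake: rebuilding the message history from parsed fields and dropping the opaque signature along the way.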
Additional bugs include an infinite "Working" loop in Canvas code mode where generation ran for 88 minutes without completing or returning an error, and persistent infinite thinking loops in Android Studio.
Developer sentiment on Hacker News is blunt:
"It's bad at using tools and tries to edit files in weird ways instead of using the provided text editing tools."
"Gemini is superb at incredibly hard stuff, but falls apart on some of the most basic things (like tool calling)."
Developers are switching to Claude
The reliability gap is pushing developers toward alternatives. From the same HN thread:
"So I've tried to adopt a plan-in-Gemini, execute-in-Claude approach, but while I'm doing that I might as well just stay in Claude."
"I have a paid Antigravity subscription and most of the time I use Claude models due to the exact issues you have pointed out."
A head-to-head comparison by Shipyard found Claude Code finished tasks faster with full autonomy, while Gemini CLI required manual nudging and retries at higher cost ($7.06 vs $4.80 for equivalent tasks). Shipyard's verdict: "Claude simply generates better code with fewer issues."
The emerging consensus among developers using both models: use Claude 4.5 as the default for planning and reliable execution, bring in Gemini for multimodal tasks and UI work where its vision capabilities have an edge.
Google's response
Google acknowledged the phased rollout in a GitHub discussion on February 20, confirming that AI Ultra subscribers get full access while other tiers receive partial rollout "as capacity permits." No timeline was given for completing the rollout.
What is notably absent: any Google employee response on the multiple forum threads documenting 90-hour lockouts, phantom quota drain, or migration bugs. The threads contain only user-to-user troubleshooting. Google's generic quota policy states that "when there's a large increase in activity, we may change limits to maintain quality" and that specified rate limits are "not guaranteed."
A separate issue compounded the frustration: gemini-3-pro-preview became unreachable after the 3.1 announcement, forcing users to migrate to a model they couldn't reliably access, with no transition period.
The pattern is familiar. Google launches a model with benchmark numbers that lead on 12 of 18 evaluations. The technical achievement is real. The ARC-AGI-2 score is not disputed. But the infrastructure to serve it at scale is not ready, the tool-calling integration is not tested against the frameworks developers actually use, and the paying customers who are supposed to have priority access are the ones filing bug reports.
Benchmarks measure what a model can do. Quotas determine what it will do. Right now, for many paying users, the answer is nothing.
Sources:
- Gemini 3.1 Pro launch announcement - Google Blog
- ARC-AGI-2 score analysis - blockchain.news
- Benchmark breakdown - Digital Applied
- Pricing - Macaron
- 90-hour lockout - Google AI Forum
- 99-hour lockout bug - Google AI Forum
- 3-day migration lockout - Google AI Forum
- Phantom quota drain - Google AI Forum
- Faster quota drain - Google AI Forum
- Launch-day latency - Medium
- MODEL_CAPACITY_EXHAUSTED - GitHub
- LangChain4j tool-calling bug - GitHub
- n8n tool-calling bug - n8n Community
- RooCode tool-calling bug - GitHub
- RooCode OpenRouter bug - GitHub
- Cursor stuck bug - Cursor Forum
- Infinite working loop - Google AI Forum
- Android Studio infinite thinking - Google AI Forum
- Hacker News discussion
- Claude Code vs Gemini CLI - Shipyard
- Gemini 3 Pro vs Claude 4.5 - GLB GPT
- Leads most benchmarks, trails Opus 4.6 in some - Trending Topics
- Google quota policy - Google Support
- gemini-3-pro-preview unreachable - GitHub
