Migrating from OpenAI API to Google Gemini API
A practical guide to switching from OpenAI's chat completions to Google's Gemini API, covering the 3-line compatibility shortcut, key schema differences, and where the two APIs diverge.

TL;DR
- Yes, you can switch - Gemini's OpenAI compatibility layer needs only 3 code changes
- Tool calling combined with structured output breaks with a 400 error on Gemini
- You gain a free tier, 1M-token context windows, and 60-75% lower costs on most workloads
- Low difficulty for basic completions; Medium if you rely heavily on tool use or batch uploads
Why Move from OpenAI to Gemini?
Cost is the most common reason. For a typical workload of 10 million tokens per month (split evenly between input and output), GPT-5 costs roughly $56 while Gemini 3 Flash runs about $17.50 - a 69% drop. If you're on Gemini 2.5 Flash, that same workload costs around $14.
The context window difference also matters. All current Gemini models offer 1 million input tokens as standard. OpenAI's GPT-5 family caps at 128K by default. For applications that need to process long documents, entire codebases, or hours of transcript, that gap is significant.
Google also offers a free tier with no credit card required - Gemini 2.5 Flash gives you 10 requests per minute and 250 requests per day at no charge. OpenAI has no equivalent. If you're prototyping or running a low-volume side project, the free tier alone makes Gemini worth testing.
The catch is that the switch isn't quite as clean as Google's marketing suggests. The OpenAI compatibility layer works well for simple completions and streaming. It gets complicated when you add tool calling, batch file uploads, or reasoning configuration.
Feature Parity Table
| Feature | OpenAI | Gemini | Notes |
|---|---|---|---|
| Chat completions | POST /v1/chat/completions | POST /v1/chat/completions | Same path via compat layer |
| Native endpoint | POST /v1/chat/completions | POST /v1beta/models/{model}:generateContent | Different when using native SDK |
| Streaming | stream: true (SSE) | stream: true (SSE) | Direct equivalent |
| System prompt | role: "system" in messages | role: "system" in messages | Compatible via compat layer |
| Function/tool calling | tools[] with JSON schema | tools[] with JSON schema | Compatible via compat layer, with caveats |
| Tool + structured output together | Supported | Not supported (400 error) | Major breaking difference |
| Structured output | response_format with JSON schema | response_format with JSON schema | Supported, not with tools |
| Image input | URL or base64 | URL or base64 | Compatible |
| Audio input | Supported | Supported via input_audio | Compatible via compat layer |
| Video generation | Sora (separate API) | Veo 3.1 (async polling) | Different architecture |
| Embeddings | POST /v1/embeddings | POST /v1/embeddings | Compatible via compat layer |
| Batch API | Supported via OpenAI SDK | Partial - file upload requires native SDK | See gotchas |
| Reasoning/thinking | reasoning_effort | reasoning_effort mapped to thinking budget | Works but not 1:1 |
| Prompt caching | cache_control (Assistants API) | cached_content via extra_body | Different implementation |
| Free tier | No | Yes (Gemini 2.5 Flash: 10 RPM) | Gemini-only advantage |
| Context window | 128K (GPT-5 default) | 1M tokens (all Gemini models) | Gemini advantage |
| PDF input | Not supported | Supported via native SDK | Gemini-only feature |
API Mapping
The good news: if you're on OpenAI's Chat Completions API, you can point your existing code at Gemini with three changes. Update the base_url, swap in your Gemini API key from Google AI Studio, and change the model name.
Basic Chat Completion
Before (OpenAI):
```python
from openai import OpenAI

client = OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    max_tokens=256,
    temperature=0.7
)
print(response.choices[0].message.content)
```
After (Gemini via OpenAI compat layer):
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GEMINI_API_KEY",  # from aistudio.google.com
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

response = client.chat.completions.create(
    model="gemini-3-flash-preview",  # swap model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    max_tokens=256,
    temperature=0.7
)
print(response.choices[0].message.content)
```
The response shape is identical - response.choices[0].message.content works without changes.
JavaScript / TypeScript
```js
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: "YOUR_GEMINI_API_KEY",
  baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/"
});

const response = await openai.chat.completions.create({
  model: "gemini-3-flash-preview",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is the capital of France?" }
  ]
});

console.log(response.choices[0].message.content);
```
Streaming
Streaming works the same way through the compat layer:
```python
stream = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Write a short poem."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
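If you also need the complete text once the stream finishes (for logging or caching), the deltas can be accumulated as they arrive. A minimal sketch - collect_stream is a hypothetical helper, not part of either SDK, and relies only on the chunk shape shown above:

```python
def collect_stream(stream):
    """Accumulate streamed delta chunks into the full response text."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta is not None:  # role-only and finish chunks carry no content
            parts.append(delta)
    return "".join(parts)
```

The None check matters on both APIs: the first and last chunks typically carry role or finish-reason metadata with no content.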
Function / Tool Calling
Tool calling is compatible but has one major restriction: you can't use tools and response_format together on Gemini. Pick one or the other.
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"
)
```
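Executing the returned tool call works the same as on OpenAI: read tool_calls from the response, run your function, and send the result back as a role: "tool" message. A sketch of that dispatch step - get_weather here is a placeholder implementation, not a real service:

```python
import json

def get_weather(city: str) -> str:
    # Placeholder - call your real weather service here.
    return f"Sunny, 22C in {city}"

TOOL_HANDLERS = {"get_weather": get_weather}

def run_tool_call(tool_call):
    """Execute one tool call and build the follow-up 'tool' message."""
    fn = TOOL_HANDLERS[tool_call.function.name]
    args = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": fn(**args),
    }
```

Append this message (after the assistant's tool-call message) to your messages list and call the model again to get the final answer.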
Don't add response_format to this call - it triggers a 400 error on Gemini. The workaround is to handle JSON structure in your system prompt instead.
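One way to apply that workaround: spell the schema out in the system prompt and parse the reply yourself. The prompt wording and the parse_json_reply helper below are illustrative, not part of either SDK - models sometimes wrap JSON in a markdown fence, so strip it before parsing:

```python
import json
import re

SYSTEM_PROMPT = (
    "You are a weather assistant. After using tools, reply ONLY with JSON "
    'matching this schema: {"city": string, "summary": string}. No prose.'
)

def parse_json_reply(text: str) -> dict:
    """Parse a model reply as JSON, tolerating a ```json ... ``` fence."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    if match:
        text = match.group(1)
    return json.loads(text)
```

This is looser than real structured output - validate the parsed dict against your schema before trusting it.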
Using Gemini-Specific Features via extra_body
Features unique to Gemini pass through the extra_body parameter:
```python
# Enable context caching
response = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[...],
    extra_body={
        "cached_content": "projects/my-project/cachedContents/my-cache-id"
    }
)

# Configure thinking level for reasoning models
response = client.chat.completions.create(
    model="gemini-3-1-pro-preview",  # reasoning model
    messages=[...],
    extra_body={
        "thinking_config": {
            "thinking_level": "high"  # low, medium, or high
        }
    }
)
```
Model Name Mapping
| OpenAI Model | Gemini Equivalent | Context | Notes |
|---|---|---|---|
| gpt-5 | gemini-3-1-pro-preview | 1M tokens | Similar capability tier |
| gpt-5 (standard) | gemini-3-flash-preview | 1M tokens | Much cheaper, still strong |
| gpt-4.1 | gemini-2.5-flash | 1M tokens | Good balance of cost and quality |
| gpt-4o-mini | gemini-2.5-flash-lite | 1M tokens | Budget option |
| text-embedding-3-large | gemini-embedding-2-preview | Multimodal | Text + image + video embedding |
Model IDs for the compat layer use hyphens: gemini-3-flash-preview, gemini-3-1-pro-preview, gemini-2.5-flash, gemini-2.5-flash-lite. You can list all available models with client.models.list().
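If OpenAI model names are scattered across your codebase, a small translation table (based on the equivalents above - adjust the targets to your quality/cost tradeoff) keeps the migration to one call site:

```python
# OpenAI model name -> Gemini equivalent, per the mapping table above.
OPENAI_TO_GEMINI = {
    "gpt-5": "gemini-3-flash-preview",
    "gpt-4.1": "gemini-2.5-flash",
    "gpt-4o-mini": "gemini-2.5-flash-lite",
    "text-embedding-3-large": "gemini-embedding-2-preview",
}

def map_model(name: str) -> str:
    """Translate an OpenAI model name; pass Gemini names through unchanged."""
    return OPENAI_TO_GEMINI.get(name, name)
```

Wrap your client so every request goes through map_model, and rolling back to OpenAI is a one-line change.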
Pricing Impact
The cost difference is the clearest reason to migrate. Both APIs charge per million tokens for input and output.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-5 (OpenAI) | $1.25 | $10.00 |
| GPT-5.4 (OpenAI) | $2.50 | $15.00 |
| GPT-4.1 (OpenAI) | $2.00 | $8.00 |
| GPT-4o (OpenAI) | $2.50 | $10.00 |
| Gemini 3.1 Pro Preview | $2.00 | $12.00 |
| Gemini 3 Flash Preview | $0.50 | $3.00 |
| Gemini 2.5 Flash | $0.30 | $2.50 |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 |
Sample workload: 10M tokens/month (5M input, 5M output)
| Stack | Monthly Cost |
|---|---|
| GPT-5 | $56.25 |
| GPT-4.1 | $50.00 |
| Gemini 3.1 Pro Preview | $70.00 |
| Gemini 3 Flash | $17.50 |
| Gemini 2.5 Flash | $14.00 |
| Gemini 2.5 Flash-Lite | $2.50 |
A few things to note here. Gemini 3.1 Pro is actually slightly more expensive than GPT-5 for this workload - the cost advantage comes from the Flash tier. Swapping GPT-5 for Gemini 3 Flash is the most common migration path for cost reduction: similar quality for most tasks, 69% cheaper.
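To sanity-check these numbers against your own traffic mix, the arithmetic is just token volume times the per-million rate. A quick sketch using the list prices from the table above:

```python
# Per-1M-token prices (input, output) in USD, from the pricing table above.
PRICES = {
    "gpt-5": (1.25, 10.00),
    "gemini-3-flash-preview": (0.50, 3.00),
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-2.5-flash-lite": (0.10, 0.40),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Monthly cost in USD for a given token volume."""
    in_rate, out_rate = PRICES[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# 10M tokens/month, split evenly between input and output
gpt5 = monthly_cost("gpt-5", 5_000_000, 5_000_000)                     # 56.25
flash = monthly_cost("gemini-3-flash-preview", 5_000_000, 5_000_000)   # 17.50
savings = 1 - flash / gpt5                                             # ~0.69
```

Most real workloads are input-heavy, which tilts the comparison further toward the Flash tier given its lower input rate.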
Gemini also offers a Batch API with a 50% discount on all paid models for async workloads. Gemini 2.5 Flash batch pricing drops to $0.15 input / $1.25 output per million tokens. The free tier gives you Gemini 2.5 Flash at 10 RPM and 250 daily requests with no payment required.
For more context on LLM pricing across providers, see our LLM API pricing comparison.
Known Gotchas
- Tool calling + structured output = 400 error. Gemini doesn't let you use tools and response_format in the same request. The error message reads: "for forced function calling, you must set 'tool_config.function_calling_config.mode' to ANY instead of populating 'response_mime_type' and 'response_schema' fields". Work around this by describing your JSON schema requirements in the system prompt when you need both.
- Batch API file uploads require the native SDK. The genai Python SDK handles file uploads and downloads for batch jobs. You can't do it through the OpenAI-compat interface - client.files.upload() will fail. Use the native SDK for batch prep, then optionally process results through the compat layer.
- reasoning_effort and thinking_config can't be used at the same time. OpenAI's reasoning_effort: "high" maps to Gemini's thinking_level: "high", but pass only one. Setting both causes an error.
- Schema validation is stricter. Gemini rejects JSON tool schemas with unrecognized fields (e.g., "Unknown name 'type' at 'tools[0].function'"). If you've been sloppy with extra metadata in your tool definitions, clean them up before migrating.
- Free tier quotas dropped in December 2025. If you read older tutorials, the free limits they describe are no longer accurate. The current free tier for Gemini 2.5 Flash is 10 RPM and 250 requests per day - down from earlier limits. The official rate limits page in AI Studio shows your actual current quota.
- Vertex AI SDK for Gemini is deprecated after June 2026. If you're using Gemini via the Vertex AI SDK (not the Gemini Developer API), migrate to the Gen AI SDK before June 2026. New Gemini features won't ship to the Vertex AI SDK path.
- Video generation uses async polling. Veo 3.1 video generation is asynchronous - you submit a job, then poll GET /v1/videos/{id} until completion. OpenAI's Sora also requires polling, so this won't surprise former Sora users, but it does mean you can't treat it like a synchronous completion call.
- Gemini 3 Pro Preview was deprecated March 9, 2026. If you see tutorials referencing gemini-3-pro-preview (without the 3.1), that model is gone. Use gemini-3-1-pro-preview instead.
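Several of these gotchas are catchable before a request ever leaves your process. A hypothetical pre-flight check on compat-layer request kwargs - the rules below encode the restrictions listed above, not any official validation:

```python
def check_gemini_compat(kwargs: dict) -> list[str]:
    """Return a list of problems that would fail on the Gemini compat layer."""
    problems = []
    if "tools" in kwargs and "response_format" in kwargs:
        problems.append("tools and response_format cannot be combined; "
                        "describe the JSON schema in the system prompt instead")
    extra = kwargs.get("extra_body", {})
    if "reasoning_effort" in kwargs and "thinking_config" in extra:
        problems.append("pass reasoning_effort or thinking_config, not both")
    if kwargs.get("model") == "gemini-3-pro-preview":
        problems.append("gemini-3-pro-preview is deprecated; "
                        "use gemini-3-1-pro-preview")
    return problems
```

Running this in CI or a request wrapper turns opaque 400s into actionable error messages during the migration.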
FAQ
Can I use the same OpenAI Python or JS library?
Yes - set base_url to https://generativelanguage.googleapis.com/v1beta/openai/ and use your Gemini API key. No new SDK required for basic completions.
Will my existing prompts work without changes?
Usually yes for conversational prompts. System instructions, few-shot examples, and user messages all transfer cleanly. Prompts fine-tuned for GPT-5's reasoning style may need adjustment for Flash models.
Is there a free tier?
Yes. Gemini 2.5 Flash is free at 10 requests per minute and 250 requests per day. Gemini 2.5 Flash-Lite is free at 15 RPM and 1,000 requests per day. OpenAI has no equivalent free tier.
Does Gemini support the OpenAI Assistants API format?
No. The Assistants API is a separate product OpenAI is deprecating anyway (August 2026). If you're migrating from Assistants, consider rebuilding with Gemini's native generateContent endpoint or an agentic framework like LangChain or LlamaIndex.
How do I get a Gemini API key?
Go to Google AI Studio, sign in with a Google account, and click "Get API key". Free-tier keys are available without billing information.
Is Gemini's OpenAI compatibility still beta?
As of April 2026, yes - Google labels it beta. Core features work reliably, but some edge cases (complex tool schemas, batch file management) still require the native SDK.
Sources
- OpenAI compatibility - Gemini API - Google AI for Developers
- Gemini 3 Developer Guide - Google AI for Developers
- Gemini API Models - Google AI for Developers
- Gemini API Changelog - Google AI for Developers
- OpenAI API Changelog
- Tool calling with Gemini + structured output issue - GitHub
- Structured Output + Tool Calling not working with Gemini - GitHub
- Gemini API Pricing 2026 - MetaCTO
- GPT-5 API Pricing 2026 - PricePerToken
- Gemini API Rate Limits - Google AI for Developers
✓ Last verified April 5, 2026
