Migrating from OpenAI API to Google Gemini API
A practical guide to switching from OpenAI's chat completions to Google's Gemini API, covering the 3-line compatibility shortcut, key schema differences, and where the two APIs diverge.

TL;DR
- Yes, you can switch - Gemini's OpenAI compatibility layer needs only 3 code changes
- Tool calling combined with structured output breaks with a 400 error on Gemini
- You gain a free tier, 1M-token context windows, and 60-75% lower costs on most workloads
- Low difficulty for basic completions; Medium if you rely heavily on tool use or batch uploads
Why Move from OpenAI to Gemini?
Cost is the most common reason. For a typical workload of 10 million tokens per month (split evenly between input and output), GPT-5 costs roughly $56 while Gemini 3 Flash runs about $17.50 - a 69% drop. If you're on Gemini 2.5 Flash, that same workload costs around $14.
The context window difference also matters. All current Gemini models offer 1 million input tokens as standard. OpenAI's GPT-5 family caps at 128K by default. For applications that need to process long documents, entire codebases, or hours of transcript, that gap is significant.
Google also offers a free tier with no credit card required - Gemini 2.5 Flash gives you 10 requests per minute and 250 requests per day at no charge. OpenAI has no equivalent. If you're prototyping or running a low-volume side project, the free tier alone makes Gemini worth testing.
The catch is that the switch isn't quite as clean as Google's marketing suggests. The OpenAI compatibility layer works well for simple completions and streaming. It gets complicated when you add tool calling, batch file uploads, or reasoning configuration.
Feature Parity Table
| Feature | OpenAI | Gemini | Notes |
|---|---|---|---|
| Chat completions | POST /v1/chat/completions | POST /v1/chat/completions | Same path via compat layer |
| Native endpoint | POST /v1/chat/completions | POST /v1beta/models/{model}:generateContent | Different when using native SDK |
| Streaming | stream: true (SSE) | stream: true (SSE) | Direct equivalent |
| System prompt | role: "system" in messages | role: "system" in messages | Compatible via compat layer |
| Function/tool calling | tools[] with JSON schema | tools[] with JSON schema | Compatible via compat layer, with caveats |
| Tool + structured output together | Supported | Not supported (400 error) | Major breaking difference |
| Structured output | response_format with JSON schema | response_format with JSON schema | Supported, not with tools |
| Image input | URL or base64 | URL or base64 | Compatible |
| Audio input | Supported | Supported via input_audio | Compatible via compat layer |
| Video generation | Sora (separate API) | Veo 3.1 (async polling) | Different architecture |
| Embeddings | POST /v1/embeddings | POST /v1/embeddings | Compatible via compat layer |
| Batch API | Supported via OpenAI SDK | Partial - file upload requires native SDK | See gotchas |
| Reasoning/thinking | reasoning_effort | reasoning_effort mapped to thinking budget | Works but not 1:1 |
| Prompt caching | cache_control (Assistants API) | cached_content via extra_body | Different implementation |
| Free tier | No | Yes (Gemini 2.5 Flash: 10 RPM) | Gemini-only advantage |
| Context window | 128K (GPT-5 default) | 1M tokens (all Gemini models) | Gemini advantage |
| PDF input | Not supported | Supported via native SDK | Gemini-only feature |
API Mapping
The good news: if you're on OpenAI's Chat Completions API, you can point your existing code at Gemini with three changes. Update the base_url, swap in your Gemini API key from Google AI Studio, and change the model name.
Basic Chat Completion
Before (OpenAI):
```python
from openai import OpenAI

client = OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    max_tokens=256,
    temperature=0.7
)
print(response.choices[0].message.content)
```
After (Gemini via OpenAI compat layer):
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GEMINI_API_KEY",  # from aistudio.google.com
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

response = client.chat.completions.create(
    model="gemini-3-flash-preview",  # swap model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    max_tokens=256,
    temperature=0.7
)
print(response.choices[0].message.content)
```
The response shape is identical - response.choices[0].message.content works without changes.
JavaScript / TypeScript
```js
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: "YOUR_GEMINI_API_KEY",
  baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/"
});

const response = await openai.chat.completions.create({
  model: "gemini-3-flash-preview",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is the capital of France?" }
  ]
});

console.log(response.choices[0].message.content);
```
Streaming
Streaming works the same way through the compat layer:
```python
stream = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Write a short poem."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
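If you also need the complete text once the stream finishes (for logging or caching), the deltas can be accumulated as they arrive. A minimal sketch - collect_stream is a hypothetical helper, not part of either SDK, and relies only on the chunk shape shown above:

```python
def collect_stream(stream):
    """Accumulate streamed delta chunks into the full response text."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta is not None:  # role-only and finish chunks carry no content
            parts.append(delta)
    return "".join(parts)
```

The None check matters on both APIs: the first and last chunks typically carry role or finish-reason metadata with no content.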
Function / Tool Calling
Tool calling is compatible but has one major restriction: you can't use tools and response_format together on Gemini. Pick one or the other.
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"
)
```
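Executing the returned tool call works the same as on OpenAI: read tool_calls from the response, run your function, and send the result back as a role: "tool" message. A sketch of that dispatch step - get_weather here is a placeholder implementation, not a real service:

```python
import json

def get_weather(city: str) -> str:
    # Placeholder - call your real weather service here.
    return f"Sunny, 22C in {city}"

TOOL_HANDLERS = {"get_weather": get_weather}

def run_tool_call(tool_call):
    """Execute one tool call and build the follow-up 'tool' message."""
    fn = TOOL_HANDLERS[tool_call.function.name]
    args = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": fn(**args),
    }
```

Append this message (after the assistant's tool-call message) to your messages list and call the model again to get the final answer.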
Don't add response_format to this call - it triggers a 400 error on Gemini. The workaround is to handle JSON structure in your system prompt instead.
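One way to apply that workaround: spell the schema out in the system prompt and parse the reply yourself. The prompt wording and the parse_json_reply helper below are illustrative, not part of either SDK - models sometimes wrap JSON in a markdown fence, so strip it before parsing:

```python
import json
import re

SYSTEM_PROMPT = (
    "You are a weather assistant. After using tools, reply ONLY with JSON "
    'matching this schema: {"city": string, "summary": string}. No prose.'
)

def parse_json_reply(text: str) -> dict:
    """Parse a model reply as JSON, tolerating a ```json ... ``` fence."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    if match:
        text = match.group(1)
    return json.loads(text)
```

This is looser than real structured output - validate the parsed dict against your schema before trusting it.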
Using Gemini-Specific Features via extra_body
Features unique to Gemini pass through the extra_body parameter:
```python
# Enable context caching
response = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[...],
    extra_body={
        "cached_content": "projects/my-project/cachedContents/my-cache-id"
    }
)

# Configure thinking level for reasoning models
response = client.chat.completions.create(
    model="gemini-3-1-pro-preview",  # reasoning model
    messages=[...],
    extra_body={
        "thinking_config": {
            "thinking_level": "high"  # low, medium, or high
        }
    }
)
```
Model Name Mapping
| OpenAI Model | Gemini Equivalent | Context | Notes |
|---|---|---|---|
| gpt-5 | gemini-3-1-pro-preview | 1M tokens | Similar capability tier |
| gpt-5 (standard) | gemini-3-flash-preview | 1M tokens | Much cheaper, still strong |
| gpt-4.1 | gemini-2.5-flash | 1M tokens | Good balance of cost and quality |
| gpt-4o-mini | gemini-2.5-flash-lite | 1M tokens | Budget option |
| text-embedding-3-large | gemini-embedding-2-preview | Multimodal | Text + image + video embedding |
Model IDs for the compat layer use hyphens: gemini-3-flash-preview, gemini-3-1-pro-preview, gemini-2.5-flash, gemini-2.5-flash-lite. You can list all available models with client.models.list().
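If OpenAI model names are scattered across your codebase, a small translation table (based on the equivalents above - adjust the targets to your quality/cost tradeoff) keeps the migration to one call site:

```python
# OpenAI model name -> Gemini equivalent, per the mapping table above.
OPENAI_TO_GEMINI = {
    "gpt-5": "gemini-3-flash-preview",
    "gpt-4.1": "gemini-2.5-flash",
    "gpt-4o-mini": "gemini-2.5-flash-lite",
    "text-embedding-3-large": "gemini-embedding-2-preview",
}

def map_model(name: str) -> str:
    """Translate an OpenAI model name; pass Gemini names through unchanged."""
    return OPENAI_TO_GEMINI.get(name, name)
```

Wrap your client so every request goes through map_model, and rolling back to OpenAI is a one-line change.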
Pricing Impact
The cost difference is the clearest reason to migrate. Both APIs charge per million tokens for input and output.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-5 (OpenAI) | $1.25 | $10.00 |
| GPT-5.4 (OpenAI) | $2.50 | $15.00 |
| GPT-4.1 (OpenAI) | $2.00 | $8.00 |
| GPT-4o (OpenAI) | $2.50 | $10.00 |
| Gemini 3.1 Pro Preview | $2.00 | $12.00 |
| Gemini 3 Flash Preview | $0.50 | $3.00 |
| Gemini 2.5 Flash | $0.30 | $2.50 |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 |
Sample workload: 10M tokens/month (5M input, 5M output)
| Stack | Monthly Cost |
|---|---|
| GPT-5 | $56.25 |
| GPT-4.1 | $50.00 |
| Gemini 3.1 Pro Preview | $70.00 |
| Gemini 3 Flash | $17.50 |
| Gemini 2.5 Flash | $14.00 |
| Gemini 2.5 Flash-Lite | $2.50 |
A few things to note here. Gemini 3.1 Pro is actually slightly more expensive than GPT-5 for this workload - the cost advantage comes from the Flash tier. Swapping GPT-5 for Gemini 3 Flash is the most common migration path for cost reduction: similar quality for most tasks, 69% cheaper.
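To sanity-check these numbers against your own traffic mix, the arithmetic is just token volume times the per-million rate. A quick sketch using the list prices from the table above:

```python
# Per-1M-token prices (input, output) in USD, from the pricing table above.
PRICES = {
    "gpt-5": (1.25, 10.00),
    "gemini-3-flash-preview": (0.50, 3.00),
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-2.5-flash-lite": (0.10, 0.40),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Monthly cost in USD for a given token volume."""
    in_rate, out_rate = PRICES[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# 10M tokens/month, split evenly between input and output
gpt5 = monthly_cost("gpt-5", 5_000_000, 5_000_000)                     # 56.25
flash = monthly_cost("gemini-3-flash-preview", 5_000_000, 5_000_000)   # 17.50
savings = 1 - flash / gpt5                                             # ~0.69
```

Most real workloads are input-heavy, which tilts the comparison further toward the Flash tier given its lower input rate.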
Gemini also offers a Batch API with a 50% discount on all paid models for async workloads. Gemini 2.5 Flash batch pricing drops to $0.15 input / $1.25 output per million tokens. The free tier gives you Gemini 2.5 Flash at 10 RPM and 250 daily requests with no payment required.
For more context on LLM pricing across providers, see our LLM API pricing comparison.
Known Gotchas
- Tool calling + structured output = 400 error. Gemini doesn't let you use tools and response_format in the same request. The error message reads: "for forced function calling, you must set 'tool_config.function_calling_config.mode' to ANY instead of populating 'response_mime_type' and 'response_schema' fields". Work around this by describing your JSON schema requirements in the system prompt when you need both.
- Batch API file uploads require the native SDK. The genai Python SDK handles file uploads and downloads for batch jobs. You can't do it through the OpenAI-compat interface - client.files.upload() will fail. Use the native SDK for batch prep, then optionally process results through the compat layer.
- reasoning_effort and thinking_config can't be used at the same time. OpenAI's reasoning_effort: "high" maps to Gemini's thinking_level: "high", but pass only one. Setting both causes an error.
- Schema validation is stricter. Gemini rejects JSON tool schemas with unrecognized fields (e.g., "Unknown name 'type' at 'tools[0].function'"). If you've been sloppy with extra metadata in your tool definitions, clean them up before migrating.
- Free tier quotas dropped in December 2025. If you read older tutorials, the free limits they describe are no longer accurate. The current free tier for Gemini 2.5 Flash is 10 RPM and 250 requests per day - down from earlier limits. The official rate limits page in AI Studio shows your actual current quota.
- Vertex AI SDK for Gemini is deprecated after June 2026. If you're using Gemini via the Vertex AI SDK (not the Gemini Developer API), migrate to the Gen AI SDK before June 2026. New Gemini features won't ship to the Vertex AI SDK path.
- Video generation uses async polling. Veo 3.1 video generation is asynchronous - you submit a job, then poll GET /v1/videos/{id} until completion. OpenAI's Sora also requires polling, so this won't surprise former Sora users, but it does mean you can't treat it like a synchronous completion call.
- Gemini 3 Pro Preview was deprecated March 9, 2026. If you see tutorials referencing gemini-3-pro-preview (without the 3.1), that model is gone. Use gemini-3-1-pro-preview instead.
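Several of these gotchas are catchable before a request ever leaves your process. A hypothetical pre-flight check on compat-layer request kwargs - the rules below encode the restrictions listed above, not any official validation:

```python
def check_gemini_compat(kwargs: dict) -> list[str]:
    """Return a list of problems that would fail on the Gemini compat layer."""
    problems = []
    if "tools" in kwargs and "response_format" in kwargs:
        problems.append("tools and response_format cannot be combined; "
                        "describe the JSON schema in the system prompt instead")
    extra = kwargs.get("extra_body", {})
    if "reasoning_effort" in kwargs and "thinking_config" in extra:
        problems.append("pass reasoning_effort or thinking_config, not both")
    if kwargs.get("model") == "gemini-3-pro-preview":
        problems.append("gemini-3-pro-preview is deprecated; "
                        "use gemini-3-1-pro-preview")
    return problems
```

Running this in CI or a request wrapper turns opaque 400s into actionable error messages during the migration.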
FAQ
Can I use the same OpenAI Python or JS library?
Yes - set base_url to https://generativelanguage.googleapis.com/v1beta/openai/ and use your Gemini API key. No new SDK required for basic completions.
Will my existing prompts work without changes?
Usually yes for conversational prompts. System instructions, few-shot examples, and user messages all transfer cleanly. Prompts fine-tuned for GPT-5's reasoning style may need adjustment for Flash models.
Is there a free tier?
Yes. Gemini 2.5 Flash is free at 10 requests per minute and 250 requests per day. Gemini 2.5 Flash-Lite is free at 15 RPM and 1,000 requests per day. OpenAI has no equivalent free tier.
Does Gemini support the OpenAI Assistants API format?
No. The Assistants API is a separate product OpenAI is deprecating anyway (August 2026). If you're migrating from Assistants, consider rebuilding with Gemini's native generateContent endpoint or an agentic framework like LangChain or LlamaIndex.
How do I get a Gemini API key?
Go to Google AI Studio, sign in with a Google account, and click "Get API key". Free-tier keys are available without billing information.
Is Gemini's OpenAI compatibility still beta?
As of April 2026, yes - Google labels it beta. Core features work reliably, but some edge cases (complex tool schemas, batch file management) still require the native SDK.
Sources
- OpenAI compatibility - Gemini API - Google AI for Developers
- Gemini 3 Developer Guide - Google AI for Developers
- Gemini API Models - Google AI for Developers
- Gemini API Changelog - Google AI for Developers
- OpenAI API Changelog
- Tool calling with Gemini + structured output issue - GitHub
- Structured Output + Tool Calling not working with Gemini - GitHub
- Gemini API Pricing 2026 - MetaCTO
- GPT-5 API Pricing 2026 - PricePerToken
- Gemini API Rate Limits - Google AI for Developers
✓ Last verified April 5, 2026
