Migrating from OpenAI API to Google Gemini API

A practical guide to switching from OpenAI's chat completions to Google's Gemini API, covering the 3-line compatibility shortcut, key schema differences, and where the two APIs diverge.

From: OpenAI API To: Google Gemini API Difficulty: Low

TL;DR

  • Yes, you can switch - Gemini's OpenAI compatibility layer needs only 3 code changes
  • Tool calling combined with structured output breaks with a 400 error on Gemini
  • You gain a free tier, 1M-token context windows, and 60-75% lower costs on most workloads
  • Low difficulty for basic completions; Medium if you rely heavily on tool use or batch uploads

Why Move from OpenAI to Gemini?

Cost is the most common reason. For a typical workload of 10 million tokens per month (split evenly between input and output), GPT-5 costs roughly $56 while Gemini 3 Flash runs about $17.50 - a 69% drop. If you're on Gemini 2.5 Flash, that same workload costs around $14.

The context window difference also matters. All current Gemini models offer 1 million input tokens as standard. OpenAI's GPT-5 family caps at 128K by default. For applications that need to process long documents, entire codebases, or hours of transcript, that gap is significant.

Google also offers a free tier with no credit card required - Gemini 2.5 Flash gives you 10 requests per minute and 250 requests per day at no charge. OpenAI has no equivalent. If you're prototyping or running a low-volume side project, the free tier alone makes Gemini worth testing.

The catch is that the switch isn't quite as clean as Google's marketing suggests. The OpenAI compatibility layer works well for simple completions and streaming. It gets complicated when you add tool calling, batch file uploads, or reasoning configuration.

Feature Parity Table

| Feature | OpenAI | Gemini | Notes |
|---|---|---|---|
| Chat completions | POST /v1/chat/completions | POST /v1/chat/completions | Same path via compat layer |
| Native endpoint | POST /v1/chat/completions | POST /v1beta/models/{model}:generateContent | Different when using native SDK |
| Streaming | stream: true (SSE) | stream: true (SSE) | Direct equivalent |
| System prompt | role: "system" in messages | role: "system" in messages | Compatible via compat layer |
| Function/tool calling | tools[] with JSON schema | tools[] with JSON schema | Compatible via compat layer, with caveats |
| Tool + structured output together | Supported | Not supported (400 error) | Major breaking difference |
| Structured output | response_format with JSON schema | response_format with JSON schema | Supported, but not with tools |
| Image input | URL or base64 | URL or base64 | Compatible |
| Audio input | Supported | Supported via input_audio | Compatible via compat layer |
| Video generation | Sora (separate API) | Veo 3.1 (async polling) | Different architecture |
| Embeddings | POST /v1/embeddings | POST /v1/embeddings | Compatible via compat layer |
| Batch API | Supported via OpenAI SDK | Partial - file upload requires native SDK | See gotchas |
| Reasoning/thinking | reasoning_effort | reasoning_effort mapped to thinking budget | Works but not 1:1 |
| Prompt caching | cache_control (Assistants API) | cached_content via extra_body | Different implementation |
| Free tier | No | Yes (Gemini 2.5 Flash: 10 RPM) | Gemini-only advantage |
| Context window | 128K (GPT-5 default) | 1M tokens (all Gemini models) | Gemini advantage |
| PDF input | Not supported | Supported via native SDK | Gemini-only feature |

API Mapping

The good news: if you're on OpenAI's Chat Completions API, you can point your existing code at Gemini with three changes. Update the base_url, swap in your Gemini API key from Google AI Studio, and change the model name.


Basic Chat Completion

Before (OpenAI):

from openai import OpenAI

client = OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    max_tokens=256,
    temperature=0.7
)

print(response.choices[0].message.content)

After (Gemini via OpenAI compat layer):

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GEMINI_API_KEY",  # from aistudio.google.com
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

response = client.chat.completions.create(
    model="gemini-3-flash-preview",  # swap model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    max_tokens=256,
    temperature=0.7
)

print(response.choices[0].message.content)

The response shape is identical - response.choices[0].message.content works without changes.

JavaScript / TypeScript

import OpenAI from "openai";

const openai = new OpenAI({
    apiKey: "YOUR_GEMINI_API_KEY",
    baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/"
});

const response = await openai.chat.completions.create({
    model: "gemini-3-flash-preview",
    messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "What is the capital of France?" }
    ]
});

console.log(response.choices[0].message.content);

Streaming

Streaming works the same way through the compat layer:

stream = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Write a short poem."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

Function / Tool Calling

Tool calling is compatible but has one major restriction: you can't use tools and response_format together on Gemini. Pick one or the other.

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"
)

Don't add response_format to this call - it triggers a 400 error on Gemini. The workaround is to handle JSON structure in your system prompt instead.
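One way to apply that workaround programmatically is a small helper that folds your JSON schema into the system prompt before sending the request, so `response_format` can be dropped whenever `tools` are present. This is a sketch, not part of any SDK: the helper name and prompt wording are my own.

```python
import json

def inline_schema_in_prompt(messages, json_schema):
    """Return a copy of `messages` with JSON-output instructions folded into
    the system prompt, so response_format can be omitted when tools are used."""
    instruction = (
        "Respond ONLY with a JSON object matching this schema:\n"
        + json.dumps(json_schema, indent=2)
    )
    messages = [dict(m) for m in messages]  # shallow copy; don't mutate caller's list
    for m in messages:
        if m["role"] == "system":
            m["content"] = m["content"] + "\n\n" + instruction
            return messages
    # no system message present: add one at the front
    return [{"role": "system", "content": instruction}] + messages

schema = {"type": "object", "properties": {"city": {"type": "string"}}}
msgs = inline_schema_in_prompt(
    [{"role": "user", "content": "What's the weather in Tokyo?"}], schema
)
```

Pass `msgs` together with `tools=tools` to `client.chat.completions.create(...)` and leave `response_format` out of the call.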

Using Gemini-Specific Features via extra_body

Features unique to Gemini pass through the extra_body parameter:

# Enable context caching
response = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[...],
    extra_body={
        "cached_content": "projects/my-project/cachedContents/my-cache-id"
    }
)

# Configure thinking level for reasoning models
response = client.chat.completions.create(
    model="gemini-3-1-pro-preview",  # reasoning model
    messages=[...],
    extra_body={
        "thinking_config": {
            "thinking_level": "high"  # low, medium, or high
        }
    }
)

Model Name Mapping

| OpenAI Model | Gemini Equivalent | Context | Notes |
|---|---|---|---|
| gpt-5 | gemini-3-1-pro-preview | 1M tokens | Similar capability tier |
| gpt-5 (standard) | gemini-3-flash-preview | 1M tokens | Much cheaper, still strong |
| gpt-4.1 | gemini-2.5-flash | 1M tokens | Good balance of cost and quality |
| gpt-4o-mini | gemini-2.5-flash-lite | 1M tokens | Budget option |
| text-embedding-3-large | gemini-embedding-2-preview | Multimodal | Text + image + video embedding |

Model IDs for the compat layer use hyphens: gemini-3-flash-preview, gemini-3-1-pro-preview, gemini-2.5-flash, gemini-2.5-flash-lite. You can list all available models with client.models.list().
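If you are migrating programmatically, the mapping above can live in code as a simple lookup. The dictionary below mirrors the table (picking the Flash tier as the GPT-5 swap, per the cost discussion later); adjust the choices to your own quality requirements.

```python
# OpenAI -> Gemini model IDs for the compat layer (mirrors the mapping table)
OPENAI_TO_GEMINI = {
    "gpt-5": "gemini-3-flash-preview",        # cheapest near-equivalent tier
    "gpt-4.1": "gemini-2.5-flash",
    "gpt-4o-mini": "gemini-2.5-flash-lite",
    "text-embedding-3-large": "gemini-embedding-2-preview",
}

def to_gemini_model(openai_model: str) -> str:
    """Translate an OpenAI model ID; raise if there is no mapped equivalent."""
    try:
        return OPENAI_TO_GEMINI[openai_model]
    except KeyError:
        raise ValueError(f"No Gemini mapping for {openai_model!r}")
```

Failing loudly on unmapped IDs beats silently falling through to a model name Gemini will reject at request time.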


Pricing Impact

The cost difference is the clearest reason to migrate. Both APIs charge per million tokens for input and output.


| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-5 (OpenAI) | $1.25 | $10.00 |
| GPT-5.4 (OpenAI) | $2.50 | $15.00 |
| GPT-4.1 (OpenAI) | $2.00 | $8.00 |
| GPT-4o (OpenAI) | $2.50 | $10.00 |
| Gemini 3.1 Pro Preview | $2.00 | $12.00 |
| Gemini 3 Flash Preview | $0.50 | $3.00 |
| Gemini 2.5 Flash | $0.30 | $2.50 |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 |

Sample workload: 10M tokens/month (5M input, 5M output)

| Stack | Monthly Cost |
|---|---|
| GPT-5 | $56.25 |
| GPT-4.1 | $50.00 |
| Gemini 3.1 Pro Preview | $70.00 |
| Gemini 3 Flash | $17.50 |
| Gemini 2.5 Flash | $14.00 |
| Gemini 2.5 Flash-Lite | $2.50 |

A few things to note here. Gemini 3.1 Pro is actually more expensive than GPT-5 for this workload ($70.00 vs $56.25, about 24% more) - the cost advantage comes from the Flash tier. Swapping GPT-5 for Gemini 3 Flash is the most common migration path for cost reduction: similar quality for most tasks, 69% cheaper.

Gemini also offers a Batch API with a 50% discount on all paid models for async workloads. Gemini 2.5 Flash batch pricing drops to $0.15 input / $1.25 output per million tokens. The free tier gives you Gemini 2.5 Flash at 10 RPM and 250 daily requests with no payment required.
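To sanity-check these numbers against your own traffic, monthly cost is just token volume times the per-million rates, with an optional 0.5 multiplier for the batch discount. A minimal calculator using the rates from the pricing table above:

```python
def monthly_cost(input_tokens, output_tokens, in_price, out_price, batch=False):
    """Cost in dollars; prices are per 1M tokens. batch=True applies the 50% discount."""
    cost = (input_tokens / 1e6) * in_price + (output_tokens / 1e6) * out_price
    return cost * 0.5 if batch else cost

# 10M tokens/month, split evenly, at the rates in the pricing table
print(monthly_cost(5e6, 5e6, 1.25, 10.00))             # GPT-5: 56.25
print(monthly_cost(5e6, 5e6, 0.50, 3.00))              # Gemini 3 Flash: 17.5
print(monthly_cost(5e6, 5e6, 0.30, 2.50, batch=True))  # 2.5 Flash, batched: 7.0
```

Run it with your real input/output split - workloads are rarely 50/50, and output tokens dominate the bill at these rates.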

For more context on LLM pricing across providers, see our LLM API pricing comparison.


Known Gotchas

  1. Tool calling + structured output = 400 error. Gemini doesn't let you use tools and response_format in the same request. The error message reads: "For forced function calling, you must set 'tool_config.function_calling_config.mode' to ANY instead of populating 'response_mime_type' and 'response_schema' fields." Work around this by describing your JSON schema requirements in the system prompt when you need both.

  2. Batch API file uploads require the native SDK. The genai Python SDK handles file uploads and downloads for batch jobs. You can't do it through the OpenAI-compat interface - client.files.upload() will fail. Use the native SDK for batch prep, then optionally process results through the compat layer.

  3. reasoning_effort and thinking_config can't be used at the same time. OpenAI's reasoning_effort: "high" maps to Gemini's thinking_level: "high", but pass only one. Setting both causes an error.

  4. Schema validation is stricter. Gemini rejects JSON tool schemas with unrecognized fields (e.g., Unknown name 'type' at 'tools[0].function'). If you've been sloppy with extra metadata in your tool definitions, clean them up before migrating.

  5. Free tier quotas dropped in December 2025. If you read older tutorials, the free limits they describe are no longer accurate. Current free tier for Gemini 2.5 Flash is 10 RPM and 250 requests per day - down from earlier limits. The official rate limits page in AI Studio shows your actual current quota.

  6. Vertex AI SDK for Gemini is deprecated after June 2026. If you're using Gemini via the Vertex AI SDK (not the Gemini Developer API), migration to the Gen AI SDK is required before June 2026. New Gemini features won't ship to the Vertex AI SDK path.

  7. Video generation uses async polling. Veo 3.1 video generation is asynchronous - you submit a job, then poll GET /v1/videos/{id} until completion. OpenAI's Sora also requires polling, so this won't surprise former Sora users, but it does mean you can't treat it like a synchronous completion call.

  8. Gemini 3 Pro Preview was deprecated March 9, 2026. If you see tutorials referencing gemini-3-pro-preview (without the 3.1), that model is gone. Use gemini-3-1-pro-preview instead.
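A cheap way to catch gotchas 1 and 3 before they cost you a round trip is a pre-flight check on the request kwargs. This is a hypothetical helper, not part of any SDK - it just encodes the two invalid combinations described above.

```python
def check_gemini_kwargs(kwargs: dict) -> None:
    """Raise early for parameter combinations Gemini's compat layer rejects."""
    # Gotcha 1: tools and response_format together -> 400 from the API
    if "tools" in kwargs and "response_format" in kwargs:
        raise ValueError(
            "Gemini rejects tools + response_format together (400); "
            "describe the JSON schema in the system prompt instead."
        )
    # Gotcha 3: reasoning_effort and thinking_config are mutually exclusive
    extra = kwargs.get("extra_body", {})
    if "reasoning_effort" in kwargs and "thinking_config" in extra:
        raise ValueError("Pass reasoning_effort OR thinking_config, not both.")

check_gemini_kwargs({"model": "gemini-3-flash-preview", "messages": []})  # passes
```

Call it right before `client.chat.completions.create(**kwargs)` so bad combinations fail in your code with a readable message instead of a 400.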


FAQ

Can I use the same OpenAI Python or JS library?

Yes - set base_url to https://generativelanguage.googleapis.com/v1beta/openai/ and use your Gemini API key. No new SDK required for basic completions.

Will my existing prompts work without changes?

Usually yes for conversational prompts. System instructions, few-shot examples, and user messages all transfer cleanly. Prompts fine-tuned for GPT-5's reasoning style may need adjustment for Flash models.

Is there a free tier?

Yes. Gemini 2.5 Flash is free at 10 requests per minute and 250 requests per day. Gemini 2.5 Flash-Lite is free at 15 RPM and 1,000 requests per day. OpenAI has no equivalent free tier.
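If you build on the free tier, a minimal client-side throttle keeps you under the RPM ceiling instead of burning requests on 429s. A sliding-window sketch (the limits are the ones quoted above; the class is my own, with injectable clock/sleep so it can be tested without waiting):

```python
import time
from collections import deque

class RpmThrottle:
    """Sliding-window limiter: blocks until a request slot is free."""
    def __init__(self, rpm: int, clock=time.monotonic, sleep=time.sleep):
        self.rpm, self.clock, self.sleep = rpm, clock, sleep
        self.sent = deque()  # timestamps of requests in the last minute

    def wait(self):
        now = self.clock()
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()                    # drop calls older than a minute
        if len(self.sent) >= self.rpm:
            self.sleep(60 - (now - self.sent[0]))  # wait for the oldest to expire
            self.sent.popleft()
            now = self.clock()
        self.sent.append(now)

throttle = RpmThrottle(rpm=10)  # Gemini 2.5 Flash free tier
# call throttle.wait() before each client.chat.completions.create(...) call
```

This only guards the per-minute limit; the 250-requests-per-day cap still needs its own counter if you run continuously.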

Does Gemini support the OpenAI Assistants API format?

No. The Assistants API is a separate product OpenAI is deprecating anyway (August 2026). If you're migrating from Assistants, consider rebuilding with Gemini's native generateContent endpoint or an agentic framework like LangChain or LlamaIndex.

How do I get a Gemini API key?

Go to Google AI Studio, sign in with a Google account, and click "Get API key". Free-tier keys are available without billing information.

Is Gemini's OpenAI compatibility still beta?

As of April 2026, yes - Google labels it beta. Core features work reliably, but some edge cases (complex tool schemas, batch file management) still require the native SDK.


Sources

✓ Last verified April 5, 2026

About the author: AI Education & Guides Writer

Priya is an AI educator and technical writer whose mission is making artificial intelligence approachable for everyone - not just engineers.