Migrating from OpenAI API to Anthropic API

TL;DR

Yes, you can switch - Anthropic offers an OpenAI SDK compatibility layer for quick testing
System message handling, image inputs, tool calling schemas, and temperature handling all change
You gain prompt caching (up to 90% savings), adaptive thinking on Opus 4.7, and PDF support
Medium difficulty, 2-4 hours typical - but Opus 4.7 adds breaking API changes that require code updates

Why Switch from OpenAI to Anthropic?

Developers are increasingly moving workloads to the Anthropic API for a few concrete reasons. Claude Opus 4.7 - released April 16, 2026 - is Anthropic's current flagship, with a standout jump in agentic coding performance and high-resolution vision (images up to 3.75 megapixels, more than triple prior models). Its adaptive thinking mode lets Claude decide when and how much extended reasoning to apply, without you manually specifying token budgets. Prompt caching can cut costs by up to 90% on repeated context, which matters at scale. And features like native PDF processing and citations aren't available through OpenAI's API at all.

Still, this isn't a drop-in swap. The request format is different, system messages work differently, and tool calling uses its own schema. If you've fine-tuned prompts specifically for GPT-4o or GPT-5, expect to spend time adjusting them for Claude.

Anthropic does offer an OpenAI SDK compatibility layer that lets you point the OpenAI Python or TypeScript SDK at Anthropic's endpoint. It's useful for quick comparisons, but it strips out Claude-specific features like prompt caching, extended thinking, and structured outputs. For production, you'll want the native API.

Feature Parity Table

Feature	OpenAI	Anthropic	Notes
Chat completions	`POST /v1/chat/completions`	`POST /v1/messages`	Different request/response format
Streaming	`stream: true` (SSE)	`stream: true` (SSE)	Direct equivalent, different event names
System prompt	`role: "system"` in messages array	Top-level `system` parameter	Anthropic uses a single system block
Function/tool calling	`tools[]` with nested `function.parameters`	`tools[]` with top-level `input_schema`	Different schema structure
Structured output	`response_format` with JSON schema	`output_config` or tool `strict: true`	Both support strict validation
Image input	URL or base64 in content array	Base64 or URL in content array	Both now support URLs
PDF input	Not supported	Native `DocumentBlockParam`	Anthropic-only feature
Prompt caching	Automatic, no request parameter	`cache_control` parameter	Anthropic caching is explicit opt-in; up to 90% input cost savings
Adaptive thinking	Not available	`thinking: {type: "adaptive"}`	Opus 4.7: only supported mode. Also works on Sonnet 4.6, Opus 4.6
Extended thinking (legacy)	Not available	`thinking: {type: "enabled", budget_tokens: N}`	Deprecated on Sonnet/Opus 4.6; removed on Opus 4.7 (400 error)
Temperature / sampling	0-2 range	0-1 range (Sonnet 4.6, Haiku 4.5); rejected on Opus 4.7	Breaking change - omit entirely when targeting Opus 4.7
Embeddings	`POST /v1/embeddings`	Not available	Use a separate provider
Audio input	Supported	Not supported	OpenAI-only feature
Batch processing	Batch API (50% discount)	Batch API (50% discount)	Both offer async batching

[Figure: developer writing Python code on a laptop, terminal window visible. Caption: Switching APIs means updating more than just the endpoint URL - message structure, tool schemas, and thinking parameters all differ. Source: unsplash.com]

API Mapping

The core difference is structural. OpenAI wraps everything in a chat/completions envelope with a choices array. Anthropic returns a message object with a content array of typed blocks.

Basic Chat Completion

OpenAI:

from openai import OpenAI

client = OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in two sentences."}
    ],
    max_tokens=256,
    temperature=0.7
)

print(response.choices[0].message.content)

Anthropic:

import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")

response = client.messages.create(
    model="claude-sonnet-4-6",
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Explain quantum computing in two sentences."}
    ],
    max_tokens=256,
    temperature=0.7
)

print(response.content[0].text)

Three things changed. The system prompt moved from the messages array to a top-level system parameter. The response comes back as response.content[0].text instead of response.choices[0].message.content. And the model name uses Anthropic's naming scheme.
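
If your existing code builds an OpenAI-style messages list, the system prompt has to be pulled out before the call. Here is a minimal sketch of that step. The helper name split_system is made up for illustration, and it assumes every system message has plain string content.

```python
from typing import Any


def split_system(messages: list[dict[str, Any]]) -> tuple[str, list[dict[str, Any]]]:
    """Separate OpenAI-style system messages from the rest of the conversation.

    Hypothetical helper: assumes each system message has string content.
    """
    system_parts: list[str] = []
    rest: list[dict[str, Any]] = []
    for message in messages:
        if message["role"] == "system":
            system_parts.append(message["content"])
        else:
            rest.append(message)
    # Several system messages are joined into one block, since Anthropic
    # takes a single top-level system parameter.
    return "\n\n".join(system_parts), rest


openai_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in two sentences."},
]
system, messages = split_system(openai_messages)
```

The two return values map directly onto the system and messages arguments of client.messages.create.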

Streaming

OpenAI:

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")

Anthropic:

with client.messages.stream(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
    max_tokens=100
) as stream:
    for text in stream.text_stream:
        print(text, end="")

Anthropic's Python SDK provides a .stream() context manager that yields text directly, which is actually cleaner than parsing delta objects. Under the hood, both use SSE, but the event types differ - Anthropic sends content_block_delta events while OpenAI sends chat.completion.chunk events.
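
If you consume the raw event stream rather than the SDK helper, the text arrives inside content_block_delta events. The sketch below filters a list of already-parsed event dictionaries. The sample events are hand-written and abbreviated for illustration, not captured from a real response, and iter_text is a hypothetical helper name.

```python
from typing import Any, Iterable, Iterator


def iter_text(events: Iterable[dict[str, Any]]) -> Iterator[str]:
    """Yield text fragments from parsed Anthropic streaming events."""
    for event in events:
        # Only content_block_delta events carry incremental output.
        if event.get("type") != "content_block_delta":
            continue
        delta = event.get("delta", {})
        # Other delta types (for example tool input JSON) are skipped here.
        if delta.get("type") == "text_delta":
            yield delta["text"]


# Hand-written sample events, abbreviated for illustration.
sample_events = [
    {"type": "message_start", "message": {"role": "assistant", "content": []}},
    {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Requests go out, "}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "tokens stream back."}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_stop"},
]
streamed_text = "".join(iter_text(sample_events))
```

In most applications the SDK's text_stream does this filtering for you; the manual version is mainly useful when you proxy or log the raw events.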

Tool Calling

This is where the biggest differences show up.

OpenAI:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }]
)

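# tool_calls is None when the model answers in plain text instead of calling a tool.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    # function.arguments is a JSON string, not a dict.
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)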

Anthropic:

response = client.messages.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    max_tokens=256,
    tools=[{
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
        }
    }]
)

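# The model may reply with text only, so fall back to None instead of raising StopIteration.
tool_block = next((b for b in response.content if b.type == "tool_use"), None)
if tool_block is not None:
    print(tool_block.name, tool_block.input)  # input is already a parsed dict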

Two structural differences matter here. First, OpenAI nests each tool under type: "function" with a function key containing parameters, while Anthropic puts name, description, and input_schema at the top level of each tool. Second, OpenAI returns tool calls as a separate tool_calls array on the message, while Anthropic includes them as tool_use content blocks in the same content array as text.
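
Because the JSON Schema itself is the same on both sides, existing tool definitions can usually be converted mechanically. Here is a minimal sketch, assuming your tools are all of type "function". The helper name openai_tool_to_anthropic is hypothetical.

```python
from typing import Any


def openai_tool_to_anthropic(tool: dict[str, Any]) -> dict[str, Any]:
    """Flatten an OpenAI function tool into Anthropic's top-level layout."""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        # OpenAI's "parameters" and Anthropic's "input_schema" both hold JSON Schema.
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }


openai_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}
anthropic_tool = openai_tool_to_anthropic(openai_tool)
```

When reading results, remember that OpenAI's function.arguments is a JSON string you parse with json.loads, while Anthropic's tool_use block exposes input as an already-parsed object.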

Image Input

OpenAI:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
        ]
    }]
)

Anthropic:

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=256,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image", "source": {"type": "url", "url": "https://example.com/photo.jpg"}}
        ]
    }]
)

Both APIs now support image URLs directly. Anthropic also supports base64 encoding with "type": "base64" and a media_type field. One constraint to watch: Anthropic rejects images larger than 8000x8000 pixels, and if you send more than 20 images per request, each must be under 2000x2000.
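
OpenAI code often passes images either as plain URLs or as base64 data URLs inside image_url. A rough sketch of converting both forms to Anthropic image blocks follows. The helper name image_part_to_anthropic and the regular expression are illustrative assumptions, and the base64 payload in the example is a truncated placeholder, not a real image.

```python
import re
from typing import Any

# Matches data URLs such as "data:image/png;base64,AAAA".
_DATA_URL = re.compile(r"^data:(?P<media_type>[\w/+.-]+);base64,(?P<data>.+)$", re.DOTALL)


def image_part_to_anthropic(part: dict[str, Any]) -> dict[str, Any]:
    """Convert an OpenAI image_url content part into an Anthropic image block."""
    url = part["image_url"]["url"]
    match = _DATA_URL.match(url)
    if match:
        # Inline images become a base64 source with an explicit media_type.
        return {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": match["media_type"],
                "data": match["data"],
            },
        }
    # Anything else is treated as a remote URL.
    return {"type": "image", "source": {"type": "url", "url": url}}


remote = image_part_to_anthropic(
    {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
)
inline = image_part_to_anthropic(
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgo="}}
)
```

This does not check image dimensions, so the size limits above still need to be enforced separately.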

Pricing Impact

Pricing depends heavily on which models you're comparing. Both providers have updated their lineups since early 2026 - OpenAI's current flagship family is the GPT-5.4 series. Here are the most common matchups as of April 2026:

Use Case	OpenAI Model	Price (in/out per 1M tokens)	Claude Model	Price (in/out per 1M tokens)
Flagship	GPT-5.4	$2.50 / $15.00	Claude Opus 4.7	$5.00 / $25.00
Mid-tier	GPT-5.4 mini	$0.75 / $4.50	Claude Sonnet 4.6	$3.00 / $15.00
Fast/cheap	GPT-5.4 nano	$0.20 / $1.25	Claude Haiku 4.5	$1.00 / $5.00

For a workload processing 10 million input tokens and 2 million output tokens per month at the mid-tier:

GPT-5.4 mini: (10 x $0.75) + (2 x $4.50) = $16.50/month
Claude Sonnet 4.6: (10 x $3.00) + (2 x $15.00) = $60.00/month

Claude Sonnet costs roughly 3.6x more in this base-rate scenario. Anthropic's prompt caching narrows that gap for workloads with repeated context - cache hits cost 10% of the standard input price. A workload where 70% of input tokens are cached brings Claude Sonnet's effective cost down to around $41/month, while GPT-5.4 mini with caching drops to roughly $12/month. OpenAI is still cheaper, but the gap is smaller, and the trade-off may be worth it for specific features like adaptive thinking, PDF support, or longer output limits.
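
The arithmetic behind these figures can be reproduced with a small calculator. This is a simplified sketch: it uses the list prices from the table above, treats cache reads as 10% of the input price for both providers (an assumption for OpenAI, chosen because it reproduces the roughly $12 figure), and ignores any extra charge for writing to the cache.

```python
def monthly_cost(
    input_mtok: float,
    output_mtok: float,
    input_price: float,
    output_price: float,
    cached_fraction: float = 0.0,
    cache_read_multiplier: float = 0.10,
) -> float:
    """Estimate monthly spend in dollars; token volumes are in millions."""
    cached = input_mtok * cached_fraction
    uncached = input_mtok - cached
    # Cached input tokens are billed at a fraction of the normal input price.
    input_cost = uncached * input_price + cached * input_price * cache_read_multiplier
    return input_cost + output_mtok * output_price


# Base rates: 10M input tokens, 2M output tokens per month.
mini_base = monthly_cost(10, 2, 0.75, 4.50)
sonnet_base = monthly_cost(10, 2, 3.00, 15.00)

# Same workload with 70% of input tokens served from cache.
mini_cached = monthly_cost(10, 2, 0.75, 4.50, cached_fraction=0.7)
sonnet_cached = monthly_cost(10, 2, 3.00, 15.00, cached_fraction=0.7)
```

Output tokens dominate the cached scenario ($30 of Sonnet's roughly $41), which is why caching narrows the gap but cannot close it for output-heavy workloads.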

Both providers offer 50% discounts through their batch APIs. Claude Sonnet 4.6 and Opus 4.7 also support up to 300k output tokens per request through the Batch API with the output-300k-2026-03-24 beta header - useful for very long generation tasks.

[Figure: Claude Opus 4.7 benchmark comparison chart showing performance improvements over prior models. Caption: Claude Opus 4.7 benchmark results from Anthropic's April 2026 release announcement, showing gains in coding, vision, and reasoning tasks. Source: anthropic.com]

Known Gotchas

System messages get concatenated. If your OpenAI code places system messages at multiple points in the conversation, the compatibility layer hoists and concatenates them all into one system block at the start, and the native API only accepts a single top-level system parameter, so you have to merge them yourself. Either way, prompts that rely on mid-conversation system instructions can break.
max_tokens is required. OpenAI defaults to a model-specific maximum if you omit max_tokens. Anthropic requires it explicitly - you'll get an error if it's missing.
No response_format for JSON mode. OpenAI's response_format: { type: "json_object" } doesn't have a direct equivalent. Use Anthropic's structured outputs (via output_config with a JSON schema) or tool use with strict: true instead.
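Temperature range differs. OpenAI accepts temperatures from 0 to 2. Anthropic caps at 1.0: the OpenAI SDK compatibility layer silently clamps anything higher, while the native Messages API rejects out-of-range values with a validation error. If you were using temperature 1.5 for creative tasks, your outputs will be less varied on Claude.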
No embeddings endpoint. Anthropic doesn't offer an embeddings API. If your pipeline creates embeddings, you'll need to keep OpenAI for that or switch to an alternative like Voyage AI or a self-hosted solution.
n parameter must be 1. OpenAI lets you request multiple completions per call (e.g., n=3). Anthropic always returns exactly one response. You'll need to make separate API calls if you want multiple generations.
Token counting uses different tokenizers. The same text will produce different token counts between OpenAI (tiktoken) and Anthropic. Don't assume your existing token estimates will transfer directly.
Role alternation is strict. Anthropic requires messages to alternate between user and assistant roles. OpenAI is more lenient about consecutive messages from the same role. You may need to merge adjacent user messages before sending.
Opus 4.7 rejects temperature, top_p, and top_k. Starting with claude-opus-4-7, passing any non-default value for these sampling parameters returns a 400 error. This is a hard break - code that sets temperature=0.7 will fail right away. Remove these parameters from your requests and use prompting to guide output style instead. claude-sonnet-4-6 and claude-haiku-4-5 still accept temperature values up to 1.0.
budget_tokens is removed on Opus 4.7. The extended thinking syntax thinking: {type: "enabled", budget_tokens: N} no longer works on claude-opus-4-7 - it returns a 400 error. Switch to thinking: {type: "adaptive"} and use the effort parameter (low, medium, high, xhigh, max) to guide how much reasoning Claude does. If you need predictable cost control, pair adaptive thinking with a firm max_tokens cap.
Opus 4.7 uses a new tokenizer. The same text can consume 1.0x to 1.35x as many tokens on Opus 4.7 compared to earlier models, depending on content. If you have tight max_tokens budgets or per-request cost caps, test with your actual content before a production migration - you may hit stop_reason: "max_tokens" more often than expected.
thinking.display defaults to "omitted" on Opus 4.7. On Opus 4.6, thinking blocks default to "summarized" - you see the reasoning content in the response. On Opus 4.7, the default changed to "omitted" - thinking blocks appear in the stream but the thinking field is empty. To restore visible thinking text, explicitly set thinking: {type: "adaptive", display: "summarized"}.
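
Several of these gotchas can be handled in one pre-flight step before the request is sent. The sketch below follows the behavior described in this list. The helper names merge_adjacent_roles and adapt_request are hypothetical, the model check is a simple prefix match, and messages are assumed to have string content.

```python
from typing import Any

SAMPLING_PARAMS = ("temperature", "top_p", "top_k")


def merge_adjacent_roles(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge consecutive messages from the same role so roles alternate."""
    merged: list[dict[str, Any]] = []
    for message in messages:
        if merged and merged[-1]["role"] == message["role"]:
            merged[-1] = {
                "role": message["role"],
                "content": merged[-1]["content"] + "\n\n" + message["content"],
            }
        else:
            merged.append(dict(message))  # copy so the caller's list is untouched
    return merged


def adapt_request(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the request kwargs adjusted for the target model."""
    out = dict(kwargs)
    if "max_tokens" not in out:
        raise ValueError("max_tokens is required by the Anthropic API")
    if out["model"].startswith("claude-opus-4-7"):
        # Opus 4.7 rejects sampling parameters, so drop them entirely.
        for name in SAMPLING_PARAMS:
            out.pop(name, None)
        # Legacy budget-based thinking is replaced by adaptive thinking.
        thinking = out.get("thinking")
        if thinking and thinking.get("type") == "enabled":
            out["thinking"] = {"type": "adaptive"}
    elif "temperature" in out:
        # Other models accept temperature only up to 1.0.
        out["temperature"] = min(out["temperature"], 1.0)
    if "messages" in out:
        out["messages"] = merge_adjacent_roles(out["messages"])
    return out


opus_request = adapt_request({
    "model": "claude-opus-4-7",
    "max_tokens": 256,
    "temperature": 0.7,
    "thinking": {"type": "enabled", "budget_tokens": 2048},
})
```

Dropping a budget_tokens value loses whatever cost ceiling it implied, so review max_tokens at the same time rather than relying on the rewrite alone.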

FAQ

Can I use the OpenAI SDK with Anthropic's API?

Yes. Set base_url to https://api.anthropic.com/v1/ and use your Anthropic API key. But this compatibility layer doesn't support prompt caching, structured outputs, or extended thinking.
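
For a quick comparison run, the setup might look like the sketch below. The helper names are hypothetical, the openai package is imported lazily so the configuration helper works without it, and the commented usage assumes an ANTHROPIC_API_KEY environment variable.

```python
ANTHROPIC_COMPAT_BASE_URL = "https://api.anthropic.com/v1/"


def compat_client_kwargs(anthropic_api_key: str) -> dict[str, str]:
    """Build constructor arguments that point the OpenAI SDK at Anthropic."""
    return {"api_key": anthropic_api_key, "base_url": ANTHROPIC_COMPAT_BASE_URL}


def make_compat_client(anthropic_api_key: str):
    """Create an OpenAI SDK client that talks to Anthropic's endpoint."""
    from openai import OpenAI  # lazy import: only needed when a client is created

    return OpenAI(**compat_client_kwargs(anthropic_api_key))


# Example usage (needs the openai package and a real key):
# import os
# client = make_compat_client(os.environ["ANTHROPIC_API_KEY"])
# response = client.chat.completions.create(
#     model="claude-sonnet-4-6",
#     messages=[{"role": "user", "content": "Write a haiku about APIs."}],
#     max_tokens=100,
# )
# print(response.choices[0].message.content)
```

The rest of your OpenAI-style code stays unchanged, which is what makes this handy for side-by-side evaluation before committing to the native SDK.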

Will my OpenAI prompts work without changes?

Basic prompts usually transfer well, but heavily optimized prompts will need tuning. Anthropic recommends their Console's prompt improver as a starting point for adaptation.

Is there a compatibility layer for production use?

The OpenAI SDK compatibility layer is for testing and evaluation only. Anthropic recommends their native SDK for production workloads to access all features.

How do I handle the missing embeddings endpoint?

Use a dedicated embeddings provider like Voyage AI, Cohere, or run an open-source model. See our embeddings guide for options.

Does Anthropic support fine-tuning?

Not through their public API as of April 2026. If you rely on OpenAI fine-tuned models, you'll need to reproduce that behavior through prompt engineering or explore Anthropic's enterprise offerings.

Does switching to Opus 4.7 require code changes beyond swapping the model ID?

Yes. Opus 4.7 removes temperature, top_p, and top_k (pass them and get a 400 error), and drops extended thinking with budget_tokens. Switch to thinking: {type: "adaptive"} and remove sampling params before changing the model ID.

What about rate limits?

Both providers use token-based and request-based rate limits that scale with your usage tier. Check Anthropic's rate limits documentation for current thresholds.
