Best LLMs Under $1 per Million Tokens in 2026

In March 2023, GPT-4 cost $60 per million output tokens. That same capability now runs at roughly $5. The price collapse happened faster than most teams planned for, and the $1/M threshold - once a marker for "basic summarization only" - has become genuinely competitive territory for production workloads.

TL;DR

Gemini 2.0 Flash at $0.075/M input is the cheapest mainstream multimodal model; Gemini 2.5 Flash-Lite at $0.10/M input is its newer, more capable sibling
DeepSeek V4 Flash ($0.14/M input) delivers frontier-class coding benchmarks - SWE-bench 79% - at a price that would have bought nothing usable two years ago
GPT-4.1 Nano ($0.10/M input, 1M context) is OpenAI's budget entry: solid instruction-following with a context window no competitor at this price matched until recently
Claude Haiku 4.5 ($0.80/M input) is Anthropic's option at this price band - note output costs ($4.00/M) push the all-in cost higher than input-only pricing suggests
Amazon Nova Micro ($0.035/M input) is among the cheapest capable text models available, useful for high-volume classification and summarization

Three things changed to make this price band competitive. First, model distillation got better - the smaller models running in the sub-$1 bracket carry more capability from frontier training runs than their predecessors did. Second, caching discounts now effectively cut input costs by 50-90% for repeat prompts. Third, the open-weight competition (Qwen3, Mistral Small, DeepSeek) forced cloud providers to match prices they wouldn't have touched 18 months ago.

This comparison uses input cost as the primary filter - all models listed charge under $1 per million input tokens. Output costs vary and are noted where they affect the economics for specific use cases. For full LLM API pricing across all tiers, see the LLM API pricing comparison.

The $1 Quality Landscape

It's worth being clear about what this bracket actually delivers. The models in this comparison benchmark 15-40 points below frontier models (Claude Opus 4.6, GPT-5.4) on hard reasoning tasks. On coding benchmarks, the best budget models - DeepSeek V4 Flash, Qwen3 8B - narrow that gap to single digits. On classification, summarization, RAG retrieval, and structured output tasks, the gap often closes completely.

At 1 million queries per day with 500 input tokens each, Gemini 2.0 Flash costs $37.50/day. Claude Sonnet 4.6 at the same volume costs $1,500/day. That's a 40x cost difference for tasks where the quality gap is negligible.
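The arithmetic behind those daily figures can be reproduced in a few lines. This is a minimal sketch that counts input tokens only; the function name is illustrative, and the $3.00/M Sonnet rate is inferred from the $1,500/day total rather than quoted in the text.

```python
def daily_input_cost(queries_per_day: int, input_tokens_per_query: int,
                     price_per_million: float) -> float:
    """Daily input-token spend in dollars (output tokens are not included)."""
    total_tokens = queries_per_day * input_tokens_per_query
    return total_tokens / 1_000_000 * price_per_million


# 1M queries/day at 500 input tokens each, as in the paragraph above.
flash = daily_input_cost(1_000_000, 500, 0.075)  # Gemini 2.0 Flash
sonnet = daily_input_cost(1_000_000, 500, 3.00)  # assumed $3.00/M, inferred from $1,500/day

print(f"Gemini 2.0 Flash:  ${flash:,.2f}/day")
print(f"Claude Sonnet 4.6: ${sonnet:,.2f}/day")
print(f"Ratio: {sonnet / flash:.0f}x")
```

Output tokens are billed separately, so the real daily bill depends on how long the responses are.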

The economic case for under-$1 models is strongest for applications that need volume: real-time content classification, first-pass summarization, RAG re-ranking, chatbot responses where the quality bar is "helpful and coherent" rather than "brilliant." For tasks requiring complex multi-step reasoning, agentic planning, or flagship-quality code generation, the frontier tier is worth the cost.

Gemini 2.0 Flash / 2.5 Flash-Lite - Cheapest Mainstream Option

Google's Flash family is the starting point for most cost-constrained production deployments. Gemini 2.0 Flash runs at $0.075/M input and $0.30/M output. Gemini 2.5 Flash-Lite, the newer version from early 2026, is priced at $0.10/M input and $0.40/M output with higher capability scores.

Both are multimodal - they accept images, audio, and video alongside text without an additional API call. Among the models in this comparison priced at $0.10/M input or below, no other offers native multimodal input. That changes the calculus for applications that process screenshots, documents with images, or audio transcripts.

Google's free tier (1,500 requests/day, 15 requests/minute for Gemini 2.0 Flash) covers meaningful evaluation and low-volume production workloads. The rate limits are tight for burst traffic but sufficient for steady-state workloads that stay under 15 RPM and within the 1,500-request daily cap.
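To see how the two quoted free-tier limits (1,500 requests/day, 15 requests/minute) interact, here is a small sketch. The function names are illustrative and the limits are hard-coded from the figures above, not read from any Google API.

```python
FREE_TIER_REQUESTS_PER_DAY = 1_500   # figure quoted above
FREE_TIER_REQUESTS_PER_MINUTE = 15   # figure quoted above


def fits_free_tier(requests_per_day: int, peak_requests_per_minute: int) -> bool:
    """True if a workload stays inside both the daily and the per-minute limit."""
    return (requests_per_day <= FREE_TIER_REQUESTS_PER_DAY
            and peak_requests_per_minute <= FREE_TIER_REQUESTS_PER_MINUTE)


def minutes_until_daily_cap(requests_per_minute: int) -> float:
    """How long a constant request rate lasts before the daily cap is reached."""
    return FREE_TIER_REQUESTS_PER_DAY / requests_per_minute


print(fits_free_tier(requests_per_day=1_200, peak_requests_per_minute=10))
print(minutes_until_daily_cap(15))
```

Running at the full per-minute limit exhausts the daily allowance in well under two hours, so in practice the daily cap is the binding constraint for anything beyond light traffic.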

The limitation is reasoning depth. Gemini Flash models handle instruction-following, summarization, and classification well. They degrade on multi-hop reasoning chains, complex code generation, and tasks requiring sustained coherence over very long contexts. For those cases, the upgrade path within Google's stack is Gemini 2.5 Pro at $1.25/M input - still competitive with OpenAI and Anthropic at that tier.

Pricing: Gemini 2.0 Flash at $0.075/M input, $0.30/M output. Gemini 2.5 Flash-Lite at $0.10/M input, $0.40/M output. Both on 1M token context windows.

Budget LLM APIs in 2026 cover a meaningful quality range - from sub-$0.05 models suited to classification only, to sub-$1 models with coding performance within 5 points of frontier. Source: unsplash.com

DeepSeek V4 Flash - Best Reasoning per Dollar

DeepSeek V4 Flash is a 284B-parameter Mixture-of-Experts model with 13B parameters active per forward pass. It costs $0.14/M input and $0.28/M output through DeepSeek's API, with cache-hit pricing at $0.014/M - a 90% discount on repeat prompt segments.

The benchmark numbers are the story. On SWE-bench Verified (real GitHub issue resolution), V4 Flash scores 79.0% against V4 Pro's 80.6%. The gap is 1.6 percentage points at one-tenth the price. On LiveCodeBench Pass@1, Flash scores 91.6 versus Pro's 93.5 - again close. For straightforward coding tasks, the Flash tier is functionally equivalent to the flagship.

The gap widens on complex agentic tasks. Terminal-Bench 2.0 shows Flash at 56.9% versus Pro at 67.9% - an 11-point difference that matters for multi-step tool-calling agents. For applications that chain 10+ tool calls with stateful decision-making, Pro justifies the cost. For simpler agent loops, Flash covers it.

The context window is 1M tokens, matching Google and OpenAI at this tier. Cache hits at $0.014/M input make long-context workloads (RAG with large document sets, long conversations) economically viable in a way that was impractical 18 months ago.
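How much the cache discount helps depends on what share of input tokens actually hit the cache. A rough sketch of the blended rate follows; the 80% hit rate is a hypothetical value chosen for illustration, not a measured figure.

```python
def effective_input_price(base_price: float, cached_price: float,
                          cache_hit_rate: float) -> float:
    """Blended $/M input price for a given fraction of cached input tokens."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return cache_hit_rate * cached_price + (1.0 - cache_hit_rate) * base_price


# DeepSeek V4 Flash prices from the text; the hit rate is an assumption.
blended = effective_input_price(base_price=0.14, cached_price=0.014,
                                cache_hit_rate=0.8)
print(f"Blended input price: ${blended:.4f}/M")
```

For RAG or long conversations where a large shared prefix repeats across requests, the hit rate is the number to measure before projecting costs.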

Pricing: $0.14/M input, $0.28/M output. Cache hit: $0.014/M. Context: 1M tokens.

GPT-4.1 Nano - OpenAI's Budget Entry

OpenAI's GPT-4.1 Nano is priced at $0.10/M input and $0.40/M output with a 1M token context window. It's OpenAI's answer to the pressure from Google and DeepSeek in the budget tier.

The capability profile is general-purpose instruction following: structured output generation, classification, translation, and summarization. It doesn't compete with DeepSeek V4 Flash on coding benchmarks, but it's competitive on instruction adherence and handles structured JSON output reliably without chain-of-thought scaffolding.

The 1M token context window at $0.10/M input is the strongest argument for Nano over alternatives. Processing a full codebase, a large PDF, or a long conversation history costs a fraction of what it did two years ago. The OpenAI platform infrastructure (reliability, uptime SLAs, enterprise contracts) adds operational value that pure cost comparisons don't capture.

Batch API discounts cut costs by 50%, bringing Nano to $0.05/M input for non-time-sensitive workloads. For classification pipelines that run overnight or weekly, the batch pricing makes Nano the cheapest capable OpenAI model available.
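As an illustration of what the batch discount means for an offline job, the sketch below prices a hypothetical classification run. The token volumes are made up for the example, and the 50% discount is assumed to apply to both input and output tokens.

```python
NANO_INPUT_PRICE = 0.10   # $/M input tokens, from the text
NANO_OUTPUT_PRICE = 0.40  # $/M output tokens, from the text
BATCH_DISCOUNT = 0.50     # 50% off, assumed to cover input and output alike


def job_cost(input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    """Dollar cost of a job on GPT-4.1 Nano, optionally at batch rates."""
    cost = (input_tokens / 1_000_000 * NANO_INPUT_PRICE
            + output_tokens / 1_000_000 * NANO_OUTPUT_PRICE)
    return cost * (1.0 - BATCH_DISCOUNT) if batch else cost


# Hypothetical overnight run: 200M input tokens, 10M output tokens.
print(f"Standard: ${job_cost(200_000_000, 10_000_000):.2f}")
print(f"Batch:    ${job_cost(200_000_000, 10_000_000, batch=True):.2f}")
```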

Pricing: $0.10/M input, $0.40/M output. Batch: $0.05/M input, $0.20/M output. Context: 1M tokens.

Mistral Small - European Data Residency Option

Mistral Small runs at $0.10/M input and $0.30/M output. The input price matches GPT-4.1 Nano, and output is slightly cheaper. The practical differentiator is data residency: Mistral processes requests in European infrastructure by default, which matters for GDPR-sensitive applications and organizations under EU data sovereignty requirements.

At this price point, Mistral's quality on general tasks is competitive with Google and OpenAI's budget tiers. The model is stronger than Nano on multilingual tasks (French, German, Spanish, Italian are first-class) and comparable on English reasoning and coding.

Mistral Nemo ($0.02/M input, $0.03/M output) is the budget option below Small - effectively free for most use cases but with noticeably lower capability scores. For teams that need European processing at the lowest possible cost and can accept quality below the Small tier, Nemo is the option.

Mistral also offers a free API tier with rate limits, useful for prototyping before committing to production billing.

Pricing: Mistral Small at $0.10/M input, $0.30/M output. Mistral Nemo at $0.02/M input, $0.03/M output.

The price spread within the sub-$1 input tier spans 40x from Mistral Nemo ($0.02/M) to Claude Haiku 4.5 ($0.80/M), but the quality gap between the mid-budget options (Flash, Nano, Small) has narrowed to within noise for most production use cases. Source: unsplash.com

Qwen3 8B - Best Open-Weight at the Low End

Qwen3 8B is available via hosted APIs (Fireworks AI, Together AI, SiliconFlow) at roughly $0.05/M input and $0.20/M output. Unlike the cloud models above, you can also run it locally - the model weights are open, Apache 2.0 licensed, and deployable on a single consumer GPU with 16GB VRAM.

The benchmark profile reflects the 8B parameter count: capable at instruction following, basic reasoning, and code completion; weaker on multi-step planning and complex mathematical reasoning. It beats same-size models from 2024 on most benchmarks due to improvements in training data and the Qwen family's consistent focus on instruction tuning.

The open-weight license is the operative advantage. Teams building products where model access needs to be self-contained - regulated industries, air-gapped deployments, applications that can't route data to cloud APIs - can run Qwen3 8B locally at zero API cost. The tradeoff is operational overhead: infrastructure, GPU cost, and engineering time to keep the stack running.
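One way to frame that tradeoff is a break-even volume: the monthly token count above which a fixed self-hosting bill undercuts the hosted API price. The sketch below is only a rough guide. The $400/month self-hosting figure is a hypothetical placeholder (the article gives no GPU cost), and only input tokens are counted.

```python
def breakeven_tokens_per_month(monthly_self_host_cost: float,
                               api_price_per_million: float) -> float:
    """Monthly token volume above which self-hosting is cheaper than the API."""
    return monthly_self_host_cost / api_price_per_million * 1_000_000


# Hypothetical $400/month for GPU and upkeep vs. ~$0.05/M hosted input price.
tokens = breakeven_tokens_per_month(monthly_self_host_cost=400.0,
                                    api_price_per_million=0.05)
print(f"Break-even: {tokens / 1e9:.1f}B tokens/month")
```

At hosted prices this low, the break-even volume runs into billions of tokens per month, which is consistent with the point above: the case for local deployment rests on data control rather than on price.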

For teams using hosted APIs, Qwen3 8B is a reasonable choice for batch classification and RAG re-ranking at volumes where small per-token price differences add up. For teams needing local deployment, it's the current quality leader at the 8B parameter tier.

Pricing (hosted): ~$0.05/M input, ~$0.20/M output via Fireworks/Together. Local: free beyond GPU cost.

Claude Haiku 4.5 - Anthropic's Budget Tier

Claude Haiku 4.5 is priced at $0.80/M input and $4.00/M output. That input price fits the sub-$1 filter, but the output cost is ten to thirteen times what the Gemini Flash models charge ($0.30-$0.40/M). For workloads with low output-to-input ratios (short answers, structured yes/no classification), Haiku is competitive. For workloads with long outputs (full document generation, long-form summaries), the output cost changes the economics substantially.
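The effect of the output-to-input ratio is easiest to see as the share of the bill that comes from output tokens. In this sketch the per-request token counts are hypothetical, picked to contrast a short classification answer with a long generated document.

```python
def output_share(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Fraction of a request's cost that comes from output tokens."""
    input_cost = input_tokens * input_price
    output_cost = output_tokens * output_price
    return output_cost / (input_cost + output_cost)


HAIKU_INPUT, HAIKU_OUTPUT = 0.80, 4.00  # $/M, from the text

# Hypothetical request shapes: 500 input tokens in both cases.
short_answer = output_share(500, 5, HAIKU_INPUT, HAIKU_OUTPUT)
long_document = output_share(500, 1_000, HAIKU_INPUT, HAIKU_OUTPUT)

print(f"Classification (5 output tokens):   {short_answer:.0%} of cost is output")
print(f"Long-form (1,000 output tokens):    {long_document:.0%} of cost is output")
```

When output dominates the bill, the $0.80/M input price says little about what the workload will cost.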

The quality argument for Haiku is Anthropic's instruction following and safety alignment. On tasks where you need precise adherence to complex system prompts, avoidance of problematic outputs, or consistent formatting across varied inputs, Anthropic's training shows. Claude models also handle nuanced refusals (knowing when to decline and how to do it gracefully) better than equivalent-tier models from other providers.

Haiku also supports 200K token context, which is shorter than the 1M windows Google and OpenAI offer at comparable prices. For applications where context length is the constraint, that's a real limitation.

Pricing: $0.80/M input, $4.00/M output. Context: 200K tokens.

Amazon Nova Micro - Cheapest for Simple Tasks

Amazon Nova Micro is among the cheapest capable text models available at $0.035/M input and $0.14/M output, offered through AWS Bedrock. It's a text-only model (no vision) with a 128K context window and function calling support.

The use case is narrow: high-volume classification, simple summarization, and extraction tasks where quality doesn't need to match the mid-budget tier. Nova Micro sits below Gemini Flash on benchmark scores but undercuts every model in this comparison except Mistral Nemo on price. For teams already running AWS workloads where adding Bedrock is an easy deployment decision, it's worth evaluating for batch classification pipelines before paying Flash or Nano rates.

Pricing: $0.035/M input, $0.14/M output. Context: 128K tokens.

Comparison Table

Model	Provider	Input $/1M	Output $/1M	Context	Multimodal
Mistral Nemo	Mistral	$0.02	$0.03	131K	No
Amazon Nova Micro	AWS Bedrock	$0.035	$0.14	128K	No
Qwen3 8B (hosted)	Fireworks/Together	$0.05	$0.20	128K	No
Gemini 2.0 Flash	Google	$0.075	$0.30	1M	Yes
GPT-4.1 Nano	OpenAI	$0.10	$0.40	1M	No
Gemini 2.5 Flash-Lite	Google	$0.10	$0.40	1M	Yes
Mistral Small	Mistral	$0.10	$0.30	128K	No
DeepSeek V4 Flash	DeepSeek	$0.14	$0.28	1M	No
DeepSeek V4 Flash (cached)	DeepSeek	$0.014	$0.28	1M	No
GPT-4o Mini	OpenAI	$0.15	$0.60	128K	Yes
Claude Haiku 4.5	Anthropic	$0.80	$4.00	200K	Yes
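
For programmatic filtering, the table can be loaded into a small data structure. The sketch below copies the table's values; the `Model` class and `cheapest` helper are illustrative names, context is expressed in thousands of tokens (1M = 1000), and the cached DeepSeek row is left out because it is a pricing variant rather than a separate model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    input_price: float   # $ per 1M input tokens
    output_price: float  # $ per 1M output tokens
    context_k: int       # context window in thousands of tokens
    multimodal: bool


MODELS = [
    Model("Mistral Nemo", 0.02, 0.03, 131, False),
    Model("Amazon Nova Micro", 0.035, 0.14, 128, False),
    Model("Qwen3 8B (hosted)", 0.05, 0.20, 128, False),
    Model("Gemini 2.0 Flash", 0.075, 0.30, 1000, True),
    Model("GPT-4.1 Nano", 0.10, 0.40, 1000, False),
    Model("Gemini 2.5 Flash-Lite", 0.10, 0.40, 1000, True),
    Model("Mistral Small", 0.10, 0.30, 128, False),
    Model("DeepSeek V4 Flash", 0.14, 0.28, 1000, False),
    Model("GPT-4o Mini", 0.15, 0.60, 128, True),
    Model("Claude Haiku 4.5", 0.80, 4.00, 200, True),
]


def cheapest(models, min_context_k=0, need_multimodal=False):
    """Lowest-input-price model meeting the context and modality requirements."""
    candidates = [m for m in models
                  if m.context_k >= min_context_k
                  and (m.multimodal or not need_multimodal)]
    # min() raises ValueError if no model satisfies the constraints.
    return min(candidates, key=lambda m: m.input_price)


print(cheapest(MODELS).name)
print(cheapest(MODELS, min_context_k=1000).name)
print(cheapest(MODELS, need_multimodal=True).name)
```

Ranking by input price alone mirrors the filter used in this comparison; a workload with long outputs would want to rank on a blended input-plus-output cost instead.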

Which Model Fits Which Use Case

High-volume classification and summarization: Gemini 2.0 Flash or Amazon Nova Micro. The quality bar for these tasks is "reliable and coherent," not "brilliant," and both models clear it at the lowest cost available.

Production coding tasks on a budget: DeepSeek V4 Flash. The SWE-bench 79% score at $0.14/M input is the best coding quality-to-cost ratio in the sub-$1 tier. Cache discounts make long-context code work economical.

Long-context document processing: GPT-4.1 Nano or DeepSeek V4 Flash - both offer 1M token windows. Nano's batch pricing ($0.05/M) is worth factoring in for offline processing pipelines.

European data residency: Mistral Small. Mistral processes in EU infrastructure by default; no other provider at this price tier makes that guarantee.

Local/self-hosted deployment: Qwen3 8B. Apache 2.0, runs on a 16GB GPU, open weights for air-gapped environments.

Anthropic ecosystem continuity: Claude Haiku 4.5. If your application is already using Claude for its instruction following and safety alignment, Haiku maintains that consistency at the lowest Anthropic price point - watch the output cost.

The sub-$1 bracket covers roughly 80% of production AI workloads when matched to task complexity. The remaining 20% - hard reasoning, complex multi-file codegen, frontier-quality analysis - still justifies the $3-$15/M tier. At volume, the cost difference between getting this choice right and getting it wrong is large enough to warrant a dedicated evaluation before committing.