Fine-Tuning Costs Comparison - Train Your Own AI

Side-by-side fine-tuning costs for OpenAI, Google, Together AI, Fireworks, Mistral, and self-hosted GPU options with LoRA vs full training breakdowns.

Cheapest: Together AI (Llama 3.1 8B LoRA) · Best value: Together AI (70B LoRA) · Updated weekly

TL;DR

  • Together AI offers the cheapest API fine-tuning at $0.48/1M tokens for LoRA on models up to 16B parameters
  • OpenAI's GPT-4o training runs $25/1M tokens - but the new GPT-4.1 drops that to $3/1M, an 88% cut
  • Self-hosted LoRA on a single H100 ($2.65-$3.99/hr) beats API pricing for datasets above roughly 50M tokens
  • LoRA achieves 80-95% of full fine-tuning quality at 70-90% lower cost, making it the default starting point

The Bottom Line

If you want the cheapest path to a custom model, Together AI's LoRA fine-tuning at $0.48 per million tokens on Llama 3.1 8B is hard to beat. For teams committed to OpenAI's ecosystem, the GPT-4.1 family changed the economics: training at $3/1M tokens (GPT-4.1) or $0.80/1M tokens (GPT-4.1-mini) makes fine-tuning accessible where GPT-4o's $25/1M was prohibitive. Google's Vertex AI sits in the middle at $3/1M tokens for Gemini 2.0 Flash tuning, with the bonus that inference pricing stays identical to the base model.

The real question isn't which provider is cheapest per token. It's whether API fine-tuning or self-hosted training makes more sense for your workload. We break down both paths below.

API Fine-Tuning Pricing Table

All prices in USD per million tokens. Training cost covers tokens processed during fine-tuning (dataset size multiplied by epochs). Inference costs apply when you call your fine-tuned model afterward. Prices verified against official documentation on March 26, 2026.

| Provider | Model | Training (/1M) | Inference In (/1M) | Inference Out (/1M) | Min Examples | Method |
|---|---|---|---|---|---|---|
| Together AI | Llama 3.1 8B | $0.48 | $0.18 | $0.18 | 1 | LoRA |
| Together AI | Mistral 7B | $0.48 | $0.20 | $0.20 | 1 | LoRA |
| Together AI | Llama 3.1 8B | $0.54 | $0.18 | $0.18 | 1 | Full |
| Fireworks | Llama 3.1 8B | $0.50 | $0.20 | $0.20 | 1 | LoRA |
| OpenAI | GPT-4.1 Nano | $0.20 | $0.20 | $0.80 | 10 | SFT |
| OpenAI | GPT-4o-mini | $0.30 | $0.30 | $1.20 | 10 | SFT |
| OpenAI | GPT-4.1 Mini | $0.80 | $0.80 | $3.20 | 10 | SFT |
| Mistral | Mistral 7B | $1.00 | $0.25 | $0.25 | 1 | SFT |
| Google | Gemini 2.0 Flash Lite | $1.00 | $0.075 | $0.30 | 10 | SFT |
| Together AI | Llama 3.1 70B | $1.50 | $0.88 | $0.88 | 1 | LoRA |
| Together AI | Llama 3.1 70B | $1.65 | $0.88 | $0.88 | 1 | Full |
| Mistral | Mistral Small | $2.00 | $0.20 | $0.60 | 1 | SFT |
| Together AI | 70-100B models | $2.90 | varies | varies | 1 | LoRA |
| Cohere | Command R | $3.00 | $0.30 | $1.20 | 2 | SFT |
| OpenAI | GPT-4.1 | $3.00 | $3.00 | $12.00 | 10 | SFT |
| Fireworks | Llama 3.1 70B | $3.00 | $0.90 | $0.90 | 1 | LoRA |
| Google | Gemini 2.0 Flash | $3.00 | $0.15 | $0.60 | 10 | SFT |
| OpenAI | GPT-3.5 Turbo | $8.00 | $3.00 | $6.00 | 10 | SFT |
| OpenAI | GPT-4o | $25.00 | $3.75 | $15.00 | 10 | SFT |

For base model inference pricing without fine-tuning, see our LLM API pricing comparison.
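The training-cost arithmetic from the table (dataset tokens multiplied by epochs, billed at the per-1M-token rate) can be sketched in a few lines. The rate dictionary below is illustrative, copied from the comparison table; the key names are made up for this example.

```python
# Estimate API fine-tuning cost: dataset tokens x epochs x per-token training rate.
# Rates are USD per 1M tokens, taken from the comparison table above.

RATES_PER_1M = {
    "together-llama-3.1-8b-lora": 0.48,
    "openai-gpt-4.1-mini": 0.80,
    "openai-gpt-4o": 25.00,
}

def training_cost(dataset_tokens: int, epochs: int, rate_per_1m: float) -> float:
    """Total processed tokens = dataset_tokens * epochs; billed per 1M tokens."""
    processed = dataset_tokens * epochs
    return processed / 1_000_000 * rate_per_1m

# 2M-token dataset, 3 epochs, GPT-4.1 Mini -> 6M tokens x $0.80/1M
print(round(training_cost(2_000_000, 3, RATES_PER_1M["openai-gpt-4.1-mini"]), 2))  # 4.8
```

Swapping in a different rate is all it takes to compare providers for a fixed dataset, which is how the practical examples later in this page are computed.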

What Stands Out

OpenAI's pricing spread is enormous. GPT-4.1 Nano training at $0.20/1M tokens is 125x cheaper than GPT-4o at $25/1M. If you were fine-tuning GPT-4o and haven't re-evaluated since GPT-4.1 launched, you're likely overpaying.

Together AI and Fireworks compete directly on open-source model training, with Together slightly cheaper and offering both LoRA and full fine-tuning options. Google's tuned model inference staying at base model prices is a meaningful advantage - OpenAI charges a premium for fine-tuned model inference (GPT-4.1 fine-tuned inference runs $3/1M input vs $2/1M for the base model).

API Fine-Tuning vs Self-Hosted Training

The API approach bundles infrastructure, tooling, and hosting into one per-token price. Self-hosted training means renting GPUs and running the training job yourself using frameworks like Hugging Face TRL, Axolotl, or LLaMA-Factory.

[Image: GPU server rack in a data center with blinking status lights] Self-hosted fine-tuning requires renting cloud GPUs or maintaining on-premise hardware. Source: pexels.com

GPU Cloud Pricing for Training

| Provider | GPU | VRAM | Hourly Rate | Best For |
|---|---|---|---|---|
| Vast.ai | H100 SXM | 80GB | $1.49-$2.00 | Budget training, spot |
| RunPod | H100 SXM | 80GB | $2.65 | Reliable on-demand |
| Lambda | H100 SXM | 80GB | $3.99 | Managed environment |
| Lambda | A100 SXM | 80GB | $2.79 | Budget large models |
| RunPod | A100 | 80GB | $1.64 | Budget LoRA training |
| Vast.ai | A100 | 80GB | $1.00-$1.80 | Spot pricing |
| Lambda | B200 SXM | 192GB | $6.69 | Massive models |

For a deeper look at GPU cloud costs, see our open-source hosting costs breakdown.

When Self-Hosted Wins

Self-hosted training becomes cheaper when your dataset is large enough that the fixed overhead of setting up infrastructure amortizes across many tokens. A rough crossover point: if you're processing more than 50M training tokens on a 7B model, renting a single H100 on Vast.ai at $1.49/hr for a few hours costs less than Together AI's $0.48/1M API price.

The math on a concrete example: fine-tuning Llama 3.1 70B with LoRA on 10M tokens (3 epochs = 30M processed tokens) costs $45 through Together AI's API at $1.50/1M. The same job on a rented 8xA100 cluster takes roughly 2-4 hours at $13-$22/hr, running $26-$88 total. At the low end (Vast.ai spot pricing), self-hosted is cheaper. At the high end (Lambda on-demand), the API wins on convenience.
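The comparison above reduces to two small formulas. One caveat: training throughput (tokens per hour) varies widely with model size, sequence length, and hardware, so the figure used here is an assumption for illustration; plug in a number you have measured yourself.

```python
# Compare API fine-tuning cost against renting GPUs, using this page's figures.

def api_cost(processed_tokens: int, rate_per_1m: float) -> float:
    """Cost of a managed fine-tuning job billed per 1M processed tokens."""
    return processed_tokens / 1_000_000 * rate_per_1m

def self_hosted_cost(processed_tokens: int, tokens_per_hour: float,
                     gpu_rate_per_hour: float) -> float:
    """Cost of the same job on rented GPUs, given an assumed throughput."""
    hours = processed_tokens / tokens_per_hour
    return hours * gpu_rate_per_hour

# 70B LoRA, 30M processed tokens: API at $1.50/1M vs an 8xA100 cluster at
# ~$13/hr finishing in ~3 hours (i.e. ~10M tokens/hour - an assumption).
print(api_cost(30_000_000, 1.50))                      # 45.0
print(self_hosted_cost(30_000_000, 10_000_000, 13.0))  # 39.0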

When APIs Win

API fine-tuning wins on three fronts. First, zero setup - no configuring CUDA, managing dependencies, or debugging distributed training. Second, built-in evaluation and monitoring dashboards. Third, your fine-tuned model is immediately available for serving at the same endpoint, with no separate deployment step.

For teams running occasional fine-tuning jobs (monthly or quarterly retraining), the engineering time saved by using an API almost always outweighs the per-token premium.

LoRA vs Full Fine-Tuning Cost Breakdown

Understanding the cost gap between LoRA (Low-Rank Adaptation) and full fine-tuning matters because it's often the single biggest lever you can pull. For more technical background on these techniques, read our fine-tuning and distillation guide.

[Image: Financial analysis dashboard showing cost comparisons and data charts] Breaking down the real costs across methods and model sizes uncovers where the savings actually live. Source: pexels.com

Cost by Model Size and Method

| Model Size | LoRA (Together AI) | Full (Together AI) | Savings | Quality Retention |
|---|---|---|---|---|
| Up to 16B | $0.48/1M | $0.54/1M | 11% | 90-95% |
| 17-69B | $1.50/1M | $1.65/1M | 9% | 85-95% |
| 70-100B | $2.90/1M | $3.20/1M | 9% | 80-95% |

Together AI's LoRA-to-full pricing gap is modest at 9-11%. The real savings with LoRA show up in self-hosted scenarios, where GPU memory requirements drop dramatically. A 7B model needs a single 24GB GPU for LoRA versus 4x 80GB GPUs for full fine-tuning. That hardware gap translates to 4-10x cost reduction.
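The hardware gap can be estimated with a common rule of thumb: mixed-precision full fine-tuning with Adam needs roughly 16 bytes per parameter (16-bit weights and gradients plus 32-bit optimizer state), while LoRA over a frozen 16-bit base needs about 2 bytes per parameter, and QLoRA about 0.5. These multipliers are approximations, not exact figures for any specific framework, and they exclude activations and KV cache, which add more on top.

```python
# Rough minimum VRAM for weights + gradients + optimizer state only.
# Multipliers are common rules of thumb, not framework-exact numbers.

BYTES_PER_PARAM = {
    "full_adam_mixed_precision": 16,  # fp16 weights/grads + fp32 Adam states
    "lora_16bit_base": 2,             # frozen bf16 base; adapter overhead is tiny
    "qlora_4bit_base": 0.5,           # 4-bit quantized frozen base
}

def min_vram_gb(params_billion: float, method: str) -> float:
    # 1B params at N bytes/param is N GB (using 1 GB = 1e9 bytes).
    return params_billion * BYTES_PER_PARAM[method]

print(min_vram_gb(7, "full_adam_mixed_precision"))  # 112.0 -> needs multiple 80GB GPUs
print(min_vram_gb(7, "lora_16bit_base"))            # 14.0  -> fits a single 24GB card
```

This is where the 4-10x self-hosted cost reduction comes from: a 7B model drops from a multi-GPU node to a single consumer card.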

DPO Training Costs

Direct Preference Optimization (DPO) - used for aligning models with human preferences - costs substantially more than standard supervised fine-tuning. Together AI's DPO pricing runs 2.5x higher than SFT:

  • Up to 16B: $1.20/1M (LoRA), $1.35/1M (Full)
  • 17-69B: $3.75/1M (LoRA), $4.12/1M (Full)
  • 70-100B: $7.25/1M (LoRA), $8.00/1M (Full)

If your fine-tuning goal is style matching or domain knowledge, start with SFT. Reserve DPO for cases where you need explicit preference alignment, such as safety tuning or output ranking.

Hidden Costs Most Guides Skip

Data Preparation

Budget 10-15% of your total fine-tuning spend on data preparation. Cleaning, formatting, deduplication, and quality filtering take real engineering hours. OpenAI requires a minimum of 10 training examples, but effective fine-tuning normally needs 500-10,000 high-quality examples. Producing that dataset - especially for specialized domains - often costs more than the training itself.
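A first-pass cleanup script is often enough to catch the worst dataset problems: malformed records and exact duplicates. The sketch below assumes the common OpenAI-style chat JSONL format (a "messages" list with role/content pairs); adapt the validation to whatever schema your provider expects.

```python
import hashlib
import json

# Minimal dataset-prep pass: validate chat-format JSONL records and drop
# exact duplicates. The "messages" schema mirrors the common OpenAI-style
# fine-tuning format; this is a sketch, not a full quality-filtering pipeline.

def clean_dataset(lines):
    seen, kept = set(), []
    for line in lines:
        record = json.loads(line)
        msgs = record.get("messages", [])
        roles = {m.get("role") for m in msgs}
        # Require at least one user turn and one assistant turn.
        if not {"user", "assistant"} <= roles:
            continue
        digest = hashlib.sha256(line.encode()).hexdigest()
        if digest in seen:
            continue  # exact duplicate
        seen.add(digest)
        kept.append(record)
    return kept

raw = [
    '{"messages": [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}]}',
    '{"messages": [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}]}',
    '{"messages": [{"role": "system", "content": "be brief"}]}',
]
print(len(clean_dataset(raw)))  # 1 (one duplicate and one invalid record dropped)
```

Near-duplicate detection, length filtering, and manual spot checks come on top of this, which is where the 10-15% budget figure goes.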

Failed Experiments

Plan for 3-5 training runs before landing on a configuration that works. That means your actual training cost is 3-5x the single-run estimate. Hyperparameter sweeps over learning rate, LoRA rank, and epoch count add up fast. Together AI and Fireworks charge per token processed, so every abandoned run still hits your bill.

Inference Cost Multipliers

Fine-tuned models on OpenAI cost more to run than base models. GPT-4.1 base inference is $2/1M input, but fine-tuned GPT-4.1 inference jumps to $3/1M input - a 50% premium. Google doesn't charge this premium: tuned Gemini 2.0 Flash inference stays at the same $0.15/$0.60 as the base model. This ongoing inference premium can dwarf the one-time training cost for high-volume production workloads.
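To see when the premium dominates, you can compute how many months of inference it takes for the $1/1M input surcharge to exceed a given training bill. The rates below are the GPT-4.1 figures quoted above; the output-side premium is ignored for simplicity.

```python
# Months until OpenAI's fine-tuned-inference premium exceeds a training cost.
# Rates from this page: base GPT-4.1 input $2/1M, fine-tuned $3/1M.

def months_to_exceed(training_cost_usd: float, monthly_input_tokens: int,
                     base_rate: float = 2.0, tuned_rate: float = 3.0) -> float:
    premium_per_month = monthly_input_tokens / 1_000_000 * (tuned_rate - base_rate)
    return training_cost_usd / premium_per_month

# $1,200 training run, 100M input tokens/month -> premium is $100/month
print(months_to_exceed(1200, 100_000_000))  # 12.0
```

At 100M input tokens a month, the inference premium matches a $1,200 training run within a year; heavier workloads cross over in weeks.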

Storage and Checkpoints

Self-hosted training generates checkpoint files at every evaluation step. A 70B model checkpoint occupies roughly 140GB uncompressed. Five checkpoints from a single run means 700GB of storage. Cloud storage at $0.023/GB/month (AWS S3 standard) adds $16/month per run's worth of checkpoints - small individually, but it builds up across experiments.

OpenAI Data Sharing Discount

OpenAI offers reduced inference pricing if you enable data sharing when creating the fine-tune job. This halves fine-tuned inference costs for both standard and batch tiers. The trade-off: your training data and completions may be used to improve OpenAI's models. For many teams, the privacy concern outweighs the savings.

Practical Cost Examples

Example 1: Customer Support Bot (Small Scale)

  • Goal: Fine-tune for domain-specific Q&A, 5,000 examples (~2M tokens), 3 epochs
  • API route: GPT-4.1 Mini on OpenAI = 6M tokens x $0.80/1M = $4.80 training
  • Inference: 500K input + 500K output tokens/month at $0.80/$3.20 = $2.00/month
  • Total first year: ~$29

Example 2: Code Assistant (Medium Scale)

  • Goal: Fine-tune Llama 3.1 70B for internal codebase, 50,000 examples (~25M tokens), 3 epochs
  • API route: Together AI LoRA = 75M tokens x $1.50/1M = $112.50 training
  • Self-hosted: 8xA100 on Vast.ai at ~$10/hr for ~6 hours = $60
  • Savings from self-hosted: 47%

Example 3: Enterprise Classification (Large Scale)

  • Goal: Fine-tune GPT-4o for document classification, 200,000 examples (~100M tokens), 4 epochs
  • API route: OpenAI = 400M tokens x $25/1M = $10,000 training
  • Alternative: GPT-4.1 at $3/1M = 400M x $3 = $1,200 (88% savings, same model family)
  • Open-source alternative: Llama 3.1 70B LoRA on Together AI = 400M x $1.50/1M = $600

The GPT-4.1 launch reshuffled the economics for OpenAI-locked teams. If you're still on GPT-4o fine-tuning, migrating to GPT-4.1 should be your first move.

Decision Framework

Choosing between fine-tuning approaches comes down to four variables. For general guidance on picking the right model in the first place, see our how to choose an LLM guide.

Use API fine-tuning when:

  • Your team doesn't have ML infrastructure expertise
  • Dataset is under 50M tokens
  • You need quick iteration cycles (hours, not days)
  • You want managed serving included in the price

Use self-hosted training when:

  • Dataset passes 100M tokens
  • You need full control over hyperparameters and training loop
  • Data privacy requirements rule out third-party APIs
  • You're running many experimental iterations and want to minimize per-run cost

Start with LoRA when:

  • You're fine-tuning for the first time
  • Budget is constrained
  • The task is style transfer, formatting, or domain adaptation
  • You want to swap adapters at inference time without redeploying

Use full fine-tuning when:

  • LoRA quality doesn't meet your threshold after testing
  • The task requires deep behavioral changes across the full weight space
  • You've validated the approach with LoRA first and need that final 5-10% quality

For a deeper explanation of what fine-tuning is and when it makes sense over prompt engineering, we have a dedicated guide.

FAQ

Which fine-tuning provider is cheapest?

Together AI at $0.48/1M tokens for LoRA on models up to 16B parameters. OpenAI's cheapest option is GPT-4.1 Nano at $0.20/1M, but inference costs are higher.

Is LoRA good enough for production?

LoRA retains 80-95% of full fine-tuning quality depending on the task. For style matching, format compliance, and domain adaptation, LoRA performs nearly identically to full training.

How much data do I need for fine-tuning?

OpenAI requires a minimum of 10 examples. Practically, 500-5,000 high-quality examples produce meaningful improvements. More data helps, but quality matters more than quantity.

Does fine-tuning cost more than prompt engineering?

Training is a one-time cost. If fine-tuning reduces your prompt by 500 tokens per request and you process 1M requests/month, the token savings pay for training within weeks.
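The payback claim is easy to check with the page's own numbers. The 4.33 weeks-per-month factor and the example figures (500 tokens saved, 1M requests/month, GPT-4.1 Mini rates) are assumptions for illustration.

```python
# Payback period for fine-tuning that shortens prompts: value the saved
# input tokens at the model's input rate and divide into the training cost.

def payback_weeks(training_cost: float, tokens_saved_per_request: int,
                  requests_per_month: int, input_rate_per_1m: float) -> float:
    monthly_savings = (tokens_saved_per_request * requests_per_month
                       / 1_000_000 * input_rate_per_1m)
    return training_cost / monthly_savings * 4.33  # ~4.33 weeks per month

# $4.80 training (Example 1 above), 500 tokens saved per request,
# 1M requests/month at $0.80/1M input -> ~$400/month saved.
print(payback_weeks(4.80, 500, 1_000_000, 0.80))  # well under one week
```

Even at a $1,200 training cost (the GPT-4.1 example), the same savings rate pays it back in about three months.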

Can I fine-tune open-source models for free?

Training is never truly free - you still need compute. But using QLoRA on a consumer GPU (RTX 4090, ~$0.40/hr on Vast.ai) can fine-tune a 7B model for under $5.

How long does API fine-tuning take?

Most jobs under 10M tokens complete within 1-3 hours. Larger jobs (100M+ tokens) can take 6-24 hours depending on the provider and model size.


About the author: James, AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.