Cloud GPU Rental Pricing Compared - April 2026

Raw GPU rental rates across 20+ providers normalized to per-GPU-hour - H100, H200, A100, L40S, RTX 4090, on-demand vs spot vs reserved, with hidden costs and value-tier recommendations.

Best Value: RunPod Community Cloud / Lambda Labs. Updated monthly.

[Image: A dense GPU server rack - what you are paying for when you rent cloud compute.]

TL;DR

  • Cheapest H100 80GB: Vast.ai spot at ~$1.00/hr, RunPod spot at ~$1.25/hr
  • Best price-plus-reliability: Lambda Labs ($3.29/hr, zero friction) or RunPod Community Cloud ($1.99/hr)
  • Cheapest A100 80GB: Vast.ai spot from ~$0.29/hr
  • Cheapest H200 141GB: Nebius committed at $2.30/hr
  • Cheapest RTX 4090: Salad at $0.20/hr (consumer distributed network)
  • Hyperscalers charge 50-400% more than GPU-first clouds for equivalent hardware
  • Market average H100 on-demand has settled near $3.00-3.20/hr - down 40%+ from early 2025 highs

This article is specifically about raw GPU rental - you get a machine with a GPU, an OS, and a network connection, and you pay by the hour. No managed training APIs, no per-token pricing. If you are looking for those, see our fine-tuning costs comparison and open-source hosting costs breakdown.

The market has changed fast. AWS cut H100 pricing 44% in June 2025. Supply from new data centers and B200/GB200 build-outs has pushed prices down across the board. If you are still operating on year-old rate cards, you are almost certainly overpaying.

Master Pricing Table - H100 80GB SXM

The H100 80GB SXM5 remains the workhorse GPU for serious AI training and inference. All prices are per single GPU per hour, in USD, verified against official provider pricing pages or quotes as of April 19, 2026.

| Provider | On-Demand/hr | Spot/hr | Reserved/hr | Category |
|---|---|---|---|---|
| Vast.ai | $1.38-1.87 | ~$1.00 | 50% off (3-6mo) | Marketplace |
| RunPod Community | $1.99 | ~$1.25 | - | GPU-first |
| FluidStack | $2.10 | - | Contact sales | GPU-first |
| RunPod Secure | $2.39-2.69 | Available | - | GPU-first |
| Jarvislabs | $2.69 | - | Custom | GPU-first |
| Nebius | $2.95 | - | $2.00 (committed) | GPU-first |
| Vultr | $2.99 | - | $2.30 (36mo) | GPU-first |
| GCP (A3 High) | ~$3.00 | ~$2.25 | SUD discount | Hyperscaler |
| Lambda Labs | $3.29-4.29 | - | Contact sales | GPU-first |
| Together AI | $3.49-3.99 | - | $2.25-2.69 | GPU-first |
| AWS (p5.48xlarge) | ~$3.90 | ~$2.50 | $1.90-2.10 (1-3yr) | Hyperscaler |
| Crusoe | $3.90 | Contact sales | Contact sales | GPU-first |
| Modal | ~$3.95 | - | - | Serverless |
| CoreWeave | ~$6.16 | - | ~$1.45 (reserved) | Enterprise |
| Azure (ND H100 v5) | ~$6.98 | ~$3.50 | 1yr / 3yr reserved | Hyperscaler |
| Oracle Cloud | ~$10.00 | - | Contact sales | Hyperscaler |

Reading the table: "Contact sales" means pricing is not publicly listed - it typically signals enterprise commitments of 6-24 months. Reserved pricing rows require upfront payment or multi-month contracts.

Full GPU SKU Comparison Table

Normalized to per GPU per hour. Where a provider does not offer a GPU, the cell shows a dash.

| Provider | H100 80GB (OD) | H100 spot | H200 141GB | A100 80GB | L40S 48GB | RTX 4090 | RTX 6000 Ada |
|---|---|---|---|---|---|---|---|
| Vast.ai | $1.38-1.87 | ~$1.00 | - | $0.29-1.50 | ~$0.60-0.90 | $0.35-0.55 | ~$0.50-0.70 |
| RunPod | $1.99-2.69 | ~$1.25 | $4.31 | ~$1.64 | ~$0.79 | ~$0.34 | ~$0.60 |
| Lambda Labs | $3.29-4.29 | - | - | $2.79 | - | - | - |
| Nebius | $2.95 | - | $3.50 (OD) / $2.30 (commit) | - | - | - | - |
| Vultr | $2.99 | - | - | $2.80 | - | - | - |
| Crusoe | $3.90 | ~$1.20 | $4.29 | $1.65-1.95 | - | - | - |
| CoreWeave | ~$6.16 | - | ~$6.31 | ~$2.70 | Contact sales | - | - |
| Together AI | $3.49-3.99 | - | $4.19 (OD) / $2.59 (rsv) | - | - | - | - |
| FluidStack | $2.10 | - | - | Contact sales | Contact sales | - | - |
| Jarvislabs | $2.69 | - | $3.80 | $1.49 | - | - | - |
| Hyperstack | Contact sales | - | Contact sales | Contact sales | Contact sales | - | - |
| TensorDock | ~$2.50 | ~$1.60 | - | ~$1.30 | ~$0.50 | ~$0.25 | - |
| Salad | - | - | - | - | - | $0.20 | - |
| Paperspace | $5.95 | - | - | - | - | - | - |
| AWS | ~$3.90 | ~$2.50 | Contact sales | ~$3.43 | - | - | - |
| GCP | ~$3.00 | ~$2.25 | Contact sales | ~$5.78 | - | - | - |
| Azure | ~$6.98 | ~$3.50 | Contact sales | ~$3.67 | - | - | - |
| Oracle | ~$10.00 | - | Contact sales | Contact sales | - | - | - |
| Akash Network | ~$1.50-2.00 | Bid-based | - | ~$0.50-1.00 | ~$0.40-0.60 | ~$0.10-0.20 | - |

Note on RTX 5090: As of April 2026, the RTX 5090 is not yet widely available on cloud rental platforms. Salad lists it at approximately $0.50/hr on their consumer distributed network, and a few Vast.ai hosts have listed at $0.80-1.20/hr, but availability is extremely limited and inconsistent. I would not plan production workloads around it yet.


Provider-by-Provider Breakdown

GPU-First Clouds

These providers built their businesses around GPU rental. They typically offer better price-to-performance, simpler interfaces, and faster provisioning than hyperscalers.

RunPod

RunPod offers the best combination of price and reliability for most teams. Two tiers: Community Cloud (cheaper, peer-hosted) and Secure Cloud (data center grade). H100 SXM in Community Cloud runs $1.99/hr on-demand, $1.25/hr spot. Per-minute billing. Templates for PyTorch, vLLM, and Axolotl let you go from nothing to a running training loop in under five minutes. No minimum commitment.

Egress: ~$0.05/GB. Storage: $0.10/GB/month. User community is large and the Discord is active. Trustpilot reviews consistently praise support response times of under 15 minutes.

Minimum commitment: None. Credit card, no contract.

Lambda Labs

My go-to recommendation for researchers who want zero friction. Pre-installed PyTorch environment, 1-Click Clusters with InfiniBand networking for multi-GPU runs, SSH access out of the box. H100 SXM5 at $3.29-4.29/hr depending on configuration. No spot pricing - Lambda does not offer preemptible instances, which is actually a feature for training jobs where you cannot afford interruptions.

The downside: H100 stock-outs are common. Lambda's popularity means you may have to wait hours for a node. For ad hoc interactive work, this is frustrating.

Egress: Generous allowances included. Storage: Persistent volumes at $0.20/GB/month.

Minimum commitment: None on standard instances. 1-3 year reserved pricing available via sales.

Vast.ai

The cheapest option on the market, period. It is a P2P marketplace where individual hosts list their hardware. H100 on-demand ranges $1.38-1.87/hr, spot as low as ~$1.00/hr. The trade-off is reliability: hosts can reclaim their machines, instances can disappear mid-job. Per-second billing.

Use Vast.ai for experiments, batch jobs with checkpointing enabled, and anything you can restart without losing work. Do not use it for production inference.

Filter by "verification score" on the marketplace to find higher-reliability hosts. Hosts with 95%+ scores and long uptime histories are substantially more trustworthy.

Minimum commitment: None. Per-second billing.

Nebius

Nebius (formerly Yandex Cloud's international arm) has emerged as a competitive option, especially for teams wanting committed pricing without enterprise paperwork. H100 at $2.95/hr on-demand, $2.00/hr on a committed plan. H200 at $3.50/hr on-demand, $2.30/hr committed - that H200 committed rate is the lowest I have found publicly.

Strong NVIDIA partnership, InfiniBand support, European data center locations for GDPR compliance.

Minimum commitment: Committed pricing requires a multi-month agreement via their portal.

CoreWeave

CoreWeave is the enterprise GPU cloud. Kubernetes-native, InfiniBand interconnects, 256+ GPU cluster support. H100 on-demand at ~$6.16/hr looks expensive until you see the reserved rate: ~$1.45/hr for committed capacity. That reserved price beats nearly everyone for stable, long-running workloads.

The catch: CoreWeave requires enterprise onboarding (multi-day process), typically a minimum 3-6 month commitment for reserved capacity, and expects customers who know how to operate Kubernetes clusters. Not a place to spin up your first training experiment.

Minimum commitment: Enterprise sales required for reserved. On-demand available but targeted at existing customers.

Crusoe

Crusoe runs on 100% clean energy (stranded natural gas, increasingly renewable). H100 at $3.90/hr on-demand, competitive A100 spot pricing at ~$1.20-1.30/hr. The environmental angle matters for companies with sustainability reporting requirements, and the absence of egress charges is a meaningful differentiator for large dataset workloads.

Minimum commitment: None on on-demand. Reserved pricing via sales.

FluidStack

FluidStack aggregates supply from data centers and partners. H100 at $2.10/hr is competitive on-demand pricing with no minimum commitment. Reserved and spot pricing available via contact. Less community documentation than RunPod or Lambda, but pricing is solid and they have been reliable in my testing.

Minimum commitment: None on on-demand.

Jarvislabs

Under-the-radar provider with competitive pricing: A100 80GB at $1.49/hr, H100 at $2.69/hr, H200 at $3.80/hr. Under-90-second instance spin-up, managed Jupyter environments, per-minute billing. Good for quick experiments where you want something between Vast.ai's raw marketplace and Lambda's premium managed experience.

Minimum commitment: None. Per-minute billing.

Together AI

Together AI is primarily an inference and fine-tuning API, but they also sell dedicated GPU clusters. H100 at $3.49-3.99/hr on-demand, with reserved options down to $2.25-2.69/hr. The advantage is integration: if you train on a Together cluster and want to deploy the model for inference, the handoff is built in. For teams already on Together's API, it may be worth consolidating compute there.

Minimum commitment: Reserved pricing requires contact sales.

Hyperstack

Hyperstack focuses on large-scale GPU clusters with InfiniBand. Pricing is sales-gated across all GPU types. They target funded startups and enterprises building foundation models. If you need a 64+ H100 cluster reliably for months, they are worth a call. For anything smaller or shorter, look elsewhere first.

Minimum commitment: Enterprise contract required.


Spot and Marketplace Providers

These providers offer the lowest absolute prices but with availability and reliability trade-offs.

TensorDock

TensorDock operates its own data centers and offers bare-metal GPU rental at competitive rates. H100 at ~$2.50/hr on-demand, ~$1.60/hr spot. A100 at ~$1.30/hr. RTX 4090 at ~$0.25/hr, which is among the cheapest for that GPU outside of Akash. Straightforward billing, hourly minimum.

Minimum commitment: None. Hourly billing.

Salad

Salad taps a network of 60,000+ consumer gaming PCs. RTX 4090 at $0.20/hr is the cheapest legitimate option I have found for that GPU. No data center GPUs - everything is consumer hardware (RTX 4090, RTX 3090, RX 7900 series). Reliability is consumer-grade: expect occasional node failures and latency variance.

Best fit: inference workloads with stateless execution, image generation, smaller model serving where you can tolerate occasional request failures.

Minimum commitment: None.

Akash Network

Akash is a decentralized compute marketplace where anyone can offer spare GPU capacity. Pricing is bid-based: H100 ranges $1.50-2.00/hr, A100 $0.50-1.00/hr, RTX 4090 as low as $0.10-0.20/hr depending on demand. Payment in AKT (the network's native token) or USDC.

The very low prices are real, but so are the reliability caveats. Providers are anonymous, uptime is not guaranteed, and the tooling (Akash Console or CLI) has a steeper learning curve than RunPod or Lambda. For cost-sensitive, interruption-tolerant batch jobs and experiments, it is worth exploring.

Minimum commitment: None. Per-block billing (roughly per-minute).


Inference-Optimized and Serverless Providers

These providers layer developer experience on top of GPU rental. You pay more per GPU-hour but get auto-scaling, scale-to-zero, fast cold starts, and no infrastructure management.

Modal

Modal is the best developer experience in this list, full stop. Python SDK, auto-scale to zero (no idle GPU costs), cold start under 15 seconds, per-second billing. H100 at ~$3.95/hr when active. The critical difference from other providers: you do not pay when you are not running inference or training. For workloads with bursty traffic or overnight downtime, this beats paying $3.29/hr continuously for a Lambda instance that sits idle.

No SSH access - Modal is Python-only. If you need to poke around the filesystem or run arbitrary shell commands, use something else.

Minimum commitment: None.

Replicate

Replicate prices GPU access bundled with their model hosting platform. H100 at $0.001400/second ($5.04/hr) and A100 at $0.001150/second ($4.14/hr) via their API. You pay only when actively running inference - no reserved capacity. The effective per-GPU-hour cost is higher than raw rental, but for low-volume inference where you do not want to manage a persistent GPU, Replicate's convenience can justify the premium.

Minimum commitment: None.

Fireworks AI

Fireworks.ai focuses on fast inference and fine-tuning. Their fine-tuning compute is priced per token (see our fine-tuning costs comparison), not per GPU-hour. For raw rental purposes, Fireworks is not a direct competitor - their abstraction sits above GPU-hour billing.


Hyperscalers

AWS, GCP, Azure, and Oracle provide GPU compute alongside their broader cloud platform services. The price premium is real - 50-400% over GPU-first clouds - but the ecosystem value is also real.

AWS

Key GPU instance families relevant for AI in 2026:

  • p5.48xlarge (8x H100 SXM): ~$3.90/hr per GPU on-demand, ~$2.50/hr spot, $1.90-2.10/hr 1-3 year reserved
  • p5e.48xlarge (8x H200 SXM): Contact sales pricing
  • p5en (newer H200 variant): Contact sales
  • g6e (NVIDIA L40S 48GB): ~$0.75-1.35/hr per GPU depending on config

AWS reserved pricing is genuinely competitive with GPU-first clouds when you commit to 1-3 years - the p5 3-year reserved at ~$1.90/hr per H100 is comparable to Nebius committed pricing. The premium is in on-demand and spot pricing, plus the ecosystem lock-in cost.

Minimum commitment: None on on-demand. 1-3 year terms for reserved pricing.

Google Cloud

Key instances:

  • A3 High (8x H100 SXM): ~$3.00/hr per GPU on-demand, ~$2.25/hr spot, automatic Sustained Use Discounts (SUD) up to 30% for month-long runs
  • A3 Ultra (8x H200 NVL): Contact sales

GCP's Sustained Use Discounts are a genuine advantage for month-long training runs - you get automatic discounts without committing upfront. If you know your job will run 30 days straight, GCP's effective rate can undercut Lambda's list price.

Minimum commitment: None for on-demand and spot. Committed Use Discounts available for 1-3 years.

Azure

Key instances:

  • ND H100 v5 (8x H100 SXM5 with InfiniBand): ~$6.98/hr per GPU on-demand, ~$3.50/hr spot
  • ND H200 v5 (8x H200 SXM5): Contact sales

Azure's H100 on-demand pricing is the most expensive of any major provider. Their A100 spot pricing at ~$0.74/hr is excellent - one of the cheapest A100 spots available. Azure makes sense for Microsoft ecosystem shops (Azure OpenAI, AzureML, Teams integrations) or for Windows-dependent workloads.

Minimum commitment: None on on-demand and spot. 1-3 year reserved available.

Oracle Cloud

Oracle GPU instances (BM.GPU.H100.8 and variants) run ~$10.00/hr per GPU on-demand with an 8-GPU minimum. Reserved and contracted pricing via sales only. At these prices Oracle Cloud is not competitive for GPU workloads unless you have existing Oracle enterprise agreements or specific compliance requirements that mandate their environment.

Minimum commitment: 8-GPU minimum block on most H100 instances.


Spot vs Reserved vs On-Demand

Understanding the billing model matters as much as the headline rate.

On-Demand: Pay the listed rate, no commitment, cancel anytime. Best for: unpredictable workloads, experiments, one-off training runs. You pay a significant premium for flexibility.

Spot / Preemptible: Use spare capacity at 30-70% discount. The catch: the provider can reclaim the instance with little notice (typically 2 minutes for AWS, 30 seconds for GCP). Best for: training jobs with checkpointing, batch processing, any job that can tolerate interruptions and restart gracefully. Never use spot for production inference serving.

Reserved: Commit to a fixed number of GPU-hours per month (or pay upfront for 1-3 years) in exchange for 40-70% discounts. Best for: production inference clusters running 24/7, multi-month training projects, anything you know will run continuously. The hidden risk: GPU prices are falling fast. A 3-year H100 reservation at today's rates may look expensive in 12 months when on-demand prices have dropped further.

Quick comparison on H100 economics:

| Billing Model | Provider | Rate | 30-day cost (1x H100) |
|---|---|---|---|
| On-demand | RunPod Community | $1.99/hr | $1,433 |
| Spot | RunPod | ~$1.25/hr | ~$900 |
| Reserved (committed) | Nebius | $2.00/hr | $1,440 |
| Reserved (3-year) | AWS p5 | ~$1.90/hr | ~$1,368 |
| Reserved (enterprise) | CoreWeave | ~$1.45/hr | ~$1,044 |

The CoreWeave reserved rate is genuinely the cheapest per-GPU-hour for H100 at scale - but only accessible with enterprise commitments.
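
To make these trade-offs concrete, here is a minimal Python sketch of the 30-day math behind the table above. The 5% spot interruption overhead (work recomputed after preemptions) is an assumption, not a measured figure - tune it to your checkpoint cadence.

```python
# Rough 30-day cost model for one H100 under different billing schemes.
HOURS_PER_MONTH = 24 * 30

def monthly_cost(rate_per_hr: float, utilization: float = 1.0,
                 interruption_overhead: float = 0.0) -> float:
    """Cost of one GPU for 30 days.

    utilization: fraction of the month the GPU is allocated.
    interruption_overhead: extra fraction of compute repeated after
    spot preemptions (assumed, not measured).
    """
    hours = HOURS_PER_MONTH * utilization * (1 + interruption_overhead)
    return rate_per_hr * hours

print(f"RunPod on-demand:       ${monthly_cost(1.99):,.0f}")  # $1,433
print(f"RunPod spot (+5% redo): ${monthly_cost(1.25, interruption_overhead=0.05):,.0f}")  # $945
print(f"CoreWeave reserved:     ${monthly_cost(1.45):,.0f}")  # $1,044
```

Even with a 5% redo penalty, spot stays well under the cheapest reserved rate here - the penalty you actually pay is engineering effort, not dollars.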


GPU Specifications - What You Are Actually Buying

Not all H100s are equal. The SXM variant pairs HBM3 memory with NVLink/NVSwitch interconnects, giving substantially higher memory bandwidth than the PCIe version - a difference that matters most for large-model inference.

| GPU | VRAM | BF16 TFLOPS | Mem BW | Form Factor | Best Use Case |
|---|---|---|---|---|---|
| H200 SXM5 | 141GB HBM3e | ~1,979 | 4.8 TB/s | SXM | Largest models, >70B inference |
| H100 SXM5 | 80GB HBM3 | ~1,979 | 3.35 TB/s | SXM | Training, large model inference |
| H100 PCIe | 80GB HBM2e | ~1,513 | 2.0 TB/s | PCIe | Training, inference (budget SXM alternative) |
| A100 SXM4 | 80GB HBM2e | ~312 | 2.0 TB/s | SXM | Training 7B-70B, inference serving |
| A100 PCIe | 80GB HBM2e | ~312 | 1.55 TB/s | PCIe | Budget training, smaller inference |
| L40S | 48GB GDDR6 | ~733 | 0.86 TB/s | PCIe | Mixed inference/training, cost-efficient |
| RTX 4090 | 24GB GDDR6X | ~165 | 1.0 TB/s | PCIe | Consumer inference, LoRA fine-tuning |
| RTX 6000 Ada | 48GB GDDR6 | ~174 | 0.96 TB/s | PCIe | Pro inference, larger VRAM window |

Key practical implications:

  • H100 SXM5 vs PCIe: SXM5 has 67% higher memory bandwidth. For inference on 70B models, that bandwidth difference translates directly to tokens-per-second. Ask your provider explicitly which variant they are selling - not all listings specify.
  • A100 vs L40S: The L40S is cheaper per GPU-hour on some platforms and has more VRAM than a 40GB A100. For serving mid-size models (7B-13B), the L40S can be a better deal than the A100 at equivalent price points.
  • RTX 4090: 24GB VRAM is the binding constraint. Sufficient for 7B models in FP16 or 13B in 4-bit quant; a 70B model will not fit even in 4-bit without offloading. A quick fit check is sketched below.
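
A minimal sketch of that fit check, using the rule of thumb that 1B parameters take about 1 GB per byte of precision. The 1.2x overhead factor (KV cache, activations, CUDA context) is an assumption; real headroom depends on batch size and context length.

```python
# Back-of-envelope check: do a model's weights fit in a GPU's VRAM?
def fits(params_b: float, bytes_per_param: float, vram_gb: float,
         overhead: float = 1.2) -> bool:
    weights_gb = params_b * bytes_per_param  # 1B params ~ 1 GB per byte/param
    return weights_gb * overhead <= vram_gb

print(fits(7, 2.0, 24))    # 7B FP16 on RTX 4090   -> True  (~16.8 GB needed)
print(fits(13, 0.5, 24))   # 13B 4-bit on RTX 4090 -> True  (~7.8 GB)
print(fits(70, 0.5, 24))   # 70B 4-bit on RTX 4090 -> False (~42 GB)
print(fits(70, 1.0, 141))  # 70B 8-bit on H200     -> True  (~84 GB)
```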

Hidden Costs

The per-GPU-hour rate is the smallest part of your actual bill. Here is what adds up:

Networking and Egress

Most providers charge $0.05-0.12/GB for data egress. For training workloads pulling large datasets from cloud storage to GPU instances, this adds up fast. A 200GB dataset downloaded to a GPU instance three times over the course of experiments costs $30-72 in egress alone.

Providers with free or very cheap egress: Crusoe (no egress charges), RunPod (generous allowances), Lambda (included allowances). Providers that will surprise you: AWS, GCP, and Azure all charge standard egress rates.

Multi-GPU Networking

For distributed training across multiple GPUs, the interconnect matters enormously. NVLink / NVSwitch (within a node) and InfiniBand (between nodes) are not available on all platforms or all instance types.

  • InfiniBand available: CoreWeave, Lambda (1-Click Clusters), AWS (p5/p4d clusters with EFA), GCP (A3 clusters), Azure (ND H100 v5)
  • Limited / no InfiniBand: Vast.ai (host-dependent), RunPod (community cloud), Jarvislabs, most marketplace providers

If you are running a multi-node training job on Llama 3 70B or larger, InfiniBand can cut training time by 50%+ compared to standard Ethernet. Factor that into your effective cost per training run, not just per GPU-hour.
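
A quick sketch of that math on a hypothetical 16-GPU job, using the article's rough 2x speedup and list rates from the master table. The point is that cost per completed run, not cost per GPU-hour, is the number to optimize.

```python
# Effective cost per training run: a cheaper hourly rate can lose to a
# faster interconnect on multi-node jobs.
def run_cost(rate_per_gpu_hr: float, num_gpus: int, wall_hours: float) -> float:
    return rate_per_gpu_hr * num_gpus * wall_hours

# Hypothetical job: 200 wall-clock hours on Ethernet vs 100 on InfiniBand.
ethernet   = run_cost(1.99, 16, 200)  # RunPod Community, no InfiniBand
infiniband = run_cost(3.29, 16, 100)  # Lambda 1-Click Cluster rate
print(f"Ethernet: ${ethernet:,.0f}   InfiniBand: ${infiniband:,.0f}")
# Ethernet: $6,368   InfiniBand: $5,264 - the pricier cluster wins
```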

Idle GPU Cost

Cloud GPUs bill when allocated, not when actively computing. If you spin up an H100 for a Jupyter session and spend an hour reviewing results before kicking off the next training run, you pay $2-6 for that hour of thinking. This adds up more than people realize.

Modal eliminates this entirely - scale to zero means you pay only for active computation. If your workflow has significant idle time between training runs, serverless compute can be cheaper than reserved raw rental even at Modal's higher per-active-hour rate.
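
The break-even is simple to compute. Using Lambda's $3.29/hr always-on rate against Modal's ~$3.95/hr active-only rate:

```python
# Below this active fraction of the month, scale-to-zero serverless
# is cheaper than an always-on instance.
always_on_rate = 3.29   # $/hr, billed 24/7 (Lambda H100)
serverless_rate = 3.95  # $/hr, billed only while active (Modal H100)

breakeven = always_on_rate / serverless_rate
print(f"Serverless wins below {breakeven:.0%} utilization")  # ~83%
```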

Storage I/O

Model weights for a 70B model sit at ~35-140GB depending on precision. Checkpointing every 500 steps during training generates multiple copies. Cloud storage costs $0.10-0.23/GB/month depending on provider and tier. Budget 500GB-1TB for a serious training project - that is $50-230/month just in storage, before I/O request fees.

Minimum Billing Periods

RunPod and Modal bill per-minute or per-second. AWS, GCP, and Azure bill per-second for most instances but have minimum hourly charges for some reserved instance types. Vast.ai bills per-second. Always check the minimum billing unit when running short jobs - a 10-minute training test on a provider with hourly minimums costs 6x more than expected.
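
Pulling these line items together, here is a rough monthly estimate for a single-H100 project. The rates are illustrative mid-range values from this article, and the hour and gigabyte counts are assumptions for a modest training project.

```python
gpu_hours  = 400   # active training/inference hours
idle_hours = 60    # allocated but idle (debugging, reviewing results)
egress_gb  = 600   # dataset pulls plus checkpoint downloads
storage_gb = 750   # datasets, weights, checkpoints

bill = (
    gpu_hours  * 1.99    # RunPod Community on-demand, $/hr
    + idle_hours * 1.99  # idle time bills at the same rate
    + egress_gb  * 0.05  # egress, $/GB
    + storage_gb * 0.10  # storage, $/GB/month
)
print(f"Estimated month: ${bill:,.2f}")  # $1,020.40
# GPU time dominates, but idle + egress + storage add ~$225 here.
```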


Value Tier Recommendations

Hobbyist (a few hours per week)

Go to: Vast.ai or TensorDock

If you are experimenting with fine-tuning, running inference on open-source models, or hacking on personal projects, start with Vast.ai. An RTX 4090 at $0.35-0.55/hr runs 7B models in FP16 or 13B in 4-bit, and an H100 SXM from $1.38/hr on-demand is there when you need more memory. Per-second billing means a 20-minute H100 experiment costs about $0.46.

Filter for verified hosts with 95%+ uptime scores and you will rarely encounter problems.

For hardware context on what you can run locally before spending on cloud, see our home GPU LLM leaderboard and best AI home workstations guide.

Research Team (on-and-off training, sharing across team)

Go to: Lambda Labs or RunPod Secure Cloud

Lambda's frictionless environment and SSH access make collaborative research straightforward. No spot pricing means your overnight training runs complete without interruption. H100 at $3.29/hr is not the cheapest, but the zero-setup overhead and per-minute billing make it the right choice when researcher time is more expensive than GPU time.

RunPod Secure Cloud at $2.39-2.69/hr is the budget alternative when you are comfortable managing environments yourself.

Production Inference (24/7 serving)

Go to: CoreWeave (reserved) or Lambda Labs (reserved)

For a production inference endpoint running 24/7, reserved pricing is non-negotiable. CoreWeave's ~$1.45/hr reserved H100 is the best rate I have found for stable enterprise-grade capacity, assuming you can navigate their onboarding process and commit to a multi-month contract. Lambda reserved pricing via sales can get you into the $1.50-2.00/hr range with less overhead.

For serverless inference where traffic is bursty (peaks during business hours, near-zero at night), Modal can be cheaper than reserved raw rental even at its higher per-active-hour rate. Run the math for your actual traffic pattern.

Avoid Vast.ai and Akash Network for production inference - host reliability is not guaranteed.

Month-Long Training Run

Go to: GCP (A3 High + Sustained Use Discounts) or Nebius (committed)

For a training job running 30 days straight, GCP's automatic Sustained Use Discounts drop the effective rate to ~$2.10/hr per H100 without any upfront commitment. If you can tolerate the occasional preemption and have checkpointing in place, GCP spot at ~$2.25/hr is also compelling - though note that Sustained Use Discounts apply to on-demand usage, not spot.

Nebius committed pricing at $2.00/hr for H100 (or $2.30/hr for H200) with InfiniBand available is my pick if you want to commit upfront and avoid the complexity of GCP's pricing model.
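
The arithmetic behind that comparison, assuming the full 30% SUD (the discount accrues over the month, so only a full 30-day run reaches it):

```python
hours = 24 * 30

gcp_effective = 3.00 * (1 - 0.30)  # A3 High list rate with full SUD -> $2.10/hr
nebius_committed = 2.00

print(f"GCP with SUD:     ${gcp_effective * hours:,.0f}")      # $1,512
print(f"Nebius committed: ${nebius_committed * hours:,.0f}")   # $1,440
```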

For free options to prototype before committing to a long run, see our guide to free AI inference providers.


FAQ

What is the cheapest H100 I can rent right now?

Vast.ai spot pricing around ~$1.00/hr is consistently the floor. RunPod spot runs ~$1.25/hr. On-demand (no risk of interruption), Vast.ai on-demand starts around $1.38/hr and RunPod Community Cloud is $1.99/hr.

Should I use spot GPUs for training?

Yes, if you implement checkpointing. A training job that saves state every 10-15 minutes can resume from the last checkpoint when a spot instance gets reclaimed. The 30-50% discount over on-demand is worth the 5-10% overhead of writing checkpoint code. Tools like Hugging Face Trainer and Lightning have checkpoint support built in.

Never use spot for production inference serving. Your users will see outages.
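
A minimal sketch of the checkpoint-and-resume pattern with Hugging Face Trainer. Here, model and train_ds are placeholders for your own model and dataset, and the path and step counts are assumptions to adapt.

```python
import glob
import os
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="/workspace/ckpts",  # put this on a persistent volume
    save_steps=200,                 # checkpoint every ~10-15 minutes of steps
    save_total_limit=3,             # cap disk usage from old checkpoints
)
trainer = Trainer(model=model, args=args, train_dataset=train_ds)

# Resume only if a checkpoint already exists (the first run has none).
ckpts = glob.glob(os.path.join(args.output_dir, "checkpoint-*"))
trainer.train(resume_from_checkpoint=bool(ckpts))
```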

Is renting GPUs cheaper than buying?

It depends entirely on utilization. An H100 SXM costs ~$30,000-35,000 new. At Lambda's $3.29/hr, rental breaks even at ~9,100-10,600 hours of continuous use - roughly 12-15 months. If your utilization is well below 24/7, rental is cheaper. If you need the GPU around the clock for 18+ months, buying likely wins on raw cost.
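
The break-even arithmetic, using the midpoint of that price range:

```python
purchase_price = 32_500  # midpoint of the ~$30,000-35,000 range
rental_rate = 3.29       # Lambda on-demand, $/hr

breakeven_hours = purchase_price / rental_rate
print(f"{breakeven_hours:,.0f} hours")                   # ~9,878
print(f"{breakeven_hours / (24 * 30):.1f} months 24/7")  # ~13.7 months
```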

The real comparison also includes colocation, power, cooling, and maintenance costs for owned hardware. See our home workstation guide for what consumer-grade alternatives look like at the lower end.

What is the difference between H100 SXM and H100 PCIe?

SXM pairs HBM3 memory with NVLink/NVSwitch interconnects, giving 3.35 TB/s of memory bandwidth. PCIe uses HBM2e, connects through the motherboard, and tops out at ~2.0 TB/s. For training and large-model inference, SXM is meaningfully faster. For smaller models (under 7B) where you are not saturating memory bandwidth, the difference is minor. Expect PCIe H100 instances to be $0.50-1.00/hr cheaper than SXM - sometimes worth it, sometimes not.
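
A crude way to see why the bandwidth matters: single-stream decode is roughly memory-bound, so an upper bound on tokens per second is bandwidth divided by model size. These are theoretical ceilings, not benchmarks, and a 140GB model spans multiple GPUs in practice - but the SXM/PCIe ratio carries over.

```python
model_gb = 70 * 2  # 70B params at FP16 ~ 140 GB of weights

for name, bw_tb_s in [("H100 SXM", 3.35), ("H100 PCIe", 2.0)]:
    toks = bw_tb_s * 1000 / model_gb  # (GB/s) / (GB read per token)
    print(f"{name}: ~{toks:.0f} tok/s ceiling")
# H100 SXM: ~24 tok/s, H100 PCIe: ~14 tok/s - the same ~67% gap
```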

Do hyperscalers offer any pricing advantages?

Yes, in two scenarios: (1) spot A100s on Azure at ~$0.74/hr are among the cheapest A100 availability you will find; (2) long-term 3-year reserved H100 pricing on AWS can reach ~$1.90/hr, competitive with GPU-first clouds. For 1-3 year commitments on large clusters where you need enterprise SLAs, hyperscaler reserved pricing is worth serious evaluation.

What about AMD GPUs?

Vultr offers MI300X at ~$1.85/hr and Crusoe offers them as well. The MI300X has 192GB HBM3 memory - more than the H100 SXM - and strong BF16 performance. The main concern is software compatibility. PyTorch ROCm support has improved substantially, but CUDA-specific extensions in some training frameworks can require porting work. If your stack is standard PyTorch with no CUDA custom ops, AMD is worth benchmarking.


Last verified April 19, 2026

About the author: AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.