H100 vs A100 in 2026: Real Pricing + Workload Picks

Q: Is the H100 worth twice the cost of A100?

Depends on what "worth" means. On per-stream latency: yes, easily 2× faster. On throughput per dollar (tokens per dollar): roughly tied for inference. On absolute capability for FP8 workloads: yes, no contest. On a 70B Q4 chat where 30 tok/s is enough: no, pick A100.

Last updated: May 2026

Decision rule in one paragraph. Rent A100 if your workload is inference of models up to ~30B at single-GPU scale, fine-tuning ≤13B with LoRA, or anything where you can wait 2-3× longer in exchange for roughly half the hourly rate. Rent H100 if you are serving an LLM at >100 tokens/s per stream, training or fine-tuning a 30B+ model with full backprop, or time-to-result matters more than $/hour. For prototyping below 13B, neither is the right answer - that is an RTX 4090 or RTX 6000 Ada at roughly half the A100's hourly rate and about a fifth of the H100's.

This guide gives the verified per-provider rental cost for H100 and A100 across RunPod, Lambda Labs, Hyperstack, Paperspace and Vast.ai as of 2026-05-20, plus the per-workload math that turns those hourly rates into a decision.

H100 vs A100 - specs at a glance

Spec	NVIDIA A100 SXM 80GB	NVIDIA H100 SXM 80GB
Architecture	Ampere (2020)	Hopper (2022)
VRAM	80 GB HBM2e	80 GB HBM3
Memory bandwidth	2.0 TB/s	3.35 TB/s
FP32 (peak)	19.5 TFLOPS	67 TFLOPS
TF32 (with sparsity)	312 TFLOPS	989 TFLOPS
BF16 / FP16 (with sparsity)	624 TFLOPS	1,979 TFLOPS
FP8 (with sparsity)	-	3,958 TFLOPS
NVLink	600 GB/s	900 GB/s
TDP	400 W	700 W

Specs from NVIDIA's product pages: A100, H100. "With sparsity" figures assume 2:4 structured sparsity; dense rates are half. Last verified 2026-05-20.

On paper, H100 has ~3.2× the FP16/BF16 tensor throughput and ~70% more memory bandwidth than A100 at the same VRAM. The FP8 path is unique to Hopper: run at FP8, H100's peak tensor throughput is ~6× A100's at FP16, so for inference workloads that quantize cleanly to FP8, H100 is in another league. For FP32-heavy scientific workloads, the gap is about the same (~3.4×).
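To sanity-check these ratios, here is a minimal Python sketch that recomputes them from NVIDIA's published datasheet peaks for the SXM parts (tensor figures with 2:4 sparsity). The dictionary names are made up for this example, and the results are peak-to-peak ratios, not measured speedups on real workloads.

```python
# Datasheet peaks for the SXM 80GB parts. Tensor-core figures are TFLOPS with
# 2:4 sparsity (dense is half); FP32 is non-tensor TFLOPS; bandwidth is TB/s.
A100 = {"fp32": 19.5, "fp16_tensor": 624.0, "bandwidth": 2.0}
H100 = {"fp32": 67.0, "fp16_tensor": 1979.0, "fp8_tensor": 3958.0, "bandwidth": 3.35}


def ratio(h100_value: float, a100_value: float) -> float:
    """H100-to-A100 ratio rounded to one decimal place."""
    return round(h100_value / a100_value, 1)


fp16_gain = ratio(H100["fp16_tensor"], A100["fp16_tensor"])
fp32_gain = ratio(H100["fp32"], A100["fp32"])
bandwidth_gain = ratio(H100["bandwidth"], A100["bandwidth"])
# A100 has no FP8 path, so the FP8 comparison is H100 FP8 vs A100 FP16.
fp8_vs_fp16 = ratio(H100["fp8_tensor"], A100["fp16_tensor"])

print(f"FP16/BF16 tensor:      {fp16_gain}x")
print(f"FP32:                  {fp32_gain}x")
print(f"Memory bandwidth:      {bandwidth_gain}x")
print(f"H100 FP8 vs A100 FP16: {fp8_vs_fp16}x")
```

Real jobs rarely hit peak on either card, so treat these as upper bounds on the speedup, not predictions.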

Real rental pricing - multi-provider comparison

What you actually pay depends on the provider, the GPU variant (PCIe vs SXM, 80GB vs 40GB), and the billing model. Below: on-demand hourly rates verified by visiting each provider's live pricing page on 2026-05-20.

Provider	GPU	$/hour	Billing	Source
RunPod (Pods)	H100 SXM 80GB	$3.29	per-second	runpod.io/pricing
RunPod (Pods)	H100 PCIe 80GB	$2.89	per-second	same
RunPod (Pods)	H100 NVL 94GB	$3.19	per-second	same
RunPod (Pods)	A100 SXM 80GB	$1.49	per-second	same
RunPod (Pods)	A100 PCIe 80GB	$1.39	per-second	same
Hyperstack	H100 SXM	$2.40	per-minute	hyperstack.cloud/gpu-pricing
Hyperstack	H100 NVLink	$1.95	per-minute	same
Hyperstack	H100 PCIe	$1.90	per-minute	same
Hyperstack	A100 SXM	$1.60	per-minute	same
Hyperstack	A100 NVLink	$1.40	per-minute	same
Hyperstack	A100 PCIe	$1.35	per-minute	same
Lambda Labs	H100 SXM 80GB (1×)	$4.29	per-minute	lambda.ai/service/gpu-cloud
Lambda Labs	H100 SXM 80GB (8×, per GPU)	$3.99	per-minute	same
Lambda Labs	H100 PCIe 80GB	$3.29	per-minute	same
Lambda Labs	A100 SXM 80GB (8×, per GPU)	$2.79	per-minute	same
Lambda Labs	A100 SXM 40GB	$1.99	per-minute	same
Paperspace	H100 (standard)	$5.95	per-hour	paperspace.com/pricing
Paperspace	H100 (3-yr commit)	$2.24	per-hour	same
Paperspace	A100 (3-yr commit)	$1.15	per-hour	same
Vast.ai (market)	H100 SXM 80GB	$2.13 (median, 30d)	per-second	vast.ai/pricing
Vast.ai (market)	H100 NVL 94GB	$1.69 (median, 30d)	per-second	same
Vast.ai (market)	A100 SXM4 80GB	$1.00 (median, 30d)	per-second	same
Vast.ai (market)	A100 PCIe 80GB	$0.67 (median, 30d)	per-second	same

Cheapest with guaranteed uptime (on-demand SLA tier): Hyperstack - H100 SXM $2.40/hr, H100 PCIe $1.90/hr, A100 SXM $1.60/hr, A100 PCIe $1.35/hr.

Cheapest at all (Vast.ai - market median, interruptible, no SLA, snapshot 2026-05-20): H100 SXM $2.13/hr (full range observed $1.33-$6.71), A100 SXM $1.00/hr ($0.27-$2.67), A100 PCIe $0.67/hr ($0.11-$1.53). Vast prices are set by host competition on a per-second-billed market - the actual quote at rental time can be higher or lower, and hosts can reclaim instances. For batch training or fault-tolerant inference, Vast is the cheapest verified path. For production with guaranteed uptime, Hyperstack on-demand wins the SLA tier.

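A100 costs roughly 45-70% of the H100 hourly rate on the same provider tier: about half on RunPod, Paperspace and Vast.ai, closer to two-thirds on Hyperstack and Lambda. H100 wins on $/result only if it finishes your job faster than that price ratio (about 2.2× faster on RunPod, 1.5× on Hyperstack SXM); if your workload cannot use that much of H100's extra capability, A100 wins.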
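The break-even logic fits in a few lines. A minimal sketch, assuming job cost is simply wall-clock hours × hourly rate (ignoring billing granularity, storage and egress); the `RATES` mapping is a hand-copied subset of the table above and the function name is hypothetical.

```python
# On-demand rates from the pricing table above, USD/hour as (A100, H100).
# Vast.ai entries are 30-day market medians, not fixed prices.
RATES = {
    "RunPod SXM": (1.49, 3.29),
    "Hyperstack SXM": (1.60, 2.40),
    "Hyperstack PCIe": (1.35, 1.90),
    "Lambda SXM (8x, per GPU)": (2.79, 3.99),
    "Vast.ai SXM (median)": (1.00, 2.13),
}


def breakeven_speedup(a100_rate: float, h100_rate: float) -> float:
    """Speedup H100 needs to tie A100 on cost per job.

    A job costs hours * rate. If A100 takes T hours and H100 takes T / s,
    the costs are equal when T * a100_rate == (T / s) * h100_rate,
    i.e. s == h100_rate / a100_rate.
    """
    return h100_rate / a100_rate


for provider, (a100, h100) in RATES.items():
    needed = breakeven_speedup(a100, h100)
    print(f"{provider:26s} H100 must be at least {needed:.2f}x faster")
```

Compare the printed threshold with the speedup you actually measure on your own workload; the peak spec ratios are only a ceiling.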

AWS p5 / GCP A3 are excluded from the head-to-head because the hyperscaler pricing model (on-demand vs reserved 1y / 3y, plus network egress) makes a simple hourly comparison misleading. For sustained workloads with negotiated discounts they can compete; on a one-weekend rental the providers above will be 2-3× cheaper.

Cost per workload

These are decision-support estimates (provider hourly rate × typical job duration) for common workloads, at the level of rigor an indie developer would apply. They are not benchmark numbers from a controlled study - treat them as a starting point for your own measurement, with the math shown so you can swap your own throughput numbers in.

Inference: serving Llama 3.3 70B (Q4 quantized)

A 70B model at Q4 fits on a single 80GB A100 or H100. For batch-1 streaming inference:

A100 SXM typically delivers 30-40 tokens/s on quantized 70B inference at FP16 / Q4 mix. At RunPod's $1.49/hr, that is roughly $10-14 per million output tokens (about $0.000012 per token) at steady state.
H100 SXM with FP8 path enabled delivers 70-90 tokens/s on the same model. At RunPod's $3.29/hr, that is roughly $10-13 per million output tokens - essentially the same cost per token, at roughly half the per-token latency.

Verdict: if you care about throughput per dollar, H100 and A100 are within margin of error on this workload. If you care about per-stream latency (chat UI feel), H100 wins clearly.
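The per-token arithmetic behind this verdict, as a small sketch. It assumes one batch-1 stream keeps the GPU busy for every billed second; batching several streams raises throughput per GPU and lowers cost per token, which this does not model. The function and variable names are illustrative.

```python
def usd_per_million_tokens(rate_per_hour: float, tokens_per_second: float) -> float:
    """Steady-state cost of one million output tokens on one rented GPU.

    Assumes a single batch-1 stream keeps the GPU busy for every billed second.
    """
    tokens_per_hour = tokens_per_second * 3600
    return rate_per_hour / tokens_per_hour * 1_000_000


# (GPU, RunPod $/hour, batch-1 tokens/s range) as quoted in this section.
CASES = [("A100 SXM", 1.49, (30, 40)), ("H100 SXM", 3.29, (70, 90))]

for gpu, rate, (slow, fast) in CASES:
    high = usd_per_million_tokens(rate, slow)  # slowest stream -> highest cost
    low = usd_per_million_tokens(rate, fast)
    print(f"{gpu}: ${low:.2f}-${high:.2f} per 1M tokens, "
          f"{1000 / fast:.0f}-{1000 / slow:.0f} ms per token")
```

Both GPUs land at roughly $10-14 per million tokens, while H100's per-token latency (about 11-14 ms) is roughly half of A100's (about 25-33 ms).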

Fine-tuning: 13B model with LoRA on 50K samples

A LoRA fine-tune of a 13B base, rank=16, single GPU, sequence length 4096, batch 4-8:

A100 SXM 80GB: ~14-18 hours for one epoch at 50K samples. At RunPod $1.49/hr: $21-27 per full run.
H100 SXM 80GB: ~5-7 hours for the same job (FP8 + Flash Attention 3). At RunPod $3.29/hr: $16-23 per full run.

H100 comes out slightly ahead on $/job here (though the cost ranges overlap) and roughly 2.5-3× ahead on wall-clock. Pick H100 when iteration speed shapes your day; pick A100 when you're running 50 of these and the ~7-13 hours saved per job don't change your decisions.
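A sketch of the $/job comparison above, assuming cost is wall-clock hours × RunPod's on-demand rate. The hour ranges are this section's estimates, not measurements, and the helper names are hypothetical.

```python
def job_cost_range(hours: tuple[float, float], rate_per_hour: float) -> tuple[float, float]:
    """Cost of one run for an estimated (low, high) wall-clock range in hours."""
    low, high = hours
    return round(low * rate_per_hour, 2), round(high * rate_per_hour, 2)


def ranges_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when two (low, high) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


a100 = job_cost_range((14, 18), 1.49)  # A100 SXM 80GB at RunPod's on-demand rate
h100 = job_cost_range((5, 7), 3.29)    # H100 SXM 80GB at RunPod's on-demand rate
print(f"A100 ${a100[0]:.2f}-${a100[1]:.2f}, H100 ${h100[0]:.2f}-${h100[1]:.2f}")
print("cost ranges overlap:", ranges_overlap(a100, h100))
```

Because the ranges overlap, the $/job difference is within the uncertainty of the estimates; wall-clock is the clearer differentiator.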

Full fine-tune: 7B base model with mixed precision

7B with full backprop, sequence length 2048, batch 16, 100K samples:

A100 SXM 80GB: ~3-4 days per epoch ≈ $107-143 at $1.49/hr.
H100 SXM 80GB: ~22-30 hours per epoch ≈ $72-99 at $3.29/hr.

H100 is ahead again. Full fine-tunes are compute-bound; more compute wins.
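The same reasoning applies to the full fine-tune. A rough sketch comparing the midpoint speedup implied by the wall-clock estimates above with the RunPod price ratio; using range midpoints is a simplification, since both ranges are wide.

```python
# Per-epoch wall-clock estimates from this section, in hours.
A100_HOURS = (3 * 24, 4 * 24)   # "3-4 days"
H100_HOURS = (22, 30)
A100_RATE, H100_RATE = 1.49, 3.29  # RunPod on-demand, USD/hour

a100_cost = [h * A100_RATE for h in A100_HOURS]
h100_cost = [h * H100_RATE for h in H100_HOURS]

# Midpoint speedup vs. the speedup needed to break even on price.
speedup = (sum(A100_HOURS) / 2) / (sum(H100_HOURS) / 2)
breakeven = H100_RATE / A100_RATE

print(f"A100 ${a100_cost[0]:.0f}-{a100_cost[1]:.0f}, H100 ${h100_cost[0]:.0f}-{h100_cost[1]:.0f}")
print(f"speedup {speedup:.2f}x vs break-even {breakeven:.2f}x")
print("H100 cheaper per epoch" if speedup > breakeven else "A100 cheaper per epoch")
```

A ~3.2× speedup clears the ~2.2× price ratio, which is why a compute-bound job tips toward H100.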

Multi-GPU pretraining: when an 8×H100 cluster pays off

Pretraining anything above 7B from scratch is uneconomical on rented hardware unless the bill is batched across a project. An 8×H100 SXM cluster on Lambda at $3.99/GPU/hr is $31.92/hour, roughly $23,000/month running 24/7. If you are pretraining a mid-size model (above 7B but below the 70B class, far short of GPT-3's 175B), on-demand is the wrong shape - use a reserved contract. If you are pretraining at 70B+ foundation scale, you are on the wrong rate plan entirely - talk to Lambda 1-Click Clusters, AWS p5.48xlarge reserved, or a hyperscaler with a quota review.
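The cluster figure is plain multiplication. A minimal sketch, assuming a 30-day month of 24/7 use at the on-demand per-GPU rate (a negotiated or reserved rate would change the result); the helper name is hypothetical.

```python
HOURS_PER_MONTH = 24 * 30  # 30-day month; an average 730-hour month adds ~1.4%


def cluster_cost(gpus: int, usd_per_gpu_hour: float) -> tuple[float, float]:
    """Return (USD per hour, USD per 30-day month) for a cluster running 24/7."""
    hourly = gpus * usd_per_gpu_hour
    return hourly, hourly * HOURS_PER_MONTH


# 8x H100 SXM on Lambda at the per-GPU on-demand rate from the pricing table.
hourly, monthly = cluster_cost(8, 3.99)
print(f"${hourly:.2f}/hour, about ${monthly:,.0f}/month")
```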

Where H100 wins

FP8 inference for LLMs and Stable Diffusion. Hopper's transformer engine delivers 2-3× throughput at near-identical quality vs FP16.
Time-to-result-sensitive work - research iteration loops, agentic systems, real-time inference.
Large-context inference (32K+ tokens) where memory bandwidth dominates.
Multi-GPU training with NVLink 4 - 8×H100 SXM with full NVSwitch interconnect is the LLM-class training default in 2026.

Where A100 still wins

Inference at $/token for batch and async workloads where latency does not matter. A100 at half the hourly rate often beats H100 on throughput-per-dollar.
Workloads that fit comfortably in 80GB VRAM and do not need FP8.
Legacy ML stacks built before Hopper FP8 support was stable.
Availability - A100 supply tends to be more consistent at lower-priced tiers; H100 hits "contact sales" walls faster, especially SXM 8×.

Where neither is the right answer

Models under 13B at inference time - a single RTX 4090 (24GB) on RunPod at $0.69/hr or an RTX 6000 Ada (48GB) at $0.77/hr will outperform A100 per dollar by 2-3×.
Workloads needing >80GB VRAM - look at H200 (141GB) at $4.39/hr on RunPod, or B200 (180GB) at $5.89/hr. The middle ground between 80GB and 141GB is covered by 94-96GB SKUs: H100 NVL 94GB ($3.19/hr on RunPod), or RTX Pro 6000 96GB on RunPod at $1.89/hr, a notable middle option.
Image generation at any scale - L40S (48GB, $0.82-0.86/hr depending on provider) is the cost-optimized choice, not H100.
Sustained high-QPS production inference. Above ~100K req/day on one model, per-token cost on a hosted-inference provider (Groq, Together, Fireworks, HuggingFace Inference Providers) usually beats running rented GPUs. See HuggingFace Inference API 2026.
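The VRAM thresholds in this list come down to a back-of-envelope fit check. A minimal sketch, assuming weights dominate memory and a flat 20% allowance covers KV cache and runtime buffers; it treats Q4 as exactly 4 bits per weight, while real Q4 formats store a little more (scales and offsets). The function names and the 1.2 factor are hypothetical.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes): params x bits / 8."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, vram_gb: float,
         overhead: float = 1.2) -> bool:
    """Rough fit check for inference.

    `overhead` is an assumed 20% allowance for KV cache, activations and
    runtime buffers; long contexts and large batches need much more.
    """
    return weight_gb(params_billion, bits_per_weight) * overhead <= vram_gb


print(fits(13, 4, 24))   # 13B at Q4 on a 24GB RTX 4090 -> fits
print(fits(70, 4, 80))   # 70B at Q4 on an 80GB A100/H100 -> fits
print(fits(70, 16, 80))  # 70B at FP16 (140GB of weights alone) -> does not fit
```

For 32K+ contexts, size the KV cache explicitly rather than relying on a flat overhead factor.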

How to rent without burning your wallet

Before charging a card to any of the providers above, exhaust the free paths:

Free GPU credits programs - see Free GPU Compute for AWS Activate, GCP, NVIDIA Inception, OVH and academic programs that often hand out hundreds of GPU-hours free.
Free LLM API for inference - if your end goal is calling a model, see Free LLM API Credits. Provider free tiers cover prototyping without renting any GPU at all.
Spot / community tier - RunPod's "Community Cloud" is sourced from third-party hosts at the same prices with no SLA. Vast.ai is similar. Acceptable for experiments and overnight jobs.
Reserved with 3-month commit - once usage is steady, reserved pricing on Hyperstack saves 15-30%; Paperspace 3-year commit drops H100 from $5.95 to $2.24/hr. Break-even vs on-demand is ~2-4 weeks of 24/7 use.

Common mistakes

Renting H100 hours for a workload that fits on a 4090. The most expensive mistake in GPU rental. A 7B Q4 model fits in 8GB; 13B Q4 in ~10GB. A 4090's 24GB is plenty, and its hourly rate is roughly one-fifth of H100's ($0.69 vs $3.29 on RunPod).
Treating spot / community tier as production. Spot instances on Vast or RunPod community can be reclaimed without warning. Use for resumable batch jobs, not live API endpoints.
Forgetting network egress at AWS / GCP. The hyperscalers charge $0.08-0.12/GB out, so every 100GB checkpoint download costs $8-12. RunPod, Lambda, Hyperstack typically do not charge egress - that alone can flip a comparison.
On-demand 24/7 instead of reserved. Two weeks at 24/7 on-demand has already crossed the break-even for most providers' 3-month commit.
Buying H100 SXM when PCIe was enough. The SXM premium is mainly for NVLink multi-GPU bandwidth, though the SXM card is also faster on its own: HBM3 at 3.35 TB/s vs ~2 TB/s on H100 PCIe, and roughly 30% more peak tensor throughput. For single-GPU runs that are not bottlenecked on either, RunPod H100 PCIe at $2.89/hr is 12% cheaper than SXM at $3.29.

Frequently asked questions

Is the H100 worth twice the cost of A100? Depends on what "worth" means. On per-stream latency: yes, easily 2× faster. On throughput per dollar (tokens per dollar): roughly tied for inference. On absolute capability for FP8 workloads: yes, no contest. On a 70B Q4 chat where 30 tok/s is enough: no, pick A100.

Can A100 still train modern LLMs in 2026? Yes. A100 is two architecture generations behind (Ampere, followed by Hopper and Blackwell) but supports BF16 natively; it just takes longer. For LoRA fine-tunes up to 70B (multi-GPU) and full fine-tunes up to 13B, A100 is fully usable in 2026 - the gap is wall-clock, not capability.

Why is H100 SXM more expensive than H100 PCIe at the same provider? SXM is the server-form-factor variant with NVLink connectors, built for 4×/8× clusters where inter-GPU bandwidth (900 GB/s) matters. PCIe variants are limited to PCIe 5.0 (~128 GB/s) between GPUs unless paired with an NVLink bridge (as in Hyperstack's H100 NVLink SKU). The SXM part also runs at a higher power limit (700 W vs 300-350 W) with faster HBM3 memory, so it is somewhat faster even on a single GPU; multi-GPU training is where SXM earns most of the premium.

Is the H200 a no-brainer over the H100? For workloads bottlenecked on memory capacity (large-context inference, 70B+ at FP16) - yes. H200 has 141GB VRAM vs 80GB and ~43% more memory bandwidth (4.8 vs 3.35 TB/s). Compute is similar. RunPod H200 is $4.39/hr vs H100 SXM $3.29/hr - a ~33% premium. If your workload uses the extra ~60GB, the premium pays off.

What is the cheapest way to access an H100 for one weekend? With guaranteed uptime: Hyperstack PCIe at $1.90/hr. Two days × 24 hours × $1.90 ≈ $91 - a stable weekend of H100 access for under $100. If you accept interruptions: Vast.ai H100 SXM at ~$2.13/hr 30-day median ≈ $102 for a weekend, with platform range observed as low as $1.33/hr at the cheap end.

Will H100 prices drop as B200 ramps up? Likely yes, on a 6-12 month horizon. B200 is shipping in 2026 at a premium (~$5.89/hr observed at RunPod); as it takes over the high end, expect H100 prices to drift toward the A100 floor. Re-check this article in six months.

Bottom line

For most indie AI developers and small teams in 2026, the decision tree is short:

Under 13B at inference → 4090 or RTX 6000 Ada, not A100 or H100.
13B-30B inference or LoRA fine-tuning → A100 wins on $/output.
30B+ training, FP8 inference, latency-critical → H100 wins.
70B+ at full precision or 100K-context inference → H200.
Pretraining from scratch → reserved cluster contract or hyperscaler, not on-demand.
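The five-line tree above can be encoded as a small function. This is a sketch: the task labels and keyword arguments are hypothetical, the thresholds come straight from the list, and where rules overlap it checks pretraining and memory capacity first. Cases the list does not cover fall through to A100.

```python
def pick_gpu(params_billion: float, task: str, *, full_precision: bool = False,
             context_tokens: int = 8_000, fp8: bool = False,
             latency_critical: bool = False) -> str:
    """Sketch of the bottom-line decision tree.

    `task` is one of "inference", "lora", "full_finetune" or "pretrain";
    these labels and keyword arguments are invented for this example.
    """
    if task == "pretrain":
        return "reserved cluster or hyperscaler"
    # Memory-capacity rules first: they apply even when other rules match.
    if (params_billion >= 70 and full_precision) or (
            task == "inference" and context_tokens >= 100_000):
        return "H200"
    if task == "inference" and params_billion < 13:
        return "RTX 4090 / RTX 6000 Ada"
    if (task != "inference" and params_billion >= 30) or fp8 or latency_critical:
        return "H100"
    # Everything else (13B-30B inference, LoRA below 30B, small full
    # fine-tunes) falls through to A100; re-check with your own $/job math.
    return "A100"


for args in [(7, "inference"), (20, "inference"), (13, "lora"), (34, "full_finetune")]:
    print(args, "->", pick_gpu(*args))
```

The fall-through is deliberately conservative: as the full fine-tune estimate earlier shows, H100 can still be cheaper per job for compute-bound work below 30B.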

Cheapest with on-demand SLA at writing: Hyperstack. Cheapest at all (interruptible market): Vast.ai. RunPod has the widest GPU catalogue with per-second billing. Lambda has the best multi-GPU SXM pricing for clusters of 8.

Re-check this comparison every quarter - both pricing and the H200 / B200 supply situation will move.