H100 vs A100 in 2026: Real Pricing + Workload Picks

Last updated: May 2026

Decision rule in one paragraph. Rent A100 if your workload is inference of models up to ~30B at single-GPU scale, fine-tuning ≤13B with LoRA, or anything where you can wait 2-3× longer to save half the cost. Rent H100 if you are serving an LLM at >100 tokens/s per stream, training or fine-tuning a 30B+ model with full backprop, or time-to-result matters more than $/hour. For prototyping below 13B, neither is the right answer: use an RTX 4090 or RTX 6000 Ada at roughly half the A100's hourly rate ($0.69-0.77/hr vs $1.49/hr on RunPod).

This guide gives the verified per-provider rental cost for H100 and A100 across RunPod, Lambda Labs, Hyperstack, and Paperspace as of 2026-05-20, plus the per-workload math that turns those hourly rates into a decision.

H100 vs A100 — specs at a glance

Spec | NVIDIA A100 SXM 80GB | NVIDIA H100 SXM 80GB
Architecture | Ampere (2020) | Hopper (2022)
VRAM | 80 GB HBM2e | 80 GB HBM3
Memory bandwidth | 2.0 TB/s | 3.35 TB/s
FP32 (peak) | 19.5 TFLOPS | 67 TFLOPS
TF32 (with sparsity) | 312 TFLOPS | 989 TFLOPS
BF16 / FP16 (with sparsity) | 624 TFLOPS | 1,979 TFLOPS
FP8 (with sparsity) | Not supported | 3,958 TFLOPS
NVLink | 600 GB/s | 900 GB/s
TDP | 400 W | 700 W

Specs from NVIDIA's product pages: A100, H100. Last verified 2026-05-20.

On paper, H100 has ~3× the BF16/FP16 tensor throughput and ~70% more memory bandwidth than A100 at the same VRAM. The FP8 path is unique to Hopper: for inference workloads that quantize cleanly to FP8, H100's FP8 peak is ~6× A100's FP16 peak, which puts it in another league. For FP32-heavy scientific workloads the gap is similar (~3.4×).
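A rough way to connect the bandwidth row to single-stream speed: at batch 1, each generated token reads every weight byte from HBM once, so memory bandwidth divided by weight size bounds tokens per second. The sketch below assumes ~40 GB of weights for a 70B model at Q4 (an illustrative figure, not a measurement) and ignores KV-cache reads, activations and kernel efficiency, so treat its output as a ceiling, not a prediction.

```python
# Rough batch-1 decode ceiling: each new token reads every weight byte once,
# so tokens/s cannot exceed memory bandwidth / weight bytes.
# Ignores KV-cache traffic, activations, and real kernel efficiency.


def decode_ceiling_tokens_per_s(bandwidth_tb_s: float, weight_gb: float) -> float:
    """Upper bound on single-stream tokens/s for a bandwidth-bound decoder."""
    bytes_per_s = bandwidth_tb_s * 1e12
    weight_bytes = weight_gb * 1e9
    return bytes_per_s / weight_bytes


if __name__ == "__main__":
    WEIGHTS_GB = 40.0  # assumption: ~70B params at ~4.5 bits/param incl. overhead
    for name, bw in [("A100 SXM", 2.0), ("H100 SXM", 3.35)]:
        ceiling = decode_ceiling_tokens_per_s(bw, WEIGHTS_GB)
        print(f"{name}: <= {ceiling:.0f} tokens/s")
```

Under these assumptions the ceilings come out near 50 tokens/s (A100) and 84 tokens/s (H100); swap in your own weight size and bandwidth.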

Real rental pricing — multi-provider comparison

What you actually pay depends on the provider, the GPU variant (PCIe vs SXM, 80GB vs 40GB), and the billing model. Below: on-demand hourly rates verified by visiting each provider's live pricing page on 2026-05-20.

Provider | GPU | $/hour | Billing | Source
RunPod (Pods) | H100 SXM 80GB | $3.29 | per-second | runpod.io/pricing
RunPod (Pods) | H100 PCIe 80GB | $2.89 | per-second | same
RunPod (Pods) | H100 NVL 94GB | $3.19 | per-second | same
RunPod (Pods) | A100 SXM 80GB | $1.49 | per-second | same
RunPod (Pods) | A100 PCIe 80GB | $1.39 | per-second | same
Hyperstack | H100 SXM | $2.40 | per-minute | hyperstack.cloud/gpu-pricing
Hyperstack | H100 NVLink | $1.95 | per-minute | same
Hyperstack | H100 PCIe | $1.90 | per-minute | same
Hyperstack | A100 SXM | $1.60 | per-minute | same
Hyperstack | A100 NVLink | $1.40 | per-minute | same
Hyperstack | A100 PCIe | $1.35 | per-minute | same
Lambda Labs | H100 SXM 80GB (1×) | $4.29 | per-minute | lambda.ai/service/gpu-cloud
Lambda Labs | H100 SXM 80GB (8×, per GPU) | $3.99 | per-minute | same
Lambda Labs | H100 PCIe 80GB | $3.29 | per-minute | same
Lambda Labs | A100 SXM 80GB (8×, per GPU) | $2.79 | per-minute | same
Lambda Labs | A100 SXM 40GB | $1.99 | per-minute | same
Paperspace | H100 (standard) | $5.95 | per-hour | paperspace.com/pricing
Paperspace | H100 (3-yr commit) | $2.24 | per-hour | same
Paperspace | A100 (3-yr commit) | $1.15 | per-hour | same
Vast.ai (market) | H100 SXM 80GB | $2.13 (median, 30d) | per-second | vast.ai/pricing
Vast.ai (market) | H100 NVL 94GB | $1.69 (median, 30d) | per-second | same
Vast.ai (market) | A100 SXM4 80GB | $1.00 (median, 30d) | per-second | same
Vast.ai (market) | A100 PCIe 80GB | $0.67 (median, 30d) | per-second | same

Cheapest with guaranteed uptime (on-demand SLA tier): Hyperstack — H100 SXM $2.40/hr, H100 PCIe $1.90/hr, A100 SXM $1.60/hr, A100 PCIe $1.35/hr.

Cheapest at all (Vast.ai — market median, interruptible, no SLA, snapshot 2026-05-20): H100 SXM $2.13/hr (full range observed $1.33–$6.71), A100 SXM $1.00/hr ($0.27–$2.67), A100 PCIe $0.67/hr ($0.11–$1.53). Vast prices are set by host competition on a per-second-billed market — the actual quote at rental time can be higher or lower, and hosts can reclaim instances. For batch training or fault-tolerant inference, Vast is the cheapest verified path. For production with guaranteed uptime, Hyperstack on-demand wins the SLA tier.

A100 costs roughly 45-70% of the H100 rate on the same provider tier: about half on RunPod, Paperspace and Vast.ai, closer to two-thirds on Hyperstack and Lambda. The decision is whether H100 finishes your workload fast enough to offset that gap; if not, A100 wins on $/result.
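The $/result test above reduces to a single ratio: H100 is cheaper per finished job only when its speedup over A100 on your workload exceeds H100's hourly rate divided by A100's. A minimal sketch using the on-demand rates from the pricing table; the speedup is the input you have to measure yourself.

```python
# $/result comparison: H100 is cheaper per finished job only when its speedup
# over A100 exceeds the ratio of the two hourly rates.

# (A100 $/hr, H100 $/hr) from the pricing table (snapshot 2026-05-20).
RATES = {
    "RunPod (SXM)": (1.49, 3.29),
    "Hyperstack (SXM)": (1.60, 2.40),
    "Hyperstack (PCIe)": (1.35, 1.90),
    "Lambda (8x SXM, per GPU)": (2.79, 3.99),
    "Vast.ai (SXM, 30d median)": (1.00, 2.13),
}


def breakeven_speedup(a100_rate: float, h100_rate: float) -> float:
    """Minimum H100-over-A100 speedup for H100 to win on cost per result."""
    return h100_rate / a100_rate


def cheaper_per_result(a100_rate: float, h100_rate: float, speedup: float) -> str:
    """Which GPU costs less per job, given a measured H100 speedup."""
    return "H100" if speedup > breakeven_speedup(a100_rate, h100_rate) else "A100"


if __name__ == "__main__":
    for provider, (a100, h100) in RATES.items():
        print(f"{provider}: H100 must be >{breakeven_speedup(a100, h100):.2f}x faster")
```

With a measured 2× speedup, for example, RunPod's break-even of about 2.2× still favors A100, while Hyperstack's 1.5× favors H100.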

AWS p5 / GCP A3 are excluded from the head-to-head because the hyperscaler pricing model (on-demand vs reserved 1y / 3y, plus network egress) makes a simple hourly comparison misleading. For sustained workloads with negotiated discounts they can compete; on a one-weekend rental the providers above will be 2-3× cheaper.

Cost per workload

These are decision-support estimates derived from provider hourly rates × typical job duration for common workloads at indie-developer rigor. They are not benchmark numbers from a controlled study — treat them as a starting point for your own measurement, with the math shown so you can swap your own throughput numbers in.

Inference: serving Llama 3.3 70B (Q4 quantized)

A 70B model at Q4 fits on a single 80GB A100 or H100. For batch-1 streaming inference:

  • A100 SXM typically delivers 30-40 tokens/s on quantized 70B inference at FP16 / Q4 mix. At RunPod's $1.49/hr, that is roughly $0.000010-0.000014 per output token (about $10-14 per million) at steady state.
  • H100 SXM with the FP8 path enabled delivers 70-90 tokens/s on the same model. At RunPod's $3.29/hr, that is roughly $0.000010-0.000013 per output token (about $10-13 per million): essentially the same cost per token, at roughly half the per-token latency.

Verdict: if you care about throughput per dollar, H100 and A100 are within margin of error on this workload. If you care about per-stream latency (chat UI feel), H100 wins clearly.
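The per-token figures above come from one division: the hourly rate converted to dollars per second, divided by tokens per second. A minimal sketch that reproduces the ranges, assuming the GPU streams continuously at the quoted rate (any idle time raises the real cost):

```python
# Cost per output token for one busy stream at steady state:
# ($/hour / 3600 s) / (tokens/s). Multiply by 1e6 for $ per million tokens.


def usd_per_million_tokens(usd_per_hour: float, tokens_per_s: float) -> float:
    """Dollars per one million generated tokens for one busy stream."""
    usd_per_second = usd_per_hour / 3600.0
    return usd_per_second / tokens_per_s * 1_000_000


if __name__ == "__main__":
    # RunPod rates and throughput ranges quoted in this section.
    cases = {"A100 SXM": (1.49, (30, 40)), "H100 SXM": (3.29, (70, 90))}
    for gpu, (rate, (lo_tps, hi_tps)) in cases.items():
        best = usd_per_million_tokens(rate, hi_tps)  # faster end -> cheaper
        worst = usd_per_million_tokens(rate, lo_tps)
        print(f"{gpu}: ${best:.2f}-${worst:.2f} per 1M output tokens")
```

Both cards land at roughly $10-14 per million output tokens, which is why the verdict calls them a wash on throughput per dollar.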

Fine-tuning: 13B model with LoRA on 50K samples

A LoRA fine-tune of a 13B base, rank=16, single GPU, sequence length 4096, batch 4-8:

  • A100 SXM 80GB: ~14-18 hours for one epoch at 50K samples. At RunPod $1.49/hr: $21-27 per full run.
  • H100 SXM 80GB: ~5-7 hours for the same job (FP8 + Flash Attention 3). At RunPod $3.29/hr: $16-23 per full run.

H100 wins slightly on $/job here, and on wall-clock by roughly 2.5-3×. Pick H100 when iteration speed shapes your day; pick A100 when you're running 50 of these and the ~10 hours saved per job don't change your decisions.
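The LoRA numbers above are wall-clock hours times the hourly rate. This sketch reproduces them from the section's own ranges so you can swap in timings from your own runs; `JobEstimate` is a hypothetical helper, not part of any library.

```python
# Job cost = wall-clock hours x hourly rate; compare GPUs on cost and time.
from dataclasses import dataclass


@dataclass
class JobEstimate:
    gpu: str
    usd_per_hour: float
    hours_low: float
    hours_high: float

    def cost_range(self) -> tuple[float, float]:
        """(cheapest, most expensive) cost of one full run."""
        return self.hours_low * self.usd_per_hour, self.hours_high * self.usd_per_hour

    def mid_hours(self) -> float:
        """Midpoint of the wall-clock range."""
        return (self.hours_low + self.hours_high) / 2


if __name__ == "__main__":
    # Figures from the 13B LoRA example above (RunPod rates).
    a100 = JobEstimate("A100 SXM 80GB", 1.49, 14, 18)
    h100 = JobEstimate("H100 SXM 80GB", 3.29, 5, 7)
    for job in (a100, h100):
        lo, hi = job.cost_range()
        print(f"{job.gpu}: ${lo:.0f}-${hi:.0f} per run")
    print(f"midpoint speedup: {a100.mid_hours() / h100.mid_hours():.1f}x")
    print(f"midpoint hours saved per run: {a100.mid_hours() - h100.mid_hours():.0f}")
```

It prints $21-27 and $16-23 per run, a ~2.7× midpoint speedup and about 10 hours saved per run.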

Full fine-tune: 7B base model with mixed precision

7B with full backprop, sequence length 2048, batch 16, 100K samples:

  • A100 SXM 80GB: ~3-4 days per epoch ≈ $107-143 at $1.49/hr.
  • H100 SXM 80GB: ~22-30 hours per epoch ≈ $72-99 at $3.29/hr.

H100 is ahead again. Full fine-tunes are compute-bound; more compute wins.
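As the section intro suggests, the fastest way to replace these estimates is to time a short warm run on your own job and scale it to the dataset. A minimal sketch; the throughputs in the example are hypothetical values back-solved from the midpoints of the ranges above (about 84 h and 26 h for 100K samples), not measurements.

```python
# Estimate epoch wall-clock and cost from a short timed run.
# epoch_hours = samples / (samples_per_s * 3600); cost = hours * $/hr.


def measured_samples_per_s(steps: int, batch_size: int, elapsed_s: float) -> float:
    """Throughput from a timed warm run: steps x batch size / seconds."""
    return steps * batch_size / elapsed_s


def epoch_hours(num_samples: int, samples_per_s: float) -> float:
    return num_samples / (samples_per_s * 3600.0)


def epoch_cost(num_samples: int, samples_per_s: float, usd_per_hour: float) -> float:
    return epoch_hours(num_samples, samples_per_s) * usd_per_hour


if __name__ == "__main__":
    N = 100_000  # samples per epoch, as in the 7B example above
    # Hypothetical throughputs, back-solved from the section's midpoint estimates.
    for gpu, sps, rate in [("A100 SXM", 0.33, 1.49), ("H100 SXM", 1.07, 3.29)]:
        print(f"{gpu}: {epoch_hours(N, sps):.0f} h/epoch, ${epoch_cost(N, sps, rate):.0f}")
```

Skip the first few steps when timing (compilation and data-loader warm-up inflate them), then pass steps, batch size and elapsed seconds to `measured_samples_per_s`.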

Multi-GPU pretraining: when an 8×H100 cluster pays off

Pretraining anything above 7B from scratch is uneconomical on rented hardware unless the bill is batched across a project. An 8×H100 SXM cluster on Lambda at $3.99/GPU/hr is $31.92/hour, roughly $23,000/month running 24/7. For multi-week runs between 7B and 70B, on-demand hourly billing is the wrong shape: use a reserved contract. At 70B+ foundation scale (up to GPT-3's 175B class and beyond), you are on the wrong rate plan entirely: talk to Lambda 1-Click Clusters, AWS p5.48xlarge reserved, or a hyperscaler with a quota review.
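The cluster figures above are GPUs × rate × hours. To see why pretraining above 7B strains a rented budget, the sketch also applies the widely used approximation that training a dense transformer takes about 6 × parameters × tokens FLOPs. The token count (1T) and the sustained 400 TFLOPS per GPU are hypothetical inputs chosen for illustration, not measurements; real utilization varies widely.

```python
# Always-on cluster cost, plus a rough pretraining bill from the common
# ~6 * params * tokens estimate of training FLOPs for dense transformers.
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12


def cluster_cost(gpus: int, usd_per_gpu_hour: float, hours: float) -> float:
    """Total bill for `gpus` GPUs billed per GPU-hour over `hours` hours."""
    return gpus * usd_per_gpu_hour * hours


def pretrain_gpu_hours(params: float, tokens: float,
                       sustained_flops_per_gpu: float) -> float:
    """GPU-hours to process ~6 * N * D FLOPs at a sustained per-GPU rate."""
    total_flops = 6 * params * tokens
    return total_flops / sustained_flops_per_gpu / 3600


if __name__ == "__main__":
    print(f"8x H100 SXM: ${cluster_cost(8, 3.99, 1):.2f}/hour, "
          f"${cluster_cost(8, 3.99, HOURS_PER_MONTH):,.0f}/month at 24/7")
    # Hypothetical run: 7B params, 1T tokens, 400 TFLOPS sustained per GPU.
    gpu_hours = pretrain_gpu_hours(7e9, 1e12, 400e12)
    print(f"~{gpu_hours:,.0f} GPU-hours, ~${cluster_cost(1, 3.99, gpu_hours):,.0f}")
```

On those assumptions a 7B run comes to roughly 29,000 GPU-hours, about $116,000 at the on-demand per-GPU rate, or about five months of the 8-GPU cluster running 24/7.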

Where H100 wins

  • FP8 inference for LLMs and Stable Diffusion. Hopper's Transformer Engine delivers 2-3× the throughput of FP16 at near-identical quality.
  • Time-to-result-sensitive work — research iteration loops, agentic systems, real-time inference.
  • Large-context inference (32K+ tokens) where memory bandwidth dominates.
  • Multi-GPU training with NVLink 4 — 8×H100 SXM with full NVSwitch interconnect is the LLM-class training default in 2026.
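Why memory bandwidth dominates at long context: during decoding, every new token also reads the whole KV cache, which grows linearly with context length. The sketch below uses the standard KV-cache size formula for grouped-query attention with a Llama-3-70B-style shape (80 layers, 8 KV heads, head dimension 128); treat those shape values as assumptions to replace with your model's config.

```python
# KV-cache bytes per sequence for a decoder with grouped-query attention:
# 2 (K and V) * layers * kv_heads * head_dim * tokens * bytes_per_value.


def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 tokens: int, bytes_per_value: int = 2) -> float:
    """KV-cache size in GiB for one sequence of `tokens` tokens."""
    total = 2 * layers * kv_heads * head_dim * tokens * bytes_per_value
    return total / 2**30


if __name__ == "__main__":
    # Assumed Llama-3-70B-style shape: 80 layers, 8 KV heads, head_dim 128.
    for ctx in (4_096, 32_768, 131_072):
        fp16 = kv_cache_gib(80, 8, 128, ctx, bytes_per_value=2)
        fp8 = kv_cache_gib(80, 8, 128, ctx, bytes_per_value=1)
        print(f"{ctx:>7} tokens: {fp16:5.1f} GiB (FP16 KV), {fp8:5.1f} GiB (FP8 KV)")
```

At 32K tokens the FP16 cache is 10 GiB per sequence, read on every decoding step on top of the weights; at 128K it reaches 40 GiB, which is the capacity pressure behind pointing 100K-context work at H200.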

Where A100 still wins

  • Inference at $/token for batch and async workloads where latency does not matter. A100 at half the hourly rate often beats H100 on throughput-per-dollar.
  • Workloads that fit comfortably in 80GB VRAM and do not need FP8.
  • Legacy ML stacks built before Hopper FP8 support was stable.
  • Availability — A100 supply tends to be more consistent at lower-priced tiers; H100 hits "contact sales" walls faster, especially SXM 8×.

Where neither is the right answer

  • Models under 13B at inference time — a single RTX 4090 (24GB) on RunPod at $0.69/hr or an RTX 6000 Ada (48GB) at $0.77/hr will outperform A100 per dollar by 2-3×.
  • Workloads needing >80GB VRAM — look at H200 (141GB) at $4.39/hr on RunPod, or B200 (180GB) at $5.89/hr. The "between A100 and H200" middle is covered by NVL/96GB-class SKUs — RTX Pro 6000 96GB on RunPod at $1.89/hr is a notable middle option.
  • Image generation at any scale — L40S (48GB, $0.82-0.86/hr depending on provider) is the cost-optimized choice, not H100.
  • Sustained high-QPS production inference. Above ~100K req/day on one model, per-token cost on a hosted-inference provider (Groq, Together, Fireworks, HuggingFace Inference Providers) usually beats running rented GPUs. See HuggingFace Inference API 2026.
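A quick check for whether a model belongs on a 24GB card at all: weight memory is roughly parameters × bits per weight / 8. The sketch assumes ~4.5 bits per weight for typical 4-bit quantization formats and a flat 4 GB of headroom for KV cache, activations and the CUDA context; both numbers are rough assumptions, and long contexts need more headroom.

```python
# Will the weights fit? bytes ~= params * bits_per_weight / 8, plus a flat
# headroom allowance for KV cache, activations, and the CUDA context.


def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (billions of bytes)."""
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float,
         vram_gb: float, headroom_gb: float = 4.0) -> bool:
    """True if weights plus the headroom allowance fit in VRAM."""
    return weights_gb(params_billions, bits_per_weight) + headroom_gb <= vram_gb


if __name__ == "__main__":
    # ~4.5 bits/weight is a rough figure for common 4-bit formats (assumption).
    for model, params in [("7B", 7), ("13B", 13), ("70B", 70)]:
        size = weights_gb(params, 4.5)
        print(f"{model} @ Q4: ~{size:.1f} GB weights | "
              f"RTX 4090 24GB: {fits(params, 4.5, 24)} | "
              f"80GB card: {fits(params, 4.5, 80)}")
```

Under those assumptions 7B and 13B at Q4 need roughly 8 GB and 11 GB in total and fit comfortably on an RTX 4090, while 70B at Q4 (about 39 GB of weights) needs an 80GB-class card.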

How to rent without burning your wallet

Before charging a card to any of the providers above, exhaust the free paths:

  1. Free GPU credits programs — see Free GPU Compute for AWS Activate, GCP, NVIDIA Inception, OVH and academic programs that often hand out hundreds of GPU-hours free.
  2. Free LLM API for inference — if your end goal is calling a model, see Free LLM API Credits. Provider free tiers cover prototyping without renting any GPU at all.
  3. Spot / community tier — RunPod's "Community Cloud" is sourced from third-party hosts at the same prices with no SLA. Vast.ai is similar. Acceptable for experiments and overnight jobs.
  4. Reserved with 3-month commit: once usage is steady, reserved pricing on Hyperstack saves 15-30%; Paperspace's 3-year commit drops H100 from $5.95 to $2.24/hr. Because a commit is billed for the whole term whether or not the GPU is busy, it beats on-demand only if you would otherwise run on-demand for more than the discounted share of that term: at 15-30% off, roughly 70-85% utilization, or about 9-11 weeks of 24/7 use out of the 13-week commit.

Common mistakes

  • Renting H100 hours for a workload that fits on a 4090. The most expensive mistake in GPU rental. A 7B Q4 model fits in 8GB; 13B Q4 in ~10GB. A 4090's 24GB is plenty, and its hourly rate is roughly one-fifth of an H100's ($0.69 vs $3.29 on RunPod).
  • Treating spot / community tier as production. Spot instances on Vast or RunPod community can be reclaimed without warning. Use for resumable batch jobs, not live API endpoints.
  • Forgetting network egress at AWS / GCP. The hyperscalers charge $0.08-0.12/GB out, so every 100GB checkpoint download costs $8-12. RunPod, Lambda, and Hyperstack typically do not charge egress; that alone can flip a comparison.
  • On-demand 24/7 instead of reserved. If a GPU will run around the clock for the whole 3-month term, on-demand pays the full 15-30% premium that a commit would have saved.
  • Buying H100 SXM when PCIe was enough. The SXM premium pays mainly for NVLink multi-GPU bandwidth, but the SXM card is also faster on its own: more SMs, a higher power limit, and 3.35 TB/s HBM3 vs 2.0 TB/s HBM2e on the 80GB PCIe card. RunPod H100 PCIe at $2.89/hr is 12% cheaper than SXM at $3.29, so it only comes out ahead per result when your job runs nearly as fast on it (for example, light inference or a job bottlenecked on data loading). Benchmark both before assuming PCIe is the bargain.

Frequently asked questions

Is the H100 worth twice the cost of A100? Depends on what "worth" means. On per-stream latency: yes, easily 2× faster. On throughput per dollar (tokens per dollar): roughly tied for inference. On absolute capability for FP8 workloads: yes, no contest. On a 70B Q4 chat where 30 tok/s is enough: no, pick A100.

Can A100 still train modern LLMs in 2026? Yes. A100 is two architecture generations behind (Hopper, then Blackwell), but it supports BF16 training, so modern recipes run unchanged. It just takes longer. For LoRA fine-tunes up to 70B (multi-GPU) and full fine-tunes up to 13B, A100 is fully usable in 2026: the gap is wall-clock, not capability.

Why is H100 SXM more expensive than H100 PCIe at the same provider? SXM is the server-form-factor variant with NVLink connectors, built for 4×/8× clusters where inter-GPU bandwidth (900 GB/s) matters. PCIe variants are limited to PCIe 5.0 (~128 GB/s) between GPUs. The SXM card is also faster as a single GPU (up to 700 W vs 350 W, 3.35 TB/s vs 2.0 TB/s of memory bandwidth), so single-GPU jobs run somewhat faster on SXM too; multi-GPU training is where the premium matters most.

Is the H200 a no-brainer over the H100? For workloads bottlenecked on memory capacity (large-context inference, 70B+ at FP16), yes. H200 has 141GB VRAM vs 80GB and ~40% more memory bandwidth (4.8 vs 3.35 TB/s). Compute is similar. RunPod H200 is $4.39/hr vs H100 SXM $3.29/hr, a ~33% premium. If your workload uses the extra ~60GB, the premium pays off.

What is the cheapest way to access an H100 for one weekend? With guaranteed uptime: Hyperstack PCIe at $1.90/hr. Two days × 24 hours × $1.90 ≈ $91 — a stable weekend of H100 access for under $100. If you accept interruptions: Vast.ai H100 SXM at ~$2.13/hr 30-day median ≈ $102 for a weekend, with platform range observed as low as $1.33/hr at the cheap end.

Will H100 prices drop as B200 ramps up? Likely yes, on a 6-12 month horizon. B200 is shipping in 2026 at a premium (~$5.89/hr observed at RunPod) and will pull the high end up while pulling H100 prices toward the A100 floor. Re-check this article in six months.

Bottom line

For most indie AI developers and small teams in 2026, the decision tree is short:

  • Under 13B at inference → 4090 or RTX 6000 Ada, not A100 or H100.
  • 13B-30B inference or LoRA fine-tuning → A100 wins on $/output.
  • 30B+ training, FP8 inference, latency-critical → H100 wins.
  • 70B+ at full precision or 100K-context inference → H200.
  • Pretraining from scratch → reserved cluster contract or hyperscaler, not on-demand.

Cheapest with on-demand SLA at writing: Hyperstack. Cheapest at all (interruptible market): Vast.ai. RunPod has the widest GPU catalogue with per-second billing. Lambda has the best multi-GPU SXM pricing for clusters of 8.

Re-check this comparison every quarter — both pricing and the H200 / B200 supply situation will move.
