Last updated: May 2026
Decision rule in one paragraph. Rent A100 if your workload is inference of models up to ~30B at single-GPU scale, fine-tuning ≤13B with LoRA, or anything where you can wait 2-3× longer in exchange for roughly half the hourly rate. Rent H100 if you are serving an LLM at >100 tokens/s per stream, training or fine-tuning a 30B+ model with full backprop, or time-to-result matters more than $/hour. For prototyping below 13B, neither is the right answer: that is an RTX 4090 or RTX 6000 Ada at roughly half the hourly rate of an A100.
This guide gives the verified per-provider rental cost for H100 and A100 across RunPod, Lambda Labs, Hyperstack, Paperspace, and the Vast.ai marketplace as of 2026-05-20, plus the per-workload math that turns those hourly rates into a decision.
H100 vs A100 — specs at a glance
| Spec | NVIDIA A100 SXM 80GB | NVIDIA H100 SXM 80GB |
|---|---|---|
| Architecture | Ampere (2020) | Hopper (2022) |
| VRAM | 80 GB HBM2e | 80 GB HBM3 |
| Memory bandwidth | 2.0 TB/s | 3.35 TB/s |
| FP32 (peak) | 19.5 TFLOPS | 67 TFLOPS |
| TF32 (with sparsity) | 312 TFLOPS | 1,979 TFLOPS |
| BF16 / FP16 (with sparsity) | 624 TFLOPS | 3,958 TFLOPS |
| FP8 (with sparsity) | — | 7,916 TFLOPS |
| NVLink | 600 GB/s | 900 GB/s |
| TDP | 400 W | 700 W |
Specs from NVIDIA's product pages: A100, H100. Last verified 2026-05-20.
On paper, H100 has ~6× the FP16 throughput and ~70% more memory bandwidth than A100 at the same VRAM. The FP8 path is unique to Hopper — for inference workloads that quantize cleanly to FP8, H100 is in another league. For FP32-heavy scientific workloads, the gap is smaller (~3.4×).
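The ratios in this paragraph follow directly from the spec table. A minimal Python check, using only the peak datasheet values listed above (the with-sparsity and dense figures give the same ratio, and real kernels rarely reach peak):

```python
# Peak figures copied from the spec table above (NVIDIA datasheet values).
A100 = {"fp16_sparse_tflops": 624, "bandwidth_tb_s": 2.0, "fp32_tflops": 19.5}
H100 = {"fp16_sparse_tflops": 3958, "bandwidth_tb_s": 3.35, "fp32_tflops": 67}


def ratio(key: str) -> float:
    """H100 / A100 ratio for one row of the spec table."""
    return H100[key] / A100[key]


for key in A100:
    print(f"{key}: {ratio(key):.2f}x")
```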
Real rental pricing — multi-provider comparison
What you actually pay depends on the provider, the GPU variant (PCIe vs SXM, 80GB vs 40GB), and the billing model. Below: on-demand hourly rates verified by visiting each provider's live pricing page on 2026-05-20.
| Provider | GPU | $/hour | Billing | Source |
|---|---|---|---|---|
| RunPod (Pods) | H100 SXM 80GB | $3.29 | per-second | runpod.io/pricing |
| RunPod (Pods) | H100 PCIe 80GB | $2.89 | per-second | same |
| RunPod (Pods) | H100 NVL 94GB | $3.19 | per-second | same |
| RunPod (Pods) | A100 SXM 80GB | $1.49 | per-second | same |
| RunPod (Pods) | A100 PCIe 80GB | $1.39 | per-second | same |
| Hyperstack | H100 SXM | $2.40 | per-minute | hyperstack.cloud/gpu-pricing |
| Hyperstack | H100 NVLink | $1.95 | per-minute | same |
| Hyperstack | H100 PCIe | $1.90 | per-minute | same |
| Hyperstack | A100 SXM | $1.60 | per-minute | same |
| Hyperstack | A100 NVLink | $1.40 | per-minute | same |
| Hyperstack | A100 PCIe | $1.35 | per-minute | same |
| Lambda Labs | H100 SXM 80GB (1×) | $4.29 | per-minute | lambda.ai/service/gpu-cloud |
| Lambda Labs | H100 SXM 80GB (8×, per GPU) | $3.99 | per-minute | same |
| Lambda Labs | H100 PCIe 80GB | $3.29 | per-minute | same |
| Lambda Labs | A100 SXM 80GB (8×, per GPU) | $2.79 | per-minute | same |
| Lambda Labs | A100 SXM 40GB | $1.99 | per-minute | same |
| Paperspace | H100 (standard) | $5.95 | per-hour | paperspace.com/pricing |
| Paperspace | H100 (3-yr commit) | $2.24 | per-hour | same |
| Paperspace | A100 (3-yr commit) | $1.15 | per-hour | same |
| Vast.ai (market) | H100 SXM 80GB | $2.13 (median, 30d) | per-second | vast.ai/pricing |
| Vast.ai (market) | H100 NVL 94GB | $1.69 (median, 30d) | per-second | same |
| Vast.ai (market) | A100 SXM4 80GB | $1.00 (median, 30d) | per-second | same |
| Vast.ai (market) | A100 PCIe 80GB | $0.67 (median, 30d) | per-second | same |
Cheapest with guaranteed uptime (on-demand SLA tier): Hyperstack for H100 SXM ($2.40/hr), H100 PCIe ($1.90/hr), and A100 PCIe ($1.35/hr). For A100 SXM 80GB, RunPod at $1.49/hr undercuts Hyperstack's $1.60/hr.
Cheapest at all (Vast.ai — market median, interruptible, no SLA, snapshot 2026-05-20): H100 SXM $2.13/hr (full range observed $1.33–$6.71), A100 SXM $1.00/hr ($0.27–$2.67), A100 PCIe $0.67/hr ($0.11–$1.53). Vast prices are set by host competition on a per-second-billed market — the actual quote at rental time can be higher or lower, and hosts can reclaim instances. For batch training or fault-tolerant inference, Vast is the cheapest verified path. For production with guaranteed uptime, Hyperstack on-demand wins the SLA tier.
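To re-run the cheapest-per-variant comparison when prices move, a small sketch over a subset of the table rows is enough. The tier labels (`on_demand`, `commit`, `market`) and the name-normalizing helper `gpu_class` are this example's own assumptions, not provider terminology:

```python
# A subset of the 2026-05-20 table: (provider, gpu, usd_per_hour, tier).
# Tier labels are this sketch's shorthand, not provider terms.
PRICES = [
    ("RunPod", "H100 SXM 80GB", 3.29, "on_demand"),
    ("RunPod", "H100 PCIe 80GB", 2.89, "on_demand"),
    ("RunPod", "A100 SXM 80GB", 1.49, "on_demand"),
    ("RunPod", "A100 PCIe 80GB", 1.39, "on_demand"),
    ("Hyperstack", "H100 SXM", 2.40, "on_demand"),
    ("Hyperstack", "H100 PCIe", 1.90, "on_demand"),
    ("Hyperstack", "A100 SXM", 1.60, "on_demand"),
    ("Hyperstack", "A100 PCIe", 1.35, "on_demand"),
    ("Lambda Labs", "H100 SXM 80GB (1x)", 4.29, "on_demand"),
    ("Lambda Labs", "H100 PCIe 80GB", 3.29, "on_demand"),
    ("Paperspace", "H100 (3-yr commit)", 2.24, "commit"),
    ("Vast.ai", "H100 SXM 80GB", 2.13, "market"),
    ("Vast.ai", "A100 SXM4 80GB", 1.00, "market"),
    ("Vast.ai", "A100 PCIe 80GB", 0.67, "market"),
]


def gpu_class(name: str) -> str:
    """Collapse provider-specific names into e.g. 'H100 SXM' or 'A100 PCIe'."""
    model = "H100" if "H100" in name else "A100"
    form = "SXM" if "SXM" in name else ("PCIe" if "PCIe" in name else "other")
    return f"{model} {form}"


def cheapest(tier: str) -> dict:
    """Lowest hourly rate per GPU class within one pricing tier."""
    best = {}
    for provider, gpu, price, row_tier in PRICES:
        if row_tier != tier:
            continue
        key = gpu_class(gpu)
        if key not in best or price < best[key][1]:
            best[key] = (provider, price)
    return best


for gpu, (provider, price) in sorted(cheapest("on_demand").items()):
    print(f"{gpu:10} {provider:12} ${price:.2f}/hr")
```

Swap in fresh quotes before relying on the output. NVL/NVLink variants are left out on purpose because they do not map cleanly onto SXM or PCIe.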
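A100 costs roughly 45-70% of the H100 hourly rate on the same provider tier: about half on RunPod, Paperspace, and Vast.ai, closer to two-thirds on Hyperstack and Lambda. The decision is whether H100 finishes your job enough faster to cover that gap: at a 2× price ratio it must be at least 2× faster to match A100 on $/result, and if it is not, A100 wins.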
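The break-even rule can be made explicit: if H100 costs k times more per hour, it has to finish the same job at least k times faster to match A100 on $/result. A sketch using the same-tier SXM rates from the table (the provider pairing is this example's choice):

```python
# Same-tier SXM hourly rates (H100, A100) from the 2026-05-20 table.
SXM_RATES = {
    "RunPod": (3.29, 1.49),
    "Hyperstack": (2.40, 1.60),
    "Lambda Labs (8x, per GPU)": (3.99, 2.79),
    "Vast.ai (30d median)": (2.13, 1.00),
}


def breakeven_speedup(h100_rate: float, a100_rate: float) -> float:
    """Minimum H100-over-A100 speedup for H100 to cost no more per job."""
    return h100_rate / a100_rate


for provider, (h100, a100) in SXM_RATES.items():
    print(f"{provider:26} H100 must be >= {breakeven_speedup(h100, a100):.2f}x faster")
```

On RunPod, for example, an H100 that is only 2× faster still costs more per job than the A100; at Hyperstack's price ratio, 1.5× is enough.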
AWS p5 / GCP A3 are excluded from the head-to-head because the hyperscaler pricing model (on-demand vs reserved 1y / 3y, plus network egress) makes a simple hourly comparison misleading. For sustained workloads with negotiated discounts they can compete; on a one-weekend rental the providers above will be 2-3× cheaper.
Cost per workload
These are decision-support estimates derived from provider hourly rates × typical job duration for common workloads at indie-developer rigor. They are not benchmark numbers from a controlled study — treat them as a starting point for your own measurement, with the math shown so you can swap your own throughput numbers in.
Inference: serving Llama 3.3 70B (Q4 quantized)
A 70B model at Q4 fits on a single 80GB A100 or H100. For batch-1 streaming inference:
- A100 SXM typically delivers 30-40 tokens/s on quantized 70B inference at FP16 / Q4 mix. At RunPod's $1.49/hr, that is roughly $0.000010-0.000014 per output token ($10-14 per million) at steady state.
- H100 SXM with FP8 path enabled delivers 70-90 tokens/s on the same model. At RunPod's $3.29/hr, that is roughly $0.000010-0.000013 per output token ($10-13 per million): essentially the same cost per token, at roughly half the per-token latency.
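The per-token figures come from dividing the hourly rate by the tokens generated per hour. A minimal sketch, assuming a single stream running at steady state with no idle time billed:

```python
def usd_per_million_tokens(rate_per_hour: float, tokens_per_s: float) -> float:
    """Steady-state cost of one million output tokens for one stream (no idle time)."""
    tokens_per_hour = tokens_per_s * 3600
    return rate_per_hour / tokens_per_hour * 1_000_000


# (GPU, RunPod $/hr, quoted tokens/s range) from the estimates above.
for gpu, rate, (slow, fast) in [("A100 SXM", 1.49, (30, 40)), ("H100 SXM", 3.29, (70, 90))]:
    best = usd_per_million_tokens(rate, fast)   # faster end = cheaper per token
    worst = usd_per_million_tokens(rate, slow)
    print(f"{gpu}: ${best:.2f}-{worst:.2f} per 1M tokens")
```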
Verdict: if you care about throughput per dollar, H100 and A100 are within margin of error on this workload. If you care about per-stream latency (chat UI feel), H100 wins clearly.
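A rough sanity check on those tokens/s ranges: at batch 1, generating each token streams every weight from memory once, so memory bandwidth divided by weight bytes bounds decode speed. This sketch assumes exactly 4 bits per weight and ignores KV-cache reads and kernel overhead, so the true ceiling is somewhat lower:

```python
def decode_ceiling_tok_s(bandwidth_tb_s: float, params_b: float, bits_per_weight: float) -> float:
    """Upper bound on batch-1 decode speed if every token reads all weights once."""
    weight_bytes = params_b * 1e9 * bits_per_weight / 8
    return bandwidth_tb_s * 1e12 / weight_bytes


# Bandwidth from the spec table; 70B parameters at an assumed 4.0 bits each.
for gpu, bandwidth in [("A100 SXM", 2.0), ("H100 SXM", 3.35)]:
    print(f"{gpu}: <= {decode_ceiling_tok_s(bandwidth, 70, 4.0):.0f} tok/s")
```

Both quoted ranges (30-40 and 70-90 tokens/s) sit below these ceilings.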
Fine-tuning: 13B model with LoRA on 50K samples
A LoRA fine-tune of a 13B base, rank=16, single GPU, sequence length 4096, batch 4-8:
- A100 SXM 80GB: ~14-18 hours for one epoch at 50K samples. At RunPod $1.49/hr: $21-27 per full run.
- H100 SXM 80GB: ~5-7 hours for the same job (FP8 + Flash Attention 3). At RunPod $3.29/hr: $16-23 per full run.
H100 wins slightly on $/job here, and on wall-clock by roughly 2.5-3×. Pick H100 when iteration speed shapes your day; pick A100 when you're running 50 of these and the ~10 hours saved per job don't change your decisions.
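The dollar ranges are wall-clock hours multiplied by the hourly rate. A short sketch reproducing them, plus the midpoint speedup and hours saved:

```python
def job_cost(hours: tuple, rate: float) -> tuple:
    """(low, high) dollars for a job whose wall-clock falls in `hours`."""
    return hours[0] * rate, hours[1] * rate


def midpoint(r: tuple) -> float:
    return (r[0] + r[1]) / 2


A100_HOURS, H100_HOURS = (14, 18), (5, 7)   # 13B LoRA, one epoch, 50K samples
a100 = job_cost(A100_HOURS, 1.49)            # RunPod A100 SXM 80GB
h100 = job_cost(H100_HOURS, 3.29)            # RunPod H100 SXM 80GB

print(f"A100: ${a100[0]:.0f}-{a100[1]:.0f}  H100: ${h100[0]:.0f}-{h100[1]:.0f}")
speedup = midpoint(A100_HOURS) / midpoint(H100_HOURS)
saved = midpoint(A100_HOURS) - midpoint(H100_HOURS)
print(f"midpoint speedup ~{speedup:.1f}x, ~{saved:.0f} h saved per run")
```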
Full fine-tune: 7B base model with mixed precision
7B with full backprop, sequence length 2048, batch 16, 100K samples:
- A100 SXM 80GB: ~3-4 days per epoch ≈ $107-143 at $1.49/hr.
- H100 SXM 80GB: ~22-30 hours per epoch ≈ $72-99 at $3.29/hr.
H100 is ahead again. Full fine-tunes are compute-bound; more compute wins.
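These estimates leave open how the job fits in memory. A common accounting for mixed-precision AdamW (popularized by the DeepSpeed ZeRO work) is about 16 bytes per parameter before activations. By that rule a plain 7B full fine-tune needs ~112 GB, more than one 80GB card holds, so if these estimates are for a single card they imply memory savers such as an 8-bit optimizer, optimizer-state offload, or sharding across GPUs. A sketch of the arithmetic, assuming BF16 weights and gradients plus FP32 master weights and Adam moments:

```python
# Assumed bytes per parameter for mixed-precision AdamW:
# 2 (bf16 weights) + 2 (bf16 grads) + 4 (fp32 master) + 4 + 4 (Adam moments).
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4  # = 16


def full_ft_state_gb(params_b: float) -> float:
    """Weights + gradients + optimizer state in GB, excluding activations."""
    return params_b * 1e9 * BYTES_PER_PARAM / 1e9


for size in (7, 13):
    need = full_ft_state_gb(size)
    print(f"{size}B: ~{need:.0f} GB before activations; fits one 80GB card: {need < 80}")
```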
Multi-GPU pretraining: when an 8×H100 cluster pays off
Pretraining anything above 7B from scratch is uneconomical on rented hardware unless the bill is batched across a project. An 8×H100 SXM cluster on Lambda at $3.99/GPU/hr is $31.92/hour, roughly $23,000/month running 24/7. Even well below GPT-3 scale (175B-class), a run that keeps the cluster busy for weeks is the wrong shape for on-demand billing: use a reserved contract. At 70B+ foundation scale, an on-demand 8-GPU node is the wrong rate plan entirely: talk to Lambda 1-Click Clusters, AWS p5.48xlarge reserved, or a hyperscaler with a quota review.
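For a sense of when the cluster pays off, the widely used approximation that training costs about 6 × parameters × tokens FLOPs gives a wall-clock estimate. The sketch assumes the dense BF16 peak of 989 TFLOPS per H100 SXM (half the with-sparsity figure in the spec table), an assumed 40% model-FLOPs utilization, and a hypothetical 7B model trained on 140B tokens (the common ~20 tokens-per-parameter heuristic). All three are assumptions to replace with your own numbers:

```python
# Assumptions (replace with your own): dense BF16 peak, utilization, token budget.
H100_BF16_DENSE_TFLOPS = 989      # half the with-sparsity 1,979 in the spec table
ASSUMED_MFU = 0.40                # hypothetical model-FLOPs utilization
CLUSTER_USD_PER_HOUR = 8 * 3.99   # Lambda 8x H100 SXM on-demand


def pretrain_hours(params_b: float, tokens_b: float, gpus: int = 8) -> float:
    """Wall-clock hours from the ~6 * params * tokens training-FLOPs approximation."""
    total_flops = 6 * params_b * 1e9 * tokens_b * 1e9
    cluster_flops_per_s = gpus * H100_BF16_DENSE_TFLOPS * 1e12 * ASSUMED_MFU
    return total_flops / cluster_flops_per_s / 3600


hours = pretrain_hours(7, 140)    # hypothetical 7B model, 140B tokens
print(f"~{hours:.0f} h (~{hours / 24:.1f} days), ~${hours * CLUSTER_USD_PER_HOUR:,.0f} on demand")
print(f"24/7 month at on-demand: ~${CLUSTER_USD_PER_HOUR * 730:,.0f}")
```

Under these assumptions the 7B run occupies the node for roughly three weeks at about $16,500 on demand, which is the multi-week shape where reserved pricing matters.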
Where H100 wins
- FP8 inference for LLMs and Stable Diffusion. Hopper's transformer engine delivers 2-3× throughput at near-identical quality vs FP16.
- Time-to-result-sensitive work — research iteration loops, agentic systems, real-time inference.
- Large-context inference (32K+ tokens) where memory bandwidth dominates.
- Multi-GPU training with NVLink 4 — 8×H100 SXM with full NVSwitch interconnect is the LLM-class training default in 2026.
Where A100 still wins
- Inference at $/token for batch and async workloads where latency does not matter. A100 at half the hourly rate often beats H100 on throughput-per-dollar.
- Workloads that fit comfortably in 80GB VRAM and do not need FP8.
- Legacy ML stacks built before Hopper FP8 support was stable.
- Availability — A100 supply tends to be more consistent at lower-priced tiers; H100 hits "contact sales" walls faster, especially SXM 8×.
Where neither is the right answer
- Models under 13B at inference time — a single RTX 4090 (24GB) on RunPod at $0.69/hr or an RTX 6000 Ada (48GB) at $0.77/hr will outperform A100 per dollar by 2-3×.
- Workloads needing >80GB VRAM — look at H200 (141GB) at $4.39/hr on RunPod, or B200 (180GB) at $5.89/hr. The "between A100 and H200" middle is covered by NVL/96GB-class SKUs — RTX Pro 6000 96GB on RunPod at $1.89/hr is a notable middle option.
- Most image generation: L40S (48GB, $0.82-0.86/hr depending on provider) is usually the cost-optimized choice, not H100.
- Sustained high-QPS production inference. Above ~100K req/day on one model, per-token cost on a hosted-inference provider (Groq, Together, Fireworks, HuggingFace Inference Providers) usually beats running rented GPUs. See HuggingFace Inference API 2026.
How to rent without burning your wallet
Before charging a card to any of the providers above, exhaust the free paths:
- Free GPU credits programs — see Free GPU Compute for AWS Activate, GCP, NVIDIA Inception, OVH and academic programs that often hand out hundreds of GPU-hours free.
- Free LLM API for inference — if your end goal is calling a model, see Free LLM API Credits. Provider free tiers cover prototyping without renting any GPU at all.
- Spot / community tier — RunPod's "Community Cloud" is sourced from third-party hosts at the same prices with no SLA. Vast.ai is similar. Acceptable for experiments and overnight jobs.
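- Reserved with 3-month commit: once usage is steady, reserved pricing on Hyperstack saves 15-30%; Paperspace 3-year commit drops H100 from $5.95 to $2.24/hr. Because a commitment bills for the whole term, it beats on-demand only once you would otherwise use more than (1 - discount) of that term: about 70-85% of a 3-month commit at 15-30% off (roughly 9-11 weeks of 24/7), or about 38% of the three years at the Paperspace H100 rate.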
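The break-even logic for commitments, as a sketch assuming the commitment bills every hour of the term whether or not you use it:

```python
def breakeven_utilization(on_demand_rate: float, committed_rate: float) -> float:
    """Share of the term you must use for a commitment to beat on-demand.

    A commitment bills every hour of the term, so it wins once
    used_hours * on_demand_rate > term_hours * committed_rate.
    """
    return committed_rate / on_demand_rate


WEEKS_IN_3_MONTHS = 13
for discount in (0.15, 0.30):           # Hyperstack's quoted 15-30% savings
    share = breakeven_utilization(1.0, 1.0 - discount)
    print(f"{discount:.0%} off: >= {share:.0%} of the term, ~{share * WEEKS_IN_3_MONTHS:.0f} weeks of 24/7")

# Paperspace H100: $5.95 on-demand vs $2.24 on a 3-year commit.
print(f"Paperspace H100: >= {breakeven_utilization(5.95, 2.24):.0%} of 3 years")
```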
Common mistakes
- Renting H100 hours for a workload that fits on a 4090. The most expensive mistake in GPU rental. A 7B Q4 model fits in 8GB; 13B Q4 in ~10GB. A 4090's 24GB is plenty, at roughly one-fifth of the H100 hourly rate ($0.69 vs $3.29/hr on RunPod).
- Treating spot / community tier as production. Spot instances on Vast or RunPod community can be reclaimed without warning. Use for resumable batch jobs, not live API endpoints.
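- Forgetting network egress at AWS / GCP. The hyperscalers charge $0.08-0.12/GB out, so a single 100GB checkpoint download costs $8-12 and repeated pulls add up. RunPod, Lambda, and Hyperstack typically do not charge egress; that alone can flip a comparison.
- On-demand 24/7 instead of reserved. If a workload will keep a GPU busy for most of a commitment term, on-demand overpays: a 3-month commit at 15-30% off breaks even at roughly 70-85% utilization over the term.
- Buying H100 SXM when PCIe was enough. The SXM premium is mainly for NVLink multi-GPU bandwidth, which a single-GPU job does not use. RunPod H100 PCIe at $2.89/hr is 12% cheaper than SXM at $3.29, but it is not the same compute: NVIDIA rates the PCIe card at 51 TFLOPS FP32 and 2.0 TB/s memory bandwidth vs 67 TFLOPS and 3.35 TB/s for SXM. It is the right buy when a slower single GPU still meets your deadline; on $/result, measure before assuming it wins.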
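A price-adjusted comparison on peak datasheet numbers makes the PCIe vs SXM trade-off concrete. This sketch uses NVIDIA's published peaks for the two H100 80GB variants and the RunPod rates above; real-world gaps depend on clocks, power limits, and workload, so treat the result as a prior, not a benchmark:

```python
# NVIDIA datasheet peaks for the two H100 80GB variants; RunPod on-demand rates.
SPECS = {
    "H100 SXM": {"rate": 3.29, "fp32_tflops": 67, "bandwidth_tb_s": 3.35},
    "H100 PCIe": {"rate": 2.89, "fp32_tflops": 51, "bandwidth_tb_s": 2.0},
}


def per_dollar(gpu: str, metric: str) -> float:
    """Peak metric delivered per $/hour."""
    return SPECS[gpu][metric] / SPECS[gpu]["rate"]


for metric in ("fp32_tflops", "bandwidth_tb_s"):
    rel = per_dollar("H100 PCIe", metric) / per_dollar("H100 SXM", metric)
    print(f"PCIe {metric} per dollar vs SXM: {rel:.2f}x")
```

On these peaks the PCIe card delivers about 0.87× the FP32 and 0.68× the memory bandwidth per dollar of the SXM, so the 12% hourly saving does not by itself make it cheaper per result.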
Frequently asked questions
Is the H100 worth twice the cost of A100? Depends on what "worth" means. On per-stream latency: yes, easily 2× faster. On throughput per dollar (tokens per dollar): roughly tied for inference. On absolute capability for FP8 workloads: yes, no contest. On a 70B Q4 chat where 30 tok/s is enough: no, pick A100.
Can A100 still train modern LLMs in 2026? Yes. A100 is two architecture generations behind, but it supports BF16 natively; what it lacks is Hopper's FP8 path. It just takes longer. For LoRA fine-tunes up to 70B (multi-GPU) and full fine-tunes up to 13B, A100 is fully usable in 2026; the gap is mostly wall-clock, not capability.
Why is H100 SXM more expensive than H100 PCIe at the same provider? SXM is the server-form-factor variant with NVLink connectors, built for 4×/8× clusters where inter-GPU bandwidth (900 GB/s) matters. PCIe cards talk over PCIe 5.0 (~128 GB/s) between GPUs unless bridged in pairs with NVLink. The SXM part is also faster on its own (67 vs 51 TFLOPS FP32, 3.35 vs 2.0 TB/s memory bandwidth per NVIDIA's datasheets), so single-GPU workloads see a smaller gap, but not none; multi-GPU training is where SXM earns most of its premium.
Is the H200 a no-brainer over the H100? For workloads bottlenecked on memory capacity (large-context inference, 70B+ at FP16), yes. H200 has 141GB VRAM vs 80GB and ~40% more memory bandwidth (4.8 TB/s vs 3.35 TB/s). Compute is similar. RunPod H200 is $4.39/hr vs H100 SXM $3.29/hr, a ~33% premium. If your workload uses the extra 60GB, the premium pays off.
What is the cheapest way to access an H100 for one weekend? With guaranteed uptime: Hyperstack H100 PCIe at $1.90/hr. Two days × 24 hours × $1.90 ≈ $91, a stable weekend of H100 access for under $100. If you accept interruptions: Vast.ai H100 SXM at ~$2.13/hr 30-day median ≈ $102 for a weekend, with platform range observed as low as $1.33/hr at the cheap end.
Will H100 prices drop as B200 ramps up? Likely yes, on a 6-12 month horizon. B200 is shipping in 2026 at premium (~$5.89/hr observed at RunPod) and will pull the high end up while pulling H100 prices toward the A100 floor. Re-check this article in six months.
Bottom line
For most indie AI developers and small teams in 2026, the decision tree is short:
- Under 13B at inference → 4090 or RTX 6000 Ada, not A100 or H100.
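- 13B-30B inference or LoRA fine-tuning → A100 at about half the hourly rate; $/output is roughly tied with H100 on the estimates above, so pay for H100 only when wall-clock matters.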
- 30B+ training, FP8 inference, latency-critical → H100 wins.
- 70B+ at full precision or 100K-context inference → H200.
- Pretraining from scratch → reserved cluster contract or hyperscaler, not on-demand.
Cheapest with on-demand SLA at writing: Hyperstack (except A100 SXM 80GB, where RunPod's $1.49/hr is lower). Cheapest at all (interruptible market): Vast.ai. RunPod has the widest GPU catalogue with per-second billing. Lambda has the best multi-GPU SXM pricing for clusters of 8.
Re-check this comparison every quarter — both pricing and the H200 / B200 supply situation will move.
Related guides on this site
- Free GPU Compute — Where to Get Hours Without a Credit Card
- Free LLM API Credits — Every Route from $0 to $10K
- HuggingFace Inference API 2026 — free tier, Endpoints, Providers
- OpenRouter Free Tier 2026 — 28+ free models, limits, BYOK
- Best Open Source LLM 2026 — Model Comparison
- $500K in Free Cloud Credits 2026: 15 Programs Compared
- NVIDIA Inception Program — Free Credits for AI Startups
- Best GPU for AI 2026
- VPS with GPU — Where to Rent
- RunPod vs Lambda Labs vs Vast.ai — 3-provider GPU rental comparison (2026)