KV Cache GPU Calculator
LLM inference memory estimator
⚡ v2.0
Model Parameters
Hugging Face Model ID
Fetch Model Info
Model Weights (GB)
Layers (L)
KV Heads (H)
Head Dimension (d)
Bytes per Element (B)
2 — bf16 / fp16
4 — fp32
1 — int8 / fp8
Max Context Length (T)
Total Users (U)
GPU Type
A100 SXM (80 GB)
H100 SXM (80 GB)
H200 SXM (141 GB)
A6000 (48 GB)
RTX 4090 (24 GB)
A100 PCIe (40 GB)
Peak Concurrency
30%
Safety Buffer (%)
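The calculator's exact formula isn't shown on the page, but the parameters above map onto the standard KV-cache estimate: K and V tensors (a factor of 2) across layers, KV heads, head dimension, element width, and context length, scaled by the peak-concurrent share of users, with a safety buffer held back from each GPU. A minimal sketch, assuming that formula and illustrative defaults (the 30% peak concurrency shown above, a hypothetical 10% buffer, and an example Llama-7B-like shape):

```python
import math

def gpus_required(weights_gb, L, H, d, B, T, U,
                  gpu_mem_gb, peak_concurrency=0.30, safety_buffer=0.10):
    """Estimate GPU count from the form fields above.

    Assumption: standard KV-cache sizing, not necessarily the page's
    exact implementation. L = layers, H = KV heads, d = head dim,
    B = bytes per element, T = max context, U = total users.
    """
    # KV cache per user: 2 (K and V) * L * H * d * B * T bytes
    kv_per_user_gb = 2 * L * H * d * B * T / 1024**3
    # Only the peak-concurrent fraction of users holds a cache at once
    kv_total_gb = kv_per_user_gb * U * peak_concurrency
    # Hold back the safety buffer from each GPU's memory
    usable_per_gpu_gb = gpu_mem_gb * (1 - safety_buffer)
    return math.ceil((weights_gb + kv_total_gb) / usable_per_gpu_gb)

# Example: 13 GB of bf16 weights, 32 layers, 32 KV heads, d = 128,
# B = 2 (bf16), T = 4096, 100 users, on an A100 SXM (80 GB).
# KV per user is exactly 2 GB; 30% of 100 users -> 60 GB of cache.
print(gpus_required(13, 32, 32, 128, 2, 4096, 100, 80))  # -> 2
```

Lowering B to 1 (int8/fp8 KV cache) halves the cache term, which is why the bytes-per-element field has an outsized effect at long contexts.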
Calculate GPU Requirements
Results
GPUs Required
—
GPU
Step-by-Step Derivation
Export Report
Export Markdown
Print