what/llm/can/i/run

VRAM Calculator

Precise VRAM accounting, not the "weights only" estimate most calculators give you. It includes the KV cache (which grows with context length), activation memory, and runtime overhead, to answer one question: will this model actually fit?
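The KV cache is the term that grows with context. As a rough sketch, assuming a standard transformer that caches one key and one value vector per layer, per KV head and per token (the calculator's exact formula isn't shown on this page), its size can be estimated as below. The model shape in the example is hypothetical, not Qwen2.5 Coder 32B's.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_len: int,
    batch_size: int = 1,
    bytes_per_elem: float = 2.0,  # 2 bytes per element = FP16/BF16 cache
) -> float:
    """Estimate KV cache size in bytes.

    Every layer stores one key and one value vector (the factor of 2)
    per KV head, per token, for each sequence in the batch.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * batch_size * bytes_per_elem


# Hypothetical model shape for illustration: 32 layers, 8 KV heads, head_dim 128.
for ctx in (4_096, 8_192, 32_768):
    gib = kv_cache_bytes(32, 8, 128, ctx) / 2**30
    print(f"{ctx:>6} tokens -> {gib:.2f} GiB")
```

With this shape the cache goes from 0.50 GiB at 4,096 tokens to 4.00 GiB at 32,768 tokens: linear in context length, and linear in batch size too. A quantized cache (smaller `bytes_per_elem`) shrinks it proportionally.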

Configure

Model: Qwen2.5 Coder 32B (32.0B params)
Context length: 32,768 tokens
Batch size
OS / framework reserve

Result

Total VRAM needed: 29.4 GB
20.0 GB weights + 7.77 GB KV cache + 0.1 GB activations + 1.5 GB overhead
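The total is a plain sum of the displayed components. The sketch below re-adds them and, as a side calculation, derives the weight precision the 20.0 GB figure implies. It assumes GB means 10^9 bytes, which the page doesn't state; about 5 bits per parameter suggests quantized weights, but which quantization format the calculator assumes isn't shown.

```python
# Per-component figures as displayed for Qwen2.5 Coder 32B at 32,768 tokens, in GB.
components_gb = {
    "weights": 20.0,
    "kv_cache": 7.77,
    "activations": 0.1,
    "overhead": 1.5,
}

total_gb = sum(components_gb.values())
print(f"Total VRAM needed: {total_gb:.1f} GB")

# Assumption: GB = 1e9 bytes and the weight figure covers all 32.0B parameters.
params = 32.0e9
bits_per_param = components_gb["weights"] * 1e9 * 8 / params
print(f"Implied weight precision: {bits_per_param:.1f} bits/param")
```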
Estimated speed (RTX 4090 reference): 34 tok/s
Memory-bound: speed scales with your GPU's memory bandwidth divided by the total VRAM needed.
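A memory-bound estimate divides memory bandwidth by the bytes streamed per generated token. The sketch below assumes the calculator uses the whole footprint as that byte count, as the memory-bound note suggests; with the RTX 4090's published peak of 1,008 GB/s, 1,008 / 29.4 ≈ 34 tok/s, matching the figure shown. Treat it as a ceiling, since real decoders rarely sustain peak bandwidth. The half-bandwidth GPU in the example is hypothetical.

```python
RTX_4090_BANDWIDTH_GB_S = 1008.0  # RTX 4090 peak memory bandwidth, GB/s


def est_tokens_per_sec(bandwidth_gb_s: float, footprint_gb: float) -> float:
    """Memory-bound decode ceiling: assumes each token reads the whole footprint once."""
    return bandwidth_gb_s / footprint_gb


def scale_from_reference(ref_tok_s: float, ref_bw_gb_s: float, your_bw_gb_s: float) -> float:
    """Rescale a reference estimate by the ratio of memory bandwidths."""
    return ref_tok_s * your_bw_gb_s / ref_bw_gb_s


ref = est_tokens_per_sec(RTX_4090_BANDWIDTH_GB_S, 29.4)
print(f"RTX 4090 reference: {ref:.0f} tok/s")

# Hypothetical GPU with half the reference card's bandwidth.
half = scale_from_reference(ref, RTX_4090_BANDWIDTH_GB_S, 504.0)
print(f"Half-bandwidth GPU: {half:.0f} tok/s")
```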
Memory pressure vs RTX 4090 (24 GB cap): ⚠ Doesn't fit · 5.4 GB short

Fit by GPU (headroom or shortfall)

RTX 5090 · 32 GB · +2.6 GB headroom
RTX 5080 · 16 GB · −13.4 GB short
RTX 5070 Ti · 16 GB · −13.4 GB short
RTX 5070 · 12 GB · −17.4 GB short
RTX 5060 Ti 16GB · 16 GB · −13.4 GB short
RTX 5060 · 8 GB · −21.4 GB short
RTX 4090 · 24 GB · −5.4 GB short
RTX 4080 SUPER · 16 GB · −13.4 GB short
RTX 4080 · 16 GB · −13.4 GB short
RTX 4070 Ti SUPER · 16 GB · −13.4 GB short
RTX 4070 Ti · 12 GB · −17.4 GB short
RTX 4070 SUPER · 12 GB · −17.4 GB short
RTX 4070 · 12 GB · −17.4 GB short
RTX 4060 Ti 16GB · 16 GB · −13.4 GB short
RTX 4060 Ti 8GB · 8 GB · −21.4 GB short
RTX 4060 · 8 GB · −21.4 GB short
RTX 3090 Ti · 24 GB · −5.4 GB short
RTX 3090 · 24 GB · −5.4 GB short
RTX 3080 Ti · 12 GB · −17.4 GB short
RTX 3080 12GB · 12 GB · −17.4 GB short
RTX 3080 10GB · 10 GB · −19.4 GB short
RTX 3070 Ti · 8 GB · −21.4 GB short
RTX 3070 · 8 GB · −21.4 GB short
RTX 3060 Ti · 8 GB · −21.4 GB short
RTX 3060 12GB · 12 GB · −17.4 GB short
RTX 3060 8GB · 8 GB · −21.4 GB short
RTX 3050 8GB · 8 GB · −21.4 GB short
RTX 2080 Ti · 11 GB · −18.4 GB short
RTX 2080 SUPER · 8 GB · −21.4 GB short
A6000 Pro · 48 GB · +18.6 GB headroom
RTX 6000 Ada · 48 GB · +18.6 GB headroom
H100 PCIe · 80 GB · +50.6 GB headroom
H100 SXM5 · 80 GB · +50.6 GB headroom
H200 SXM5 · 141 GB · +111.6 GB headroom
A100 80GB · 80 GB · +50.6 GB headroom
A100 40GB · 40 GB · +10.6 GB headroom
L40S · 48 GB · +18.6 GB headroom
L4 · 24 GB · −5.4 GB short
M1 16GB · 16 GB · −13.4 GB short
M1 Pro 16GB · 16 GB · −13.4 GB short
M1 Pro 32GB · 32 GB · +2.6 GB headroom
M1 Max 32GB · 32 GB · +2.6 GB headroom
M1 Max 64GB · 64 GB · +34.6 GB headroom
M1 Ultra 64GB · 64 GB · +34.6 GB headroom
M1 Ultra 128GB · 128 GB · +98.6 GB headroom
M2 16GB · 16 GB · −13.4 GB short
M2 24GB · 24 GB · −5.4 GB short
M2 Pro 16GB · 16 GB · −13.4 GB short
M2 Pro 32GB · 32 GB · +2.6 GB headroom
M2 Max 32GB · 32 GB · +2.6 GB headroom
M2 Max 64GB · 64 GB · +34.6 GB headroom
M2 Max 96GB · 96 GB · +66.6 GB headroom
M2 Ultra 64GB · 64 GB · +34.6 GB headroom
M2 Ultra 128GB · 128 GB · +98.6 GB headroom
M2 Ultra 192GB · 192 GB · +162.6 GB headroom
M3 16GB · 16 GB · −13.4 GB short
M3 24GB · 24 GB · −5.4 GB short
M3 Pro 18GB · 18 GB · −11.4 GB short
M3 Pro 36GB · 36 GB · +6.6 GB headroom
M3 Max 36GB · 36 GB · +6.6 GB headroom
M3 Max 48GB · 48 GB · +18.6 GB headroom
M3 Max 64GB · 64 GB · +34.6 GB headroom
M3 Max 96GB · 96 GB · +66.6 GB headroom
M3 Max 128GB · 128 GB · +98.6 GB headroom
M3 Ultra 96GB · 96 GB · +66.6 GB headroom
M3 Ultra 192GB · 192 GB · +162.6 GB headroom
M3 Ultra 256GB · 256 GB · +226.6 GB headroom
M3 Ultra 512GB · 512 GB · +482.6 GB headroom
M4 16GB · 16 GB · −13.4 GB short
M4 24GB · 24 GB · −5.4 GB short
M4 32GB · 32 GB · +2.6 GB headroom
M4 Pro 24GB · 24 GB · −5.4 GB short
M4 Pro 48GB · 48 GB · +18.6 GB headroom
M4 Max 36GB · 36 GB · +6.6 GB headroom
M4 Max 48GB · 48 GB · +18.6 GB headroom
M4 Max 64GB · 64 GB · +34.6 GB headroom
M4 Max 96GB · 96 GB · +66.6 GB headroom
M4 Max 128GB · 128 GB · +98.6 GB headroom
RX 7900 XTX · 24 GB · −5.4 GB short
RX 7900 XT · 20 GB · −9.4 GB short
RX 7900 GRE · 16 GB · −13.4 GB short
RX 7800 XT · 16 GB · −13.4 GB short
RX 7700 XT · 12 GB · −17.4 GB short
RX 6900 XT · 16 GB · −13.4 GB short
RX 6800 XT · 16 GB · −13.4 GB short
MI300X · 192 GB · +162.6 GB headroom
Arc A770 16GB · 16 GB · −13.4 GB short
Intel Iris Xe · 4 GB · −25.4 GB short
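Each entry in the GPU list is the device's memory capacity minus the 29.4 GB total. A minimal sketch of that check, assuming the whole footprint must fit on a single device (no multi-GPU split or CPU offload), with a few capacities taken from the list:

```python
TOTAL_VRAM_GB = 29.4  # total from the result above


def headroom_gb(capacity_gb: float, needed_gb: float = TOTAL_VRAM_GB) -> float:
    """Positive: memory left over after loading. Negative: how far short the device is."""
    return capacity_gb - needed_gb


for name, cap in [("RTX 5090", 32), ("RTX 4090", 24), ("A100 40GB", 40), ("M3 Pro 18GB", 18)]:
    h = headroom_gb(cap)
    status = f"{h:.1f} GB headroom" if h >= 0 else f"{-h:.1f} GB short"
    print(f"{name} · {cap} GB · {status}")
```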
Why this is more accurate than llama.cpp's default estimate
Most VRAM calculators report only the size of the weights. That misses the point: the KV cache grows linearly with context length (and with batch size) and can dominate memory use at long context. Here's what we compute: