
Qwen2.5 72B

Sources: arena · Confidence: C
params: 72B (dense)
bench sources: 1 (single)
context (max tokens):
license: Qwen (see source)

Model meta

canonical: Qwen/Qwen2.5-72B-Instruct
parameters: 72B
organization: Alibaba
license: Qwen
context:
downloads:

Benchmark breakdown

Hardware fit matrix

by GPU tier

Tier    Hardware                Q2_K          Q4_K_M        Q5_K_M        Q6_K          Q8_0          FP16
                                27.0 GB       36.0 GB       45.0 GB       54.0 GB       72.0 GB       144.0 GB
8 GB    rtx 3050 · 4060         -             -             -             -             -             -
12 GB   rtx 3060 · 4070         -             -             -             -             -             -
16 GB   rtx 4080 · 4060 ti 16g  -             -             -             -             -             -
24 GB   rtx 3090 · 4090         -             -             -             -             -             -
32 GB   rtx 5090 · m3 max       OK 62 t/s     -             -             -             -             -
48 GB   a6000 · m3 max 64       GREAT 20 t/s  GREAT 15 t/s  -             -             -             -
80 GB   h100 · m3 ultra 128     GREAT 68 t/s  GREAT 51 t/s  GREAT 41 t/s  GREAT 34 t/s  TIGHT 26 t/s  -

Legend: full fit · production speed | tight fit · usable | doesn't fit (shown as -)
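
The sizes listed for each quantization are consistent with a simple parameters-times-bits estimate. The sketch below reproduces them under that assumption. The bits-per-weight values (3, 4, 5, 6, 8, 16) are not stated on this page; they are inferred because they match the listed sizes, and the names `NOMINAL_BITS` and `weight_size_gb` are hypothetical. Real GGUF files will likely differ somewhat, and the estimate ignores KV cache and runtime overhead.

```python
# ASSUMPTION: nominal bits per weight, inferred only because they
# reproduce the sizes listed in the matrix (not stated by the source).
NOMINAL_BITS = {
    "Q2_K": 3,
    "Q4_K_M": 4,
    "Q5_K_M": 5,
    "Q6_K": 6,
    "Q8_0": 8,
    "FP16": 16,
}


def weight_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes).

    Covers weights only: no KV cache, activations, or runtime overhead.
    """
    return params_billions * bits_per_weight / 8


# Qwen2.5 72B: 72 billion parameters, dense.
sizes = {name: weight_size_gb(72, bits) for name, bits in NOMINAL_BITS.items()}

for name, gb in sizes.items():
    print(f"{name:8s} {gb:6.1f} GB")
```

A raw comparison of these sizes against a tier's memory is only a first filter. For example, 45.0 GB is below 48 GB, yet the matrix lists no Q5_K_M entry for the 48 GB tier, which suggests the page reserves headroom beyond the weights.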