| GPU tier | Q2_K (10.1 GB) | Q4_K_M (13.5 GB) | Q5_K_M (16.9 GB) | Q6_K (20.3 GB) | Q8_0 (27.0 GB) | FP16 (54.0 GB) |
|---|---|---|---|---|---|---|
| 8 GB (RTX 3050 · 4060) | | | | | | |
| 12 GB (RTX 3060 · 4070) | OK 36 t/s | | | | | |
| 16 GB (RTX 4080 · 4060 Ti 16 GB) | GREAT 54 t/s | OK 41 t/s | | | | |
| 24 GB (RTX 3090 · 4090) | GREAT 87 t/s | GREAT 66 t/s | GREAT 53 t/s | OK 45 t/s | | |
| 32 GB (RTX 5090 · M3 Max) | GREAT 162 t/s | GREAT 123 t/s | GREAT 99 t/s | GREAT 83 t/s | OK 62 t/s | |
| 48 GB (A6000 · M3 Max 64 GB) | GREAT 52 t/s | GREAT 40 t/s | GREAT 32 t/s | GREAT 27 t/s | GREAT 20 t/s | |
| 80 GB (H100 · M3 Ultra 128 GB) | GREAT 177 t/s | GREAT 134 t/s | GREAT 108 t/s | GREAT 91 t/s | GREAT 68 t/s | GREAT 34 t/s |