| GPU tier | Q2_K (3.4 GB) | Q4_K_M (4.5 GB) | Q5_K_M (5.6 GB) | Q6_K (6.8 GB) | Q8_0 (9.0 GB) | FP16 (18.0 GB) |
|---|---|---|---|---|---|---|
| 8 GB tier (RTX 3050 · 4060) | GREAT 67 t/s | GREAT 52 t/s | GREAT 42 t/s | OK 36 t/s | | |
| 12 GB tier (RTX 3060 · 4070) | GREAT 99 t/s | GREAT 77 t/s | GREAT 62 t/s | GREAT 53 t/s | GREAT 40 t/s | |
| 16 GB tier (RTX 4080 · 4060 Ti 16 GB) | GREAT 148 t/s | GREAT 115 t/s | GREAT 94 t/s | GREAT 79 t/s | GREAT 60 t/s | |
| 24 GB tier (RTX 3090 · 4090) | GREAT 240 t/s | GREAT 186 t/s | GREAT 151 t/s | GREAT 128 t/s | GREAT 97 t/s | GREAT 50 t/s |
| 32 GB tier (RTX 5090 · M3 Max) | GREAT 445 t/s | GREAT 344 t/s | GREAT 281 t/s | GREAT 237 t/s | GREAT 181 t/s | GREAT 93 t/s |
| 48 GB tier (A6000 · M3 Max 64 GB) | GREAT 143 t/s | GREAT 111 t/s | GREAT 91 t/s | GREAT 76 t/s | GREAT 58 t/s | GREAT 30 t/s |
| 80 GB tier (H100 · M3 Ultra 128 GB) | GREAT 487 t/s | GREAT 377 t/s | GREAT 308 t/s | GREAT 260 t/s | GREAT 198 t/s | GREAT 102 t/s |