| GPU tier | Q2_K (11.3 GB) | Q4_K_M (15.0 GB) | Q5_K_M (18.8 GB) | Q6_K (22.5 GB) | Q8_0 (30.0 GB) | FP16 (60.0 GB) |
|---|---|---|---|---|---|---|
| 8 GB (RTX 3050 · RTX 4060) | | | | | | |
| 12 GB (RTX 3060 · RTX 4070) | | | | | | |
| 16 GB (RTX 4080 · RTX 4060 Ti 16 GB) | GREAT 49 t/s | | | | | |
| 24 GB (RTX 3090 · RTX 4090) | GREAT 79 t/s | GREAT 60 t/s | GREAT 48 t/s | | | |
| 32 GB (RTX 5090 · M3 Max) | GREAT 146 t/s | GREAT 111 t/s | GREAT 89 t/s | GREAT 75 t/s | | |
| 48 GB (A6000 · M3 Max 64 GB) | GREAT 47 t/s | GREAT 36 t/s | GREAT 29 t/s | GREAT 24 t/s | GREAT 18 t/s | |
| 80 GB (H100 · M3 Ultra 128 GB) | GREAT 160 t/s | GREAT 121 t/s | GREAT 98 t/s | GREAT 82 t/s | GREAT 62 t/s | GREAT 31 t/s |
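
The table does not say how a cell ends up filled or empty. The pattern is consistent with a simple rule: a quantization gets a rating only when its file size fits within the tier's memory with some headroom left over. The sketch below is an illustration of that reading, not the page's actual method. The function name `quants_that_fit` and the `USABLE_FRACTION` value of 0.85 are assumptions, with the fraction picked only because it reproduces the filled cells above. The sizes come from the table header.

```python
# File sizes in GB, copied from the table header.
QUANT_SIZES_GB = {
    "Q2_K": 11.3,
    "Q4_K_M": 15.0,
    "Q5_K_M": 18.8,
    "Q6_K": 22.5,
    "Q8_0": 30.0,
    "FP16": 60.0,
}

# Assumption: share of a tier's memory available for weights. The rest is
# treated as headroom (context cache, runtime overhead). 0.85 is a
# hypothetical value that happens to match the table's filled cells.
USABLE_FRACTION = 0.85


def quants_that_fit(vram_gb, usable_fraction=USABLE_FRACTION):
    """Return the quantizations whose file size fits in the usable memory."""
    budget_gb = vram_gb * usable_fraction
    return [name for name, size_gb in QUANT_SIZES_GB.items() if size_gb <= budget_gb]


if __name__ == "__main__":
    for vram_gb in (8, 12, 16, 24, 32, 48, 80):
        fitting = quants_that_fit(vram_gb)
        print(f"{vram_gb} GB tier: {', '.join(fitting) or 'none'}")
```

Under this rule the 12 GB tier stays empty because Q2_K at 11.3 GB exceeds the 10.2 GB budget, which matches the empty row in the table. The sketch says nothing about throughput: the t/s figures depend on the hardware and cannot be derived from memory size alone.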