
Meta Llama 3.1 8B

Sources: arena · Confidence: C
params: 8B (dense)
bench sources: 1 (single)
context (max tokens):
license: Llama 3.1 Community (community)

Model meta

canonical: meta-llama/Llama-3.1-8B-Instruct
parameters: 8B
organization: Meta
license: Llama 3.1 Community
context:
downloads:

Benchmark breakdown

Hardware fit matrix

| GPU tier | Example hardware | Q2_K (3.0 GB) | Q4_K_M (4.0 GB) | Q5_K_M (5.0 GB) | Q6_K (6.0 GB) | Q8_0 (8.0 GB) | FP16 (16.0 GB) |
|---|---|---|---|---|---|---|---|
| 8 GB | RTX 3050 · 4060 | GREAT, 74 t/s | GREAT, 57 t/s | GREAT, 47 t/s | OK, 40 t/s | doesn't fit | doesn't fit |
| 12 GB | RTX 3060 · 4070 | GREAT, 110 t/s | GREAT, 85 t/s | GREAT, 70 t/s | GREAT, 59 t/s | GREAT, 45 t/s | doesn't fit |
| 16 GB | RTX 4080 · 4060 Ti 16G | GREAT, 164 t/s | GREAT, 128 t/s | GREAT, 104 t/s | GREAT, 88 t/s | GREAT, 67 t/s | doesn't fit |
| 24 GB | RTX 3090 · 4090 | GREAT, 266 t/s | GREAT, 206 t/s | GREAT, 169 t/s | GREAT, 143 t/s | GREAT, 109 t/s | GREAT, 56 t/s |
| 32 GB | RTX 5090 · M3 Max | GREAT, 493 t/s | GREAT, 383 t/s | GREAT, 313 t/s | GREAT, 265 t/s | GREAT, 202 t/s | GREAT, 104 t/s |
| 48 GB | A6000 · M3 Max 64 | GREAT, 159 t/s | GREAT, 123 t/s | GREAT, 101 t/s | GREAT, 85 t/s | GREAT, 65 t/s | GREAT, 34 t/s |
| 80 GB | H100 · M3 Ultra 128 | GREAT, 540 t/s | GREAT, 419 t/s | GREAT, 343 t/s | GREAT, 290 t/s | GREAT, 221 t/s | GREAT, 114 t/s |

Legend: GREAT = full fit, production speed · OK = tight fit, usable · doesn't fit
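
The fit labels in the matrix can be approximated by comparing each quantization's size with the tier's memory. The sketch below is a hypothetical rule, not the site's published logic: the function names and the two ratio thresholds (0.70 and 0.90) are assumptions chosen only because they reproduce the labels listed for the 8, 12, 16 and 24 GB tiers, when cells with no result are read as "doesn't fit".

```python
# Hypothetical fit rule. The thresholds below are assumptions picked to match
# the labels in the matrix; the site's real criteria are not stated.

# Quantization sizes in GB, as listed in the matrix header.
QUANT_SIZES_GB = {
    "Q2_K": 3.0,
    "Q4_K_M": 4.0,
    "Q5_K_M": 5.0,
    "Q6_K": 6.0,
    "Q8_0": 8.0,
    "FP16": 16.0,
}

GREAT_MAX_RATIO = 0.70  # assumed: model uses at most 70% of memory
OK_MAX_RATIO = 0.90     # assumed: model uses at most 90% of memory


def classify_fit(model_gb: float, vram_gb: float) -> str:
    """Label how well a model of model_gb fits into vram_gb of memory."""
    ratio = model_gb / vram_gb
    if ratio <= GREAT_MAX_RATIO:
        return "GREAT"
    if ratio <= OK_MAX_RATIO:
        return "OK"
    return "doesn't fit"


def fit_row(vram_gb: float) -> dict:
    """Return the fit label for every quantization at one memory tier."""
    return {quant: classify_fit(size, vram_gb)
            for quant, size in QUANT_SIZES_GB.items()}


if __name__ == "__main__":
    for vram in (8, 12, 16, 24):
        print(f"{vram} GB tier:", fit_row(vram))
```

The rule looks only at weight size against total memory. It ignores KV cache, context length, and runtime overhead, all of which reduce usable headroom in practice, so treat the result as a rough first filter.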