# Gemma 2B Instruct GGUF

Contains Q4 and Q8 quantized GGUFs (plus an F16 conversion) for google/gemma-2b-it.
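
To grab one of the files, the standard `huggingface_hub` download call works. A minimal sketch follows; the exact GGUF filename is an assumption, so check the repo's file list for the real Q4/Q8/F16 names:

```python
# Minimal sketch: download a GGUF file from this repo with huggingface_hub.
# The filename below is hypothetical -- substitute the actual file name
# listed in the repo.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="iAkashPaul/gemma-2b-it-gguf",
    filename="gemma-2b-it-q4_k_m.gguf",  # hypothetical filename
)
print(model_path)  # local cache path to the downloaded GGUF
```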

## Performance

| Variant | Device | Throughput |
| --- | --- | --- |
| Q4 | M1 Pro 10-core GPU | 90 tok/s |
| Q4 | Snapdragon 778G CPU | 10 tok/s |
| Q4 | RTX 2070S | 40 tok/s |
| Q8 | M1 Pro 10-core GPU | 54 tok/s |
| Q8 | Snapdragon 778G CPU | 6 tok/s |
| Q8 | RTX 2070S | 25 tok/s |
| F16 | M1 Pro 10-core GPU | 30 tok/s |
| F16 | Snapdragon 778G CPU | <1 tok/s |
**GGUF details:** 2.51B params, `gemma` architecture; provided in 4-bit (Q4), 8-bit (Q8), and 16-bit (F16) variants.