# Gemma 2B Instruct GGUF

Contains Q4 and Q8 quantized GGUFs (plus an F16 conversion) for google/gemma-2b-it.
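
To grab one of the files, the standard `huggingface_hub` download call works. A minimal sketch follows; the exact GGUF filename is an assumption, so check the repo's file list for the real Q4/Q8/F16 names:

```python
# Minimal sketch: download a GGUF file from this repo with huggingface_hub.
# The filename below is hypothetical -- substitute the actual file name
# listed in the repo.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="iAkashPaul/gemma-2b-it-gguf",
    filename="gemma-2b-it-q4_k_m.gguf",  # hypothetical filename
)
print(model_path)  # local cache path to the downloaded GGUF
```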

## Performance

| Variant | Device | Throughput |
| --- | --- | --- |
| Q4 | M1 Pro 10-core GPU | 90 tok/s |
| Q4 | Snapdragon 778G CPU | 10 tok/s |
| Q4 | RTX 2070S | 40 tok/s |
| Q8 | M1 Pro 10-core GPU | 54 tok/s |
| Q8 | Snapdragon 778G CPU | 6 tok/s |
| Q8 | RTX 2070S | 25 tok/s |
| F16 | M1 Pro 10-core GPU | 30 tok/s |
| F16 | Snapdragon 778G CPU | <1 tok/s |
**GGUF details:** 2.51B params, `gemma` architecture; provided in 4-bit (Q4), 8-bit (Q8), and 16-bit (F16) variants.