Model

  • Gemma 2 27B Instruction-Tuned, quantized with IQ3_M (GGUF)
  • Fits on a single T4 (16 GB)
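
The llama-cli commands below expect the GGUF file in the current directory. A minimal sketch for fetching it with huggingface_hub (the local_dir choice is an assumption):

from huggingface_hub import hf_hub_download

# Download the quantized weights into the current directory
# so that ./gemma-2-27b-it-IQ3_M.gguf exists for llama-cli.
hf_hub_download(
    repo_id="chenghenry/gemma-2-27b-it-GGUF",
    filename="gemma-2-27b-it-IQ3_M.gguf",
    local_dir=".",  # assumption: store next to where llama-cli is run
)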

Usage (llama-cli with GPU):

llama-cli -m ./gemma-2-27b-it-IQ3_M.gguf -ngl 42 --temp 0 --repeat-penalty 1.0 --color -p "Why is the sky blue?"
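
Here -ngl 42 offloads 42 layers to the GPU, --temp 0 selects greedy decoding, and --repeat-penalty 1.0 disables the repetition penalty.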

Usage (llama-cli with CPU):

llama-cli -m ./gemma-2-27b-it-IQ3_M.gguf --temp 0 --repeat-penalty 1.0 --color -p "Why is the sky blue?"

Usage (llama-cpp-python via Hugging Face Hub):

from llama_cpp import Llama

# Download the GGUF from the Hugging Face Hub (cached locally) and load it
llm = Llama.from_pretrained(
    repo_id="chenghenry/gemma-2-27b-it-GGUF",
    filename="gemma-2-27b-it-IQ3_M.gguf",
    n_ctx=8192,           # context window
    n_batch=2048,         # prompt-processing batch size
    n_gpu_layers=100,     # offload all layers to the GPU
    verbose=False,
    chat_format="gemma",  # apply Gemma's chat template
)
prompt = "Why is the sky blue?"
messages = [{"role": "user", "content": prompt}]
response = llm.create_chat_completion(
    messages=messages,
    repeat_penalty=1.0,  # 1.0 disables the repetition penalty
    temperature=0,       # greedy decoding
)
print(response["choices"][0]["message"]["content"])
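
For incremental output, create_chat_completion also accepts stream=True and yields OpenAI-style chunks. A minimal sketch reusing the llm and messages objects above:

# Stream the reply token by token instead of waiting for the full response
for chunk in llm.create_chat_completion(
    messages=messages,
    repeat_penalty=1.0,
    temperature=0,
    stream=True,
):
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)
print()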

Model details:

  • Format: GGUF
  • Model size: 27.2B params
  • Architecture: gemma2

Model tree for chenghenry/gemma-2-27b-it-GGUF:

  • Base model: google/gemma-2-27b
  • This model is one of 55 quantizations of the base model.