load a quantized model with transformers

#26
by Etienne83 - opened

I'm trying to load the 4-bit quantized GGUF model like this:

from transformers import AutoModelForCausalLM

model_id = "MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF"
model_file = "Meta-Llama-3-70B-Instruct.Q4_K_S.gguf"

model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=model_file)

The model downloads, but then I get this error:

ValueError: cannot reshape array of size 295501824 into shape (890,72)
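One way to rule out a corrupted download is to inspect the tensors in the file directly. A minimal sketch, assuming the gguf package is installed (pip install gguf) and the GGUF file sits in the working directory:

from gguf import GGUFReader

# Open the downloaded GGUF file (path is an assumption; adjust to your cache location).
reader = GGUFReader("Meta-Llama-3-70B-Instruct.Q4_K_S.gguf")

# Print the first few tensors: name, shape, and quantization type.
for tensor in reader.tensors[:5]:
    print(tensor.name, tensor.shape, tensor.tensor_type)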

These are GGUF models, not regular torch models. You must use llama.cpp, or an application or library that uses llama.cpp under the hood. For instance:
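A minimal sketch using the llama-cpp-python bindings (pip install llama-cpp-python huggingface_hub); the n_ctx and n_gpu_layers values are illustrative assumptions, not requirements:

from llama_cpp import Llama

# Download the GGUF file from the Hub and load it with llama.cpp.
llm = Llama.from_pretrained(
    repo_id="MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF",
    filename="Meta-Llama-3-70B-Instruct.Q4_K_S.gguf",
    n_ctx=4096,       # context window size (illustrative)
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)

# Run a simple chat completion against the loaded model.
output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello, who are you?"}]
)
print(output["choices"][0]["message"]["content"])

Llama.from_pretrained fetches the file via huggingface_hub and caches it locally, so the same snippet works for any quantization level in the repo by changing the filename.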

MaziyarPanahi changed discussion status to closed
