Getting shape mismatch while loading saved Pixtral model
Hi, thank you for creating this transformers-compatible version of Pixtral. I am saving the model to my local drive and then loading it again. However, I get size mismatches for the QKV matrices of "language_model", as shown below. I would appreciate some help. Thanks!
>>> from transformers import LlavaForConditionalGeneration
>>> model_id = "mistral-community/pixtral-12b"
>>> model = LlavaForConditionalGeneration.from_pretrained(model_id)
Loading checkpoint shards: 100%|██████████| 6/6 [00:04<00:00, 1.27it/s]
>>> model.save_pretrained("pixtral-12b", from_pt = True)
>>> model2 = LlavaForConditionalGeneration.from_pretrained("pixtral-12b")
Loading checkpoint shards: 100%|██████████| 11/11 [00:02<00:00, 4.68it/s]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/data/sandbox/anaconda/envs/pixtral/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4224, in from_pretrained
) = cls._load_pretrained_model(
File "/data/sandbox/anaconda/envs/pixtral/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4852, in _load_pretrained_model
raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for LlavaForConditionalGeneration:
size mismatch for language_model.model.layers.0.self_attn.q_proj.weight: copying a param with shape torch.Size([4096, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.0.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.0.self_attn.o_proj.weight: copying a param with shape torch.Size([5120, 4096]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.1.self_attn.q_proj.weight: copying a param with shape torch.Size([4096, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
......
Just updating that replacing the config.json works. Basically, the config.json written by save_pretrained is different from the one in this repo. Replacing it with the config.json from this repo makes loading work. I am wondering why save_pretrained doesn't save the correct config? Thanks.
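For reference, an equivalent way to apply this workaround without hand-editing config.json is to pass the hub config explicitly when reloading the locally saved weights. A minimal sketch, using the same model id and local directory as above (illustrative only):

from transformers import AutoConfig, LlavaForConditionalGeneration

hub_id = "mistral-community/pixtral-12b"
local_dir = "pixtral-12b"

# The hub config still carries the correct attention dimensions, so reuse it
# instead of the config.json that save_pretrained wrote locally.
config = AutoConfig.from_pretrained(hub_id)
model2 = LlavaForConditionalGeneration.from_pretrained(local_dir, config=config)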
Hey, thanks for reporting. This is related to the default values we have in the Mistral config within transformers. Saving the config does not store head_dim, which causes errors when loading it back. I will make an easy fix by updating the config for now.
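To make the mismatch concrete: without a stored head_dim, the text config falls back to hidden_size // num_attention_heads, which does not match Pixtral's actual head dimension. A quick check, assuming Pixtral-12B's text config values (hidden_size=5120, 32 attention heads, 8 KV heads, head_dim=128), reproduces the shapes in the traceback above:

hidden_size, num_heads, num_kv_heads = 5120, 32, 8

head_dim_stored = 128                        # value in the hub config.json
head_dim_derived = hidden_size // num_heads  # 160, the fallback used when head_dim is missing

# q_proj output dim: 32 * 128 = 4096 (checkpoint) vs 32 * 160 = 5120 (freshly built model)
print(num_heads * head_dim_stored, num_heads * head_dim_derived)
# k_proj / v_proj output dim: 8 * 128 = 1024 (checkpoint) vs 8 * 160 = 1280 (freshly built model)
print(num_kv_heads * head_dim_stored, num_kv_heads * head_dim_derived)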
UPDATE: sorry, I realized this cannot be fixed by just updating the config and needs a fix at the transformers level. Will submit a PR soon.