Getting shape mismatch while loading saved Pixtral model
Hi, thank you for creating this transformers-compatible version of Pixtral. I am saving the model to my local drive and then loading it again. However, I get size mismatches for the QKV matrices of "language_model", as shown below. I would appreciate some help. Thanks!
>>> from transformers import LlavaForConditionalGeneration
>>> model_id = "mistral-community/pixtral-12b"
>>> model = LlavaForConditionalGeneration.from_pretrained(model_id)
Loading checkpoint shards: 100%|██████████| 6/6 [00:04<00:00, 1.27it/s]
>>> model.save_pretrained("pixtral-12b", from_pt = True)
>>> model2 = LlavaForConditionalGeneration.from_pretrained("pixtral-12b")
Loading checkpoint shards: 100%|██████████| 11/11 [00:02<00:00, 4.68it/s]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/data/sandbox/anaconda/envs/pixtral/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4224, in from_pretrained
) = cls._load_pretrained_model(
File "/data/sandbox/anaconda/envs/pixtral/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4852, in _load_pretrained_model
raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for LlavaForConditionalGeneration:
size mismatch for language_model.model.layers.0.self_attn.q_proj.weight: copying a param with shape torch.Size([4096, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.0.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.0.self_attn.o_proj.weight: copying a param with shape torch.Size([5120, 4096]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.1.self_attn.q_proj.weight: copying a param with shape torch.Size([4096, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
......
Just updating that replacing the config.json works. Basically, the config.json written by save_pretrained is different from the one in this repo. Replacing it with the config.json from this repo makes loading work. I am wondering why save_pretrained doesn't save the correct config? Thanks.
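For reference, an equivalent way to apply this workaround without hand-editing config.json is to pass the hub config explicitly when reloading the locally saved weights. A minimal sketch, using the same model id and local directory as above (illustrative only):

from transformers import AutoConfig, LlavaForConditionalGeneration

hub_id = "mistral-community/pixtral-12b"
local_dir = "pixtral-12b"

# The hub config still carries the correct attention dimensions, so reuse it
# instead of the config.json that save_pretrained wrote locally.
config = AutoConfig.from_pretrained(hub_id)
model2 = LlavaForConditionalGeneration.from_pretrained(local_dir, config=config)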
Hey, thanks for reporting. This is related to the default values we have in the Mistral config within transformers. Saving the config does not store head_dim, which causes errors when loading it back. I will make an easy fix by updating the config for now.
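To make the mismatch concrete: without a stored head_dim, the text config falls back to hidden_size // num_attention_heads, which does not match Pixtral's actual head dimension. A quick check, assuming Pixtral-12B's text config values (hidden_size=5120, 32 attention heads, 8 KV heads, head_dim=128), reproduces the shapes in the traceback above:

hidden_size, num_heads, num_kv_heads = 5120, 32, 8

head_dim_stored = 128                        # value in the hub config.json
head_dim_derived = hidden_size // num_heads  # 160, the fallback used when head_dim is missing

# q_proj output dim: 32 * 128 = 4096 (checkpoint) vs 32 * 160 = 5120 (freshly built model)
print(num_heads * head_dim_stored, num_heads * head_dim_derived)
# k_proj / v_proj output dim: 8 * 128 = 1024 (checkpoint) vs 8 * 160 = 1280 (freshly built model)
print(num_kv_heads * head_dim_stored, num_kv_heads * head_dim_derived)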
UPDATE: sorry, I realized this cannot be fixed by just updating the config and needs a fix at the transformers level. Will submit a PR soon.