Getting a runtime error about mismatched shapes when trying to predict

#484
by GalenP - opened

Hey there, thank you so much for your contributions and support for Geneformer.
I am following the CellXGene pipeline: https://chanzuckerberg.github.io/cellxgene-census/notebooks/analysis_demo/comp_bio_geneformer_prediction.html
I loaded my data in a similar format and created the .dataset after tokenising it the same way as the test data in the link, but when running prediction I get a runtime error about a shape mismatch:

Code:
from transformers import BertForSequenceClassification, Trainer
from geneformer import DataCollatorForCellClassification

# reload pre-trained model

model = BertForSequenceClassification.from_pretrained(model_dir)

# create the trainer

trainer = Trainer(model=model, data_collator=DataCollatorForCellClassification(token_dictionary=token_dict))

# use trainer

predictions = trainer.predict(dataset)

Error:

RuntimeError Traceback (most recent call last)
/Geneformer/examples/cell_classification.ipynb Cell 23 line 8
6 trainer = Trainer(model=model, data_collator=DataCollatorForCellClassification(token_dictionary=token_dict))
7 # use trainer
----> 8 predictions = trainer.predict(dataset)

File /hpcfs/users/a1841503/myconda/envs/gene/lib/python3.10/site-packages/transformers/trainer.py:4153, in Trainer.predict(self, test_dataset, ignore_keys, metric_key_prefix)
4151 print(test_dataloader)
4152 eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
-> 4153 output = eval_loop(
4154 test_dataloader, description="Prediction", ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix
4155 )
4156 total_batch_size = self.args.eval_batch_size * self.args.world_size
4157 if f"{metric_key_prefix}_jit_compilation_time" in output.metrics:

File /hpcfs/users/a1841503/myconda/envs/gene/lib/python3.10/site-packages/transformers/trainer.py:4270, in Trainer.evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
4268 print(len(inputs))
4269 # Prediction step
-> 4270 losses, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
4271 main_input_name = getattr(self.model, "main_input_name", "input_ids")
4272 inputs_decode = (
4273 self._prepare_input(inputs[main_input_name]) if "inputs" in args.include_for_metrics else None
4274 )
...
-> 1073 buffered_token_type_ids_expanded = buffered_token_type_ids.expand(batch_size, seq_length)
1074 token_type_ids = buffered_token_type_ids_expanded
1075 else:

RuntimeError: The expanded size of the tensor (2299) must match the existing size (2048) at non-singleton dimension 1. Target sizes: [8, 2299]. Tensor sizes: [1, 2048]
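For reference, here is a quick way to check whether any cells in the tokenized .dataset are longer than the model's position-embedding limit (a minimal sketch, reusing the dataset and model from the snippet above and assuming the tokenized dataset stores token IDs in an input_ids column):

# sketch: compare token lengths in the tokenized .dataset with the model's limit
lengths = [len(ids) for ids in dataset["input_ids"]]
print("max input length:", max(lengths))
print("model limit:", model.config.max_position_embeddings)  # 2048 here, so 2299 tokens would overflow it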

Also, I am getting a warning message for CUDA, so I am not sure whether it is running on the GPU or the CPU:
CUDA initialisation: The NVIDIA driver on your system is too old (found version 11060). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
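A quick way to check whether the run can actually use the GPU (a small sketch; trainer is the Trainer created above, and trainer.args.device is a standard Trainer attribute):

import torch

# sketch: confirm whether CUDA is usable and which device the Trainer will run on
print("CUDA available:", torch.cuda.is_available())
print("Trainer device:", trainer.args.device)  # falls back to cpu if the driver is too old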

I tokenised with the 30M token dictionary file and changed all the dictionary paths to the 30M model's, since that is the model being used.
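A quick sanity check that the loaded model actually matches the 30M token dictionary (a sketch using standard BertConfig fields and the token_dict from the snippet above; the exact expected sizes depend on which checkpoint you load):

# sketch: compare the model config against the token dictionary
print("model vocab size:", model.config.vocab_size, "vs dictionary size:", len(token_dict))
print("max positions:", model.config.max_position_embeddings)  # inputs longer than this will fail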

Thank you for your question! It’s possible that the error is occurring because you have cells with different numbers of genes detected, and they are not padded to the same length within a given batch, so they cause this error when passed through the GPU. We would recommend checking this, and you may consider trying an eval batch size of 1 (as sketched below), which should work if this is indeed the issue. Otherwise, we would recommend reaching out to CZI for support, since you are using the code they provide.
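For example, a minimal sketch of retrying prediction with an eval batch size of 1 via standard TrainingArguments (the output_dir here is just a placeholder):

from transformers import BertForSequenceClassification, Trainer, TrainingArguments
from geneformer import DataCollatorForCellClassification

# sketch: retry prediction with an eval batch size of 1, as suggested above
args = TrainingArguments(output_dir="prediction_tmp", per_device_eval_batch_size=1)  # placeholder dir
model = BertForSequenceClassification.from_pretrained(model_dir)
trainer = Trainer(
    model=model,
    args=args,
    data_collator=DataCollatorForCellClassification(token_dictionary=token_dict),
)
predictions = trainer.predict(dataset)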

ctheodoris changed discussion status to closed
