Did something happen to the token_dictionary_gc95M.pkl?

#477
by ag2022 - opened

I've been playing with this a lot recently, and due to our infrastructure I download and install Geneformer fresh each time I load it. For the past few days I've had no problems loading the token_dictionary_gc95M file.

When I try to extract embeddings, I get:
Traceback (most recent call last):
  File "", line 1, in
  File "/tmp/Geneformer/geneformer/emb_extractor.py", line 521, in __init__
    self.gene_token_dict = pickle.load(f)
_pickle.UnpicklingError: invalid load key, 'v'.

And if I try to load the .pkl directly, I get:
import pickle

token_file = "../Geneformer/geneformer/token_dictionary_gc95M.pkl"
with open(token_file, "rb") as file:
    token_dict = pickle.load(file)

Traceback (most recent call last):
  File "", line 2, in
_pickle.UnpicklingError: invalid load key, 'v'.

Am I the only one? I'm using Python 3.10.16.

Thanks for your question - this can happen when you aren't using git lfs.
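For anyone hitting the same error, here is a minimal sketch of a check, assuming the cause is the one described above. When a repository is cloned without Git LFS, each LFS-tracked file is replaced by a small text pointer that begins with `version https://git-lfs.github.com/spec/v1`, so pickle reads the leading `v` as an unknown opcode. The helper name `is_lfs_pointer` and the demo file are hypothetical and are not part of Geneformer.

```python
import os
import tempfile

# First bytes of a Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` starts like a Git LFS pointer stub."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


# Demonstration on a throwaway file that mimics a pointer stub.
# The oid and size are placeholders, not the real values for any file.
with tempfile.TemporaryDirectory() as tmp:
    stub = os.path.join(tmp, "example.pkl")
    with open(stub, "wb") as f:
        f.write(LFS_POINTER_PREFIX + b"\noid sha256:" + b"0" * 64 + b"\nsize 12345\n")
    print(is_lfs_pointer(stub))
```

If the check returns True for token_dictionary_gc95M.pkl, running `git lfs install` followed by `git lfs pull` inside the clone should replace the pointer with the actual file.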

ctheodoris changed discussion status to closed

yep totally right - our git lfs install had a problem on respin, once we fixed that everything's peachy again! tyty!
