[Rank value encoding] What kind of normalization was done before calculating the non-zero median for genes?

#480

by myl200 - opened 2 days ago

Discussion

myl200

2 days ago • edited 2 days ago

Hello!
I have a question about your method, the rank value encoding. In your Nature 2023 paper, the description starts with "... we first calculated the non-zero median value of expression of each detected gene across all cells passing quality filtering from the entire Genecorpus-30M." I'm curious what kind of normalization was applied before this step. In "gene_median_dictionary_gc30M.pkl", all values are stored as np.float64.

ctheodoris

Owner 1 day ago

Thanks for your question. The input is raw counts without normalization (see here). The raw counts are then transformed as described in the manuscript. You are also welcome to check out the code in the tokenizer script in this repository to see exactly how it is done.
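
For readers who want a concrete picture, here is a minimal sketch of a per-gene non-zero median computed from raw counts. It is not the repository's code. The function name `nonzero_median_per_gene`, the `target_sum` parameter, and the per-cell total-count scaling step are assumptions made for illustration; the manuscript and the tokenizer script remain the authoritative description of the transformation.

```python
import warnings

import numpy as np


def nonzero_median_per_gene(counts, target_sum=10_000):
    """Per-gene median over cells where the gene is detected (count > 0).

    counts: 2-D array of raw counts, shape (n_cells, n_genes).
    target_sum: hypothetical scaling constant (an assumption, not from the thread).
    """
    counts = np.asarray(counts, dtype=np.float64)

    # Drop cells with no counts at all to avoid dividing by zero.
    totals = counts.sum(axis=1, keepdims=True)
    keep = totals[:, 0] > 0
    counts, totals = counts[keep], totals[keep]

    # ASSUMPTION: scale each cell by its total count before taking medians.
    scaled = counts / totals * target_sum

    # Mask undetected entries as NaN so nanmedian ignores them.
    masked = np.where(counts > 0, scaled, np.nan)

    # Genes detected in no cell yield NaN; silence NumPy's all-NaN warning.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmedian(masked, axis=0)


if __name__ == "__main__":
    toy_counts = np.array([[1, 0, 3],
                           [2, 2, 0],
                           [0, 4, 4]])
    print(nonzero_median_per_gene(toy_counts, target_sum=1.0))
```

Under these assumptions the medians come out as floating-point values even though the input is integer counts, which would be consistent with the np.float64 entries noted in the question.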

ctheodoris changed discussion status to closed 1 day ago
