Matryoshka and quantization-aware

#7
by yitao416 - opened

Thanks for sharing your work! I think generating lightweight vectors is definitely the direction to go.

I see that Matryoshka and quantization-aware training are mentioned in the model card, but without any code examples. I just want to confirm that the following examples from sentence-transformers are the correct way to use them.

from sentence_transformers import SentenceTransformer

# Matryoshka: keep only the first 128 dimensions of each embedding
matryoshka_dim = 128
model = SentenceTransformer(
    "Snowflake/snowflake-arctic-embed-m-v2.0",
    trust_remote_code=True,
    truncate_dim=matryoshka_dim,
)

Quantization:

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
# Quantize the embeddings to int8 at encode time
embeddings = model.encode(sentences, precision='int8')

Thanks!

Snowflake org

Please see our GitHub repo's code for examples of truncating and quantizing embedding vectors here: https://github.com/Snowflake-Labs/arctic-embed/blob/main/compressed_embeddings_examples/score_arctic_embed_m_v1dot5_with_quantization.ipynb

I have not tried performing quantization or truncation with the tools in the sentence-transformers package, but if they give results consistent with the notebook above, you should be good to go! The dynamic ranges used for integer quantization of the 1.5 model in that notebook will likely give similar quality on the 2.0 model, but we recommend testing for quality degradation on a benchmark of your choosing, using an approach similar to the one shown in the notebook.
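Not from the Snowflake notebook, but as a rough illustration of that kind of quality check: the sketch below uses random vectors standing in for real embeddings and a simple symmetric int8 scalar quantization, then measures how much the top-10 ranking changes after quantization. The data, the quantization scheme, and the overlap metric here are all my assumptions, not the notebook's exact method.

import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for real document and query embeddings (L2-normalized)
docs = rng.standard_normal((100, 256)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = docs[0] + 0.1 * rng.standard_normal(256).astype(np.float32)
query /= np.linalg.norm(query)

# Simple symmetric int8 scalar quantization with a shared scale
scale = np.abs(docs).max() / 127.0
docs_q = np.clip(np.round(docs / scale), -128, 127).astype(np.int8)
query_q = np.clip(np.round(query / scale), -128, 127).astype(np.int8)

# Compare top-10 retrieval before and after quantization
top_f = np.argsort(-(docs @ query))[:10]
top_q = np.argsort(-(docs_q.astype(np.int32) @ query_q.astype(np.int32)))[:10]
overlap = len(set(top_f) & set(top_q)) / 10  # fraction of shared top-10 hits

A real check would use your own corpus and queries in place of the random data, and a retrieval metric like NDCG or recall instead of raw top-k overlap.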

Snowflake org

Also, please note that the model is only trained for MRL dimension 256, and that int4 quantization is a much better way to compress to 128 bytes than truncating to 128 dimensions and using int8 quantization.
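For intuition on the size arithmetic: 256 dimensions at 4 bits each is exactly 128 bytes. Below is a minimal numpy sketch of symmetric int4 scalar quantization with nibble packing; it illustrates the byte math only and is my own assumed scheme, not the one from the Snowflake notebook.

import numpy as np

def int4_quantize(emb: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric int4 scalar quantization: map floats to [-8, 7],
    then pack two 4-bit values into each byte (assumes even length)."""
    scale = float(np.abs(emb).max()) / 7.0
    q = np.clip(np.round(emb / scale), -8, 7).astype(np.int8)
    nib = q.astype(np.uint8) & 0x0F          # two's-complement low nibble
    packed = nib[0::2] | (nib[1::2] << 4)    # two values per byte
    return packed, scale

rng = np.random.default_rng(0)
vec = rng.standard_normal(256).astype(np.float32)  # stand-in for a 256-dim MRL embedding
packed, scale = int4_quantize(vec)
# 256 dims * 4 bits = 128 bytes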

Snowflake org

The FAISS package also has a good implementation of fast 4-bit Product Quantization (PQ), which may be both easier to use and higher quality, since the quantization parameters are fitted to your data: https://github.com/facebookresearch/faiss/wiki/Fast-accumulation-of-PQ-and-AQ-codes-(FastScan)
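As a hedged sketch of what that might look like with FAISS's 4-bit PQ fast-scan index (the index type, metric, and parameters here are my assumptions, not a recommendation from this thread):

import numpy as np
import faiss  # pip install faiss-cpu

d = 256              # e.g. the MRL-truncated arctic-embed dimension
m, nbits = 64, 4     # 64 subquantizers x 4 bits -> 32-byte codes per vector
rng = np.random.default_rng(0)
xb = rng.standard_normal((4096, d)).astype("float32")  # stand-in for real embeddings

index = faiss.IndexPQFastScan(d, m, nbits, faiss.METRIC_INNER_PRODUCT)
index.train(xb)      # fits the PQ codebooks to your data
index.add(xb)
scores, ids = index.search(xb[:3], 5)  # approximate top-5 neighbors per query

With m = 64 this stores 32 bytes per vector; raising m toward 256 trades more memory for higher fidelity, up to the 128-byte codes mentioned above.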

Great! Thanks for sharing!
