Fine-tuned model performance decreases when using memory_efficient_attention

#11
by hrushikesh1

Hi,

I am fine-tuning the gte-large model for a dense retrieval (vector search) use case.
I noticed that recall and precision drop significantly (5-6 percentage points) when using memory_efficient_attention and unpadding.
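For context, this is roughly how I enable the fast path when loading the model (a minimal sketch; the flag names `use_memory_efficient_attention` and `unpad_inputs` come from the model's custom config, and the exact checkpoint id is an assumption on my part):

```python
from transformers import AutoModel, AutoTokenizer

# Assumption: the v1.5-style GTE checkpoint that ships the custom attention code.
model_id = "Alibaba-NLP/gte-large-en-v1.5"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Fast path: xformers memory-efficient attention plus unpadding.
# The config-override names below are assumptions based on the model's custom config.
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    use_memory_efficient_attention=True,
    unpad_inputs=True,
)
```

The baseline run is identical except both flags are set to False.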
The model card says:

  1. Setting attention dropout to 0 to use xformers and flash_attn.

Is the drop caused by removing attention dropout?
Is there any way to keep model performance close to the baseline while still getting the training speed-up and memory savings of xformers? A sketch of the comparison I have in mind is below.
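To make the second question concrete, here is the ablation I would run to check whether the regression comes from losing attention dropout rather than from the kernel itself (a sketch; `attention_probs_dropout_prob` is the standard BERT-style config key, which I am assuming this custom model also uses):

```python
from transformers import AutoConfig, AutoModel

model_id = "Alibaba-NLP/gte-large-en-v1.5"  # assumption: the v1.5-style checkpoint

# Run A: eager attention, dropout left at the original config value.
cfg_a = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
cfg_a.use_memory_efficient_attention = False
cfg_a.unpad_inputs = False

# Run B: still eager attention, but attention dropout forced to 0,
# as the xformers/flash_attn path requires.
cfg_b = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
cfg_b.use_memory_efficient_attention = False
cfg_b.unpad_inputs = False
cfg_b.attention_probs_dropout_prob = 0.0

model_a = AutoModel.from_pretrained(model_id, config=cfg_a, trust_remote_code=True)
model_b = AutoModel.from_pretrained(model_id, config=cfg_b, trust_remote_code=True)

# Fine-tune both models with the same data and hyperparameters.
# If run B's recall/precision drop like the xformers run, the regression is the
# missing attention dropout; if not, it points at the attention/unpadding path itself.
```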
