Fine-tuning model performance decreases when using memory_efficient_attention
#11
opened by hrushikesh1
Hi,
I am fine-tuning the gte-large model for a dense retrieval (vector search) use case.
I noticed that recall and precision drop significantly (5-6 percentage points) when using memory_efficient_attention and unpadding.
The model card says:
- Setting attention dropout to 0 to use xformers and flash_attn.
Is this drop caused by the removal of attention dropout?
Is there any way to keep model performance close to the baseline while still getting the training speed-up and memory savings of xformers?
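For reference, here is a minimal sketch of the two setups I am comparing. The config flag names (`use_memory_efficient_attention`, `unpad_inputs`, `attention_probs_dropout_prob`) are my assumptions based on the model card wording and the checkpoint's config.json, so please correct me if they differ:

```python
# Minimal sketch of the two setups being compared (flag names are assumptions
# taken from the model card / config.json; please correct me if they differ).
from transformers import AutoConfig, AutoModel

model_path = "gte-large"  # placeholder for the checkpoint being fine-tuned

# Baseline: default attention, attention dropout as shipped in the config.
baseline = AutoModel.from_pretrained(model_path, trust_remote_code=True)

# Memory-efficient variant: xformers attention + unpadding,
# with attention dropout set to 0 as the model card requires.
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
config.use_memory_efficient_attention = True  # assumed flag name
config.unpad_inputs = True                    # assumed flag name
config.attention_probs_dropout_prob = 0.0     # attention dropout -> 0 for the xformers path
efficient = AutoModel.from_pretrained(model_path, config=config, trust_remote_code=True)
```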