Fine-tuning model performance decreases when using memory_efficient_attention
#11
opened by hrushikesh1
Hi,
I am fine-tuning the gte-large model for a dense retrieval (vector search) use case.
I noticed that recall and precision drop significantly (5-6 percentage points) when using memory_efficient_attention and unpadding.
The model card says:
- Setting attention dropout to 0 to use xformers and flash_attn.
Is this drop caused by the removal of attention dropout?
Is there any way to keep model performance close to the baseline while still getting the training speed-up and memory savings of xformers?
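For reference, here is a minimal sketch of the two setups I am comparing. The config flag names (`use_memory_efficient_attention`, `unpad_inputs`, `attention_probs_dropout_prob`) are my assumptions based on the model card wording and the checkpoint's config.json, so please correct me if they differ:

```python
# Minimal sketch of the two setups being compared (flag names are assumptions
# taken from the model card / config.json; please correct me if they differ).
from transformers import AutoConfig, AutoModel

model_path = "gte-large"  # placeholder for the checkpoint being fine-tuned

# Baseline: default attention, attention dropout as shipped in the config.
baseline = AutoModel.from_pretrained(model_path, trust_remote_code=True)

# Memory-efficient variant: xformers attention + unpadding,
# with attention dropout set to 0 as the model card requires.
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
config.use_memory_efficient_attention = True  # assumed flag name
config.unpad_inputs = True                    # assumed flag name
config.attention_probs_dropout_prob = 0.0     # attention dropout -> 0 for the xformers path
efficient = AutoModel.from_pretrained(model_path, config=config, trust_remote_code=True)
```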