hidden_size % num_attention_heads != 0

#2
by mseeger - opened

With hidden_size=1024 and num_attention_heads=12, this would give head_size=85, but then 85*12 = 1020 != 1024. Does this even work?
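Here is a minimal sketch of the arithmetic in the question. It assumes the usual multi-head attention layout, where the hidden vector is split evenly into num_attention_heads slices of head_size units each. The function name `split_heads` is hypothetical and does not come from any particular library. The divisibility check is the kind of guard an attention layer would need before reshaping.

```python
def split_heads(hidden_size: int, num_heads: int) -> int:
    """Return the per-head size, rejecting configs that don't divide evenly.

    Hypothetical helper for illustration only.
    """
    head_size = hidden_size // num_heads  # floor division: 1024 // 12 == 85
    if head_size * num_heads != hidden_size:
        # 85 * 12 == 1020, so 4 of the 1024 hidden units would belong to no head
        raise ValueError(
            f"hidden_size={hidden_size} is not divisible by "
            f"num_attention_heads={num_heads}: {num_heads} heads of size "
            f"{head_size} cover only {head_size * num_heads} units"
        )
    return head_size


if __name__ == "__main__":
    print(split_heads(1024, 16))  # divides evenly
    try:
        split_heads(1024, 12)
    except ValueError as err:
        print(err)
```

With 16 heads the split is clean (64 units per head). With 12 heads, floor division yields 85 and leaves 4 units unassigned, which is exactly the mismatch raised above.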
