Hi there, I'm working on the interpretability of attention heads and your work is really inspiring. I'm not experienced with modifying attention heads, so I'm a little confused about this part of the code.
If I want to block an attention head, I would expect to set that head's attention scores to "-inf", just as the attention mask does, but here the attention weights are set to zero. Since the weights may contain negative values, can this zero-weight fully block that head's attention? Hopefully I haven't missed something. Thank you!
Retrieval_Head/faiss_attn/source/modeling_llama.py, lines 685 to 689 in 3ac171a
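To make the question concrete, here is a minimal sketch contrasting pre-softmax score masking with post-softmax weight zeroing. All tensor names and shapes here are hypothetical, not taken from the repo; I'm only assuming the zeroing in the snippet happens after the softmax:

```python
import torch
import torch.nn.functional as F

batch, num_heads, seq_len, head_dim = 1, 4, 8, 16

# Pre-softmax scores can be any real number, including negatives,
# which is why the attention mask adds -inf at this stage.
scores = torch.randn(batch, num_heads, seq_len, seq_len)

# After softmax, every weight lies in [0, 1] and each row sums to 1,
# so the weights themselves are never negative.
attn_weights = F.softmax(scores, dim=-1)
assert (attn_weights >= 0).all()

# Zeroing head 2's post-softmax weights, as the snippet appears to do.
blocked = attn_weights.clone()
blocked[:, 2, :, :] = 0.0

# The head output is attn_weights @ V; all-zero weights give a zero output,
# so the head contributes nothing downstream.
values = torch.randn(batch, num_heads, seq_len, head_dim)
head_out = torch.matmul(blocked, values)
print(head_out[:, 2].abs().max())  # tensor(0.) -- head 2 is fully blocked

# Note: setting an entire head's scores to -inf BEFORE softmax would instead
# produce NaN rows (softmax over all -inf), so -inf masking only makes sense
# for individual positions, not for silencing a whole head.
```

If that reading is right, zeroing after softmax is the cleaner way to silence a whole head, and the "-inf" trick applies only to masking individual positions before the softmax.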