
Question about blocking an attention head #9

Open
cnlnpjhsy opened this issue Aug 5, 2024 · 0 comments

## masking head in normal attention
if 'block_list' in kwargs:
    for h in kwargs['block_list']:       # h = (layer index, head index)
        if self.layer_idx == h[0]:
            attn_weights[:, h[1], :, :] = 0

Hi there, I'm working on the interpretability of attention heads, and your work is really inspiring. I'm not an expert in modifying attention heads, so I'm a little confused about this part of the code.

If I want to block an attention head, I would expect to set that head's attention weights to "-inf", just like what the attention mask does, but here the attention weights are set to zero. Since the weights may contain negative values, can this zero-weight assignment fully block the attention of that head? Hopefully I haven't missed something. Thank you!
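
To make the distinction concrete, here is a minimal, self-contained sketch (not the repository's actual attention code; the tensor shapes, head index, and variable names are assumptions) comparing zeroing the post-softmax probabilities, zeroing the pre-softmax scores, and setting the pre-softmax scores to -inf:

```python
import torch
import torch.nn.functional as F

# Assumed shapes: (batch, num_heads, seq_len, seq_len) scores,
# (batch, num_heads, seq_len, head_dim) values.
batch, num_heads, seq_len, head_dim = 1, 4, 5, 8
scores = torch.randn(batch, num_heads, seq_len, seq_len)   # pre-softmax logits
value = torch.randn(batch, num_heads, seq_len, head_dim)
blocked_head = 2

# (a) Zeroing the POST-softmax probabilities: the head's output is exactly 0,
# so it contributes nothing to the attention output.
probs = F.softmax(scores, dim=-1)
probs[:, blocked_head, :, :] = 0
out_a = probs @ value
print(out_a[:, blocked_head].abs().max())          # tensor(0.)

# (b) Zeroing the PRE-softmax scores: softmax turns an all-zero row into a
# uniform distribution, so the head still attends (uniformly) to every token.
scores_b = scores.clone()
scores_b[:, blocked_head, :, :] = 0
out_b = F.softmax(scores_b, dim=-1) @ value
print(out_b[:, blocked_head].abs().max())          # generally nonzero

# (c) Setting the PRE-softmax scores of the whole head to -inf makes every
# row all -inf, and softmax of an all--inf row yields NaN, so a plain -inf
# mask over the entire head is not a safe way to block it.
scores_c = scores.clone()
scores_c[:, blocked_head, :, :] = float("-inf")
print(F.softmax(scores_c, dim=-1).isnan().any())   # tensor(True)
```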
