Hi there, I'm working on the interpretability of attention heads and your work is really inspiring. I'm not experienced with modifying attention heads, so I'm a little confused about this part of the code.
If I want to block an attention head, I would expect to set that head's attention scores to "-inf", just as the attention mask does, but here the attention weights are set to zero. Since the weights may contain negative values, can this zero-weight fully block that head's attention? Hopefully I haven't missed something. Thank you!
Retrieval_Head/faiss_attn/source/modeling_llama.py, lines 685 to 689 in 3ac171a
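To make the question concrete, here is a minimal sketch contrasting pre-softmax score masking with post-softmax weight zeroing. All tensor names and shapes here are hypothetical, not taken from the repo; I'm only assuming the zeroing in the snippet happens after the softmax:

```python
import torch
import torch.nn.functional as F

batch, num_heads, seq_len, head_dim = 1, 4, 8, 16

# Pre-softmax scores can be any real number, including negatives,
# which is why the attention mask adds -inf at this stage.
scores = torch.randn(batch, num_heads, seq_len, seq_len)

# After softmax, every weight lies in [0, 1] and each row sums to 1,
# so the weights themselves are never negative.
attn_weights = F.softmax(scores, dim=-1)
assert (attn_weights >= 0).all()

# Zeroing head 2's post-softmax weights, as the snippet appears to do.
blocked = attn_weights.clone()
blocked[:, 2, :, :] = 0.0

# The head output is attn_weights @ V; all-zero weights give a zero output,
# so the head contributes nothing downstream.
values = torch.randn(batch, num_heads, seq_len, head_dim)
head_out = torch.matmul(blocked, values)
print(head_out[:, 2].abs().max())  # tensor(0.) -- head 2 is fully blocked

# Note: setting an entire head's scores to -inf BEFORE softmax would instead
# produce NaN rows (softmax over all -inf), so -inf masking only makes sense
# for individual positions, not for silencing a whole head.
```

If that reading is right, zeroing after softmax is the cleaner way to silence a whole head, and the "-inf" trick applies only to masking individual positions before the softmax.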