Hello,

I am applying the DAS method to CLIP. When computing importance, the text model produces gradients correctly, but the vision model mostly produces NaN values. The masks for computing importance are placed on the `self_attn` and `mlp` modules of `CLIPEncoderLayer`, which is shared by both the text and vision models.
I have declared the masks as follows:
```python
class CLIPEncoder(nn.Module):
    """
    Transformer encoder consisting of `config.num_hidden_layers` self attention layers. Each layer is a
    [`CLIPEncoderLayer`].

    Args:
        config: CLIPConfig
    """

    def __init__(self, config: CLIPConfig):
        super().__init__()
        self.config = config
        self.layers = nn.ModuleList([CLIPEncoderLayer(config) for _ in range(config.num_hidden_layers)])
        self.self_attn_mask = torch.ones(config.num_hidden_layers, config.hidden_size, dtype=torch.float16)
        self.self_attn_mask.requires_grad_(True)
        self.mlp_mask = torch.ones(config.num_hidden_layers, config.hidden_size, dtype=torch.float16)
        self.mlp_mask.requires_grad_(True)
        self.gradient_checkpointing = False
```
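Note that, as declared above, the masks are plain float16 tensors rather than registered parameters, so `.to()` will not move them with the module, and float16 leaf tensors can overflow or underflow during backprop, which is one possible source of NaN gradients. A minimal sketch of an alternative, assuming the masks are registered as float32 `nn.Parameter`s (the class and argument names here are illustrative, not from the original code):

```python
import torch
import torch.nn as nn

class MaskedEncoder(nn.Module):
    # Hypothetical sketch: register per-layer importance masks as float32
    # Parameters so they move with .to(device)/.to(dtype), appear in
    # named_parameters(), and accumulate gradients in full precision.
    def __init__(self, num_layers: int, hidden_size: int):
        super().__init__()
        self.self_attn_mask = nn.Parameter(
            torch.ones(num_layers, hidden_size, dtype=torch.float32))
        self.mlp_mask = nn.Parameter(
            torch.ones(num_layers, hidden_size, dtype=torch.float32))

enc = MaskedEncoder(num_layers=12, hidden_size=768)
print(enc.self_attn_mask.requires_grad)  # Parameters require grad by default
```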
I have implemented it to operate in the CLIPEncoderLayer as follows:
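For context, a minimal, hypothetical sketch of how a per-layer mask can gate a residual branch in such a layer; the function and variable names are my own assumptions, not the original implementation:

```python
import torch

def apply_layer_mask(residual, branch_output, mask_row):
    # Broadcast a (hidden_size,) mask over a (batch, seq, hidden_size)
    # branch output before adding it back to the residual stream.
    return residual + mask_row * branch_output

residual = torch.zeros(2, 4, 8)
branch = torch.randn(2, 4, 8)
mask = torch.ones(8, requires_grad=True)

out = apply_layer_mask(residual, branch, mask)
out.sum().backward()
print(torch.isnan(mask.grad).any())  # tensor(False): finite grads in float32
```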
I would like to ask whether you have encountered the same phenomenon, or whether my implementation is incorrect.
Thank you.