Basically you need to set `param.requires_grad` to `False` for the modules that should be frozen. If you do this in `model_builder.py` between creating the `NMTModel` and the call to `create_adapters`, then the former will be frozen and only the latter will be trained. In any case, you want to do this before registering the `has_grad_hook`, which happens a few lines later.
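As a rough illustration of the freezing pattern (toy modules only; the real model construction happens in `model_builder.py` and `create_adapters`):

```python
import torch.nn as nn

# Toy stand-ins: the real base model is the NMTModel built in
# model_builder.py, and the adapters are created by create_adapters.
base_model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))

# Freeze everything that exists at this point (i.e. the base model).
for param in base_model.parameters():
    param.requires_grad = False

# Modules created afterwards (the adapters) keep requires_grad=True.
adapter = nn.Linear(8, 8)

print([n for n, p in base_model.named_parameters() if p.requires_grad])  # []
print(all(p.requires_grad for p in adapter.parameters()))                # True
```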
If empty communication groups are an issue (and they probably are), then you need to add a flag to `TaskQueueManager.get_distributed_groups` that makes sure the keys `encoder`, `decoder`, `src_emb`, and `tgt_emb` are empty before returning `my_distributed_groups`. Either don't populate them, or clear them before returning.
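A minimal sketch of the clearing step, assuming a hypothetical `main_model_frozen` flag (the helper name and the group dict layout are illustrative; in the real code this logic would sit inside `TaskQueueManager.get_distributed_groups`):

```python
def filter_distributed_groups(my_distributed_groups, main_model_frozen):
    """Hypothetical helper: drop communication groups for frozen components."""
    if main_model_frozen:
        # No gradients flow into these components, so no communication
        # groups should be created (or kept) for them.
        for key in ('encoder', 'decoder', 'src_emb', 'tgt_emb'):
            my_distributed_groups[key] = {}
    return my_distributed_groups


groups = {
    'encoder': {('en', 0): 'group-obj'},
    'decoder': {('de', 0): 'group-obj'},
    'src_emb': {'en': 'group-obj'},
    'tgt_emb': {'de': 'group-obj'},
    'adapters': {('adapter_a', 0): 'group-obj'},
}
print(filter_distributed_groups(groups, main_model_frozen=True))
# only the 'adapters' entry keeps its groups
```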
It may also be a good idea to prevent some of the sub-optimizers from being created (in `attention_bridge_optimizer` in `utils/optimizers.py`), but that may not even be necessary (I think the optimizers can handle being empty). You can try first without this and implement it if needed.
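If it does turn out to be necessary, a guard around sub-optimizer creation could look roughly like this (the function name is hypothetical; the real logic lives in `attention_bridge_optimizer`):

```python
import torch
import torch.nn as nn

def maybe_build_sub_optimizer(params, learning_rate=1e-4):
    """Hypothetical guard: only build a sub-optimizer when there is
    something to train for this component."""
    trainable = [p for p in params if p.requires_grad]
    if not trainable:
        # Nothing to optimize here; skip the sub-optimizer entirely
        # instead of constructing one with an empty parameter list.
        return None
    return torch.optim.Adam(trainable, lr=learning_rate)


frozen_module = nn.Linear(4, 4).requires_grad_(False)
adapter_module = nn.Linear(4, 4)

print(maybe_build_sub_optimizer(frozen_module.parameters()))   # None
print(maybe_build_sub_optimizer(adapter_module.parameters()))  # Adam(...)
```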
Freezing some of the modules would allow training adapters as actual adapters.
Ideally, this would entail introducing some mechanism to mark specific layerstacks/adapters in the config as not requiring gradient.
To be confirmed, but we can probably just do a combination of the following to get the desired behavior (a sketch combining both follows the list):
- not registering the `has_grad` hook for these modules
- calling `module.requires_grad_(False)` on them
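A hedged sketch of how a config-driven list of frozen components could drive both steps (the function name, the `frozen_component_names` option, and the toy model layout are all assumptions):

```python
import torch.nn as nn

def apply_freezing(model, frozen_component_names):
    """Hypothetical: mark the configured components as frozen."""
    frozen = set()
    for name, module in model.named_children():
        if name in frozen_component_names:
            module.requires_grad_(False)  # freeze the whole sub-module
            frozen.add(name)
    return frozen


model = nn.ModuleDict({
    'encoder': nn.Linear(8, 8),
    'decoder': nn.Linear(8, 8),
    'adapter': nn.Linear(8, 8),
})
frozen = apply_freezing(model, frozen_component_names={'encoder', 'decoder'})

for name, module in model.named_children():
    if name not in frozen:
        # Register the has_grad hook only for trainable components,
        # so it never complains about legitimately grad-less parameters
        # (hook registration itself omitted in this sketch).
        pass
```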