partial training of system #24

Open
TimotheeMickus opened this issue Sep 29, 2023 · 1 comment
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@TimotheeMickus
Collaborator

Freezing some of the modules would allow training adapters as actual adapters.

Ideally, this would entail introducing some mechanism in the config to mark specific layerstacks/adapters as not requiring gradients.

To be confirmed, but we can probably just do a combination of the following to get the desired behavior (a minimal sketch follows the list):

  • leave marked models out of all communication groups
  • not apply the forward has_grad hook to these models
  • remove them from gradient computations with module.requires_grad_(False)
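
A minimal sketch of the last point, assuming a hypothetical set of module names marked as frozen in the config (the helper name and the config mechanism are not part of the current codebase):

```python
import torch.nn as nn

def freeze_marked_modules(model: nn.Module, frozen_names: set) -> None:
    """Disable gradient computation for every sub-module whose name
    was marked as frozen (e.g. via a hypothetical config flag)."""
    for name, module in model.named_modules():
        if name in frozen_names:
            # requires_grad_(False) recursively flips requires_grad on all
            # parameters of this sub-module, removing it from autograd.
            module.requires_grad_(False)
```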
@TimotheeMickus TimotheeMickus added the enhancement New feature or request label Sep 29, 2023
@Waino
Collaborator

Waino commented Dec 18, 2023

Basically you need to set param.requires_grad to False for the modules that should be frozen. If you do this in model_builder.py between creating the NMTModel and the call to create_adapters, then the former will be frozen and only the latter will be trained. In any case, you want to do this before registering the has_grad_hook, which happens a few lines later.
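
A self-contained illustration of that ordering (freeze the base model first, then build the adapters, so only they stay trainable); the modules below are placeholders, not mammoth's actual classes:

```python
import torch.nn as nn

# Stand-in for the NMTModel built earlier in model_builder.py.
base = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))

# Freeze the base model *before* the adapters exist...
for param in base.parameters():
    param.requires_grad = False

# ...so the adapter created afterwards (stand-in for create_adapters)
# keeps requires_grad=True and is the only part that trains.
adapter = nn.Linear(512, 512)

model = nn.ModuleDict({"base": base, "adapter": adapter})
print([name for name, p in model.named_parameters() if p.requires_grad])
# -> only adapter.weight and adapter.bias
```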

If empty communication groups are an issue (and they probably are), then you need to add a flag to TaskQueueManager.get_distributed_groups that makes sure that the keys encoder, decoder, src_emb, and tgt_emb are empty before returning my_distributed_groups. Either don't populate them, or clear them before returning.
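
A hedged sketch of that flag; only the key names come from the description above, while the function name and the dict layout are assumptions:

```python
def clear_frozen_groups(my_distributed_groups: dict) -> dict:
    """Empty the groups for frozen components so that no communication
    group is created for them."""
    for key in ("encoder", "decoder", "src_emb", "tgt_emb"):
        if key in my_distributed_groups:
            my_distributed_groups[key].clear()  # works for list or dict values
    return my_distributed_groups
```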

It may also be a good idea to prevent some of the sub-optimizers from being created (in attention_bridge_optimizer in utils/optimizers.py), but that may not even be necessary (I think the optimizers can handle being empty). You can try first without this and implement it if needed.
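
If guarding the sub-optimizers does turn out to be necessary, the generic PyTorch pattern is to drop parameters with requires_grad=False and skip the optimizer entirely when nothing is left; a sketch with illustrative names only:

```python
import torch

def maybe_build_optimizer(module: torch.nn.Module, lr: float = 1e-3):
    params = [p for p in module.parameters() if p.requires_grad]
    if not params:
        return None  # fully frozen component: no sub-optimizer needed
    return torch.optim.Adam(params, lr=lr)
```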
