partial training of system #24

Open
TimotheeMickus opened this issue Sep 29, 2023 · 1 comment
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@TimotheeMickus
Collaborator

Freezing some of the modules would allow training adapters as actual adapters.

Ideally, this would entail introducing some mechanism in the config to mark specific layerstacks/adapters as not requiring gradients.

To be confirmed, but we can probably just do a combination of the following to get the desired behavior (a minimal sketch follows the list):

  • leave marked models out of all communication groups
  • not apply the forward has_grad hook to these models
  • remove them from gradient computations with module.requires_grad_(False)
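
A minimal sketch of the last point, assuming a hypothetical set of module names marked as frozen in the config (the helper name and the config mechanism are not part of the current codebase):

```python
import torch.nn as nn

def freeze_marked_modules(model: nn.Module, frozen_names: set) -> None:
    """Disable gradient computation for every sub-module whose name
    was marked as frozen (e.g. via a hypothetical config flag)."""
    for name, module in model.named_modules():
        if name in frozen_names:
            # requires_grad_(False) recursively flips requires_grad on all
            # parameters of this sub-module, removing it from autograd.
            module.requires_grad_(False)
```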
@TimotheeMickus TimotheeMickus added the enhancement New feature or request label Sep 29, 2023
@Waino
Collaborator

Waino commented Dec 18, 2023

Basically you need to set param.requires_grad to False for the modules that should be frozen. If you do this in model_builder.py between creating the NMTModel and the call to create_adapters, then the former will be frozen and only the latter will be trained. In any case, you want to do this before registering the has_grad_hook, which happens a few lines later.
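
A self-contained illustration of that ordering (freeze the base model first, then build the adapters, so only they stay trainable); the modules below are placeholders, not mammoth's actual classes:

```python
import torch.nn as nn

# Stand-in for the NMTModel built earlier in model_builder.py.
base = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))

# Freeze the base model *before* the adapters exist...
for param in base.parameters():
    param.requires_grad = False

# ...so the adapter created afterwards (stand-in for create_adapters)
# keeps requires_grad=True and is the only part that trains.
adapter = nn.Linear(512, 512)

model = nn.ModuleDict({"base": base, "adapter": adapter})
print([name for name, p in model.named_parameters() if p.requires_grad])
# -> only adapter.weight and adapter.bias
```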

If empty communication groups are an issue (and they probably are), then you need to add a flag to TaskQueueManager.get_distributed_groups that makes sure that the keys encoder, decoder, src_emb, and tgt_emb are empty before returning my_distributed_groups. Either don't populate them, or clear them before returning.
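
A hedged sketch of that flag; only the key names come from the description above, while the function name and the dict layout are assumptions:

```python
def clear_frozen_groups(my_distributed_groups: dict) -> dict:
    """Empty the groups for frozen components so that no communication
    group is created for them."""
    for key in ("encoder", "decoder", "src_emb", "tgt_emb"):
        if key in my_distributed_groups:
            my_distributed_groups[key].clear()  # works for list or dict values
    return my_distributed_groups
```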

It may also be a good idea to prevent some of the sub-optimizers from being created (in attention_bridge_optimizer in utils/optimizers.py), but that may not even be necessary (I think the optimizers can handle being empty). You can try first without this and implement it if needed.
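
If guarding the sub-optimizers does turn out to be necessary, the generic PyTorch pattern is to drop parameters with requires_grad=False and skip the optimizer entirely when nothing is left; a sketch with illustrative names only:

```python
import torch

def maybe_build_optimizer(module: torch.nn.Module, lr: float = 1e-3):
    params = [p for p in module.parameters() if p.requires_grad]
    if not params:
        return None  # fully frozen component: no sub-optimizer needed
    return torch.optim.Adam(params, lr=lr)
```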
