Fix `checkpointable_layers` Logic
#6881
Merged
Problem
There's an edge case in DeepSpeed where, if all three of the following are true:
- `checkpointable_layers` is passed (e.g. https://github.com/EleutherAI/gpt-neox/blob/f5325805678c2b9e35aae4528283e0132c5f5bbc/megatron/model/gpt2_model.py#L175)
- the pipeline model class is `GPT2ModelPipe` or `GPTModelPipe`
then the `checkpointable_layers` will not be activation checkpointed.
Reason
This is because, in the current logic, `_is_checkpointable` short-circuits whenever `self.__class__.__name__ in ('GPTModelPipe', 'GPT2ModelPipe')`: it then treats only layers matching `ParallelTransformerLayerPipe` as checkpointable and never consults `checkpointable_layers`. See `DeepSpeed/deepspeed/runtime/pipe/module.py`, line 653 at commit `da771ed`.
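To make the short-circuit concrete, here is a simplified, pure-Python mock of the behaviour described above. It is not DeepSpeed's code: the layer classes are hypothetical stand-ins whose only relevant property is their class name, and whatever fallback the real `_is_checkpointable` applies when neither branch matches is replaced here by `False`.

```python
# Hypothetical stand-in layer classes; only their class names matter here.
class EmbeddingPipe: pass
class ParallelTransformerLayerPipe: pass
class MyCustomLayer: pass


class GPT2ModelPipe:
    """Minimal mock of a GPT-style pipeline module (NOT DeepSpeed's PipelineModule)."""

    def __init__(self, checkpointable_layers=None):
        self.checkpointable_layers = checkpointable_layers

    def _is_checkpointable(self, funcs):
        # Short-circuit described above: for GPT-style pipe classes, only
        # ParallelTransformerLayerPipe counts and checkpointable_layers is ignored.
        if self.__class__.__name__ in ('GPTModelPipe', 'GPT2ModelPipe'):
            return all('ParallelTransformerLayerPipe' in f.__class__.__name__ for f in funcs)
        if self.checkpointable_layers is not None:
            return all(f.__class__.__name__ in self.checkpointable_layers for f in funcs)
        # Simplification: the real method's fallback is not modelled here.
        return False


model = GPT2ModelPipe(checkpointable_layers=['MyCustomLayer'])
print(model._is_checkpointable([MyCustomLayer()]))  # False, despite being listed
```

Because `GPT2ModelPipe` hits the first branch, `MyCustomLayer` is rejected even though it is listed in `checkpointable_layers`.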
Proposed Fixes
I think that `checkpointable_layers` should always be checked, so I added logic to this effect. I also found the documentation for `checkpointable_layers` confusing and contradictory, so I updated the docstring. Lastly, I added a unit test for `checkpointable_layers`.
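One way to make `checkpointable_layers` always take effect is to consult it before the GPT-specific branch. The sketch below uses the same kind of hypothetical mock classes as a self-contained example; the exact precedence in the merged change may differ, and `test_checkpointable_layers_respected` only illustrates the kind of check a unit test for `checkpointable_layers` could make.

```python
# Hypothetical stand-in layer classes; only their class names matter here.
class EmbeddingPipe: pass
class ParallelTransformerLayerPipe: pass
class MyCustomLayer: pass


class GPT2ModelPipe:
    """Minimal mock of a GPT-style pipeline module (NOT DeepSpeed's PipelineModule)."""

    def __init__(self, checkpointable_layers=None):
        self.checkpointable_layers = checkpointable_layers

    def _is_checkpointable(self, funcs):
        # Always honour an explicit checkpointable_layers list first.
        if self.checkpointable_layers is not None:
            return all(f.__class__.__name__ in self.checkpointable_layers for f in funcs)
        # Otherwise keep the GPT-specific default.
        if self.__class__.__name__ in ('GPTModelPipe', 'GPT2ModelPipe'):
            return all('ParallelTransformerLayerPipe' in f.__class__.__name__ for f in funcs)
        # Simplification: any further fallback is not modelled here.
        return False


def test_checkpointable_layers_respected():
    model = GPT2ModelPipe(checkpointable_layers=['MyCustomLayer'])
    assert model._is_checkpointable([MyCustomLayer()])
    assert not model._is_checkpointable([EmbeddingPipe()])
    # Without an explicit list, the GPT default still applies.
    default = GPT2ModelPipe()
    assert default._is_checkpointable([ParallelTransformerLayerPipe()])
    assert not default._is_checkpointable([MyCustomLayer()])


if __name__ == '__main__':
    test_checkpointable_layers_respected()
    print('ok')
```

With this ordering, an explicit list replaces the `ParallelTransformerLayerPipe` default rather than extending it, so a user who still wants transformer layers checkpointed would list them too; accepting a layer if it matches either rule is the obvious alternative.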