Why does pp_rank equal torch.distributed.get_rank(group=self.optimizer.dp_process_group)? #4394
Unanswered
CrossEntropyZenk asked this question in Q&A
Replies: 1 comment
This pattern is commonly used in distributed training frameworks such as PyTorch's Distributed Data Parallel (DDP) to determine the rank of a process within a specific process group. The line you quoted returns the rank of the current process *within* the optimizer's data-parallel process group, not its global rank; that group-local rank is then used for things like deciding which shard of the optimizer state a process owns or which collectives it participates in. The returned value is simply the index of the process in that group's rank list, so it coincides with pp_rank only if the group happens to enumerate exactly one rank per pipeline stage, in stage order; whether that holds depends on how the framework constructs the group (see the sketch below).
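Here is a minimal, runnable toy sketch of how group-local ranks behave. It is not DeepSpeed's actual topology code: the 2x2 grid of pipeline stages by data-parallel replicas, and the way the groups are built, are illustrative assumptions.

```python
# Toy sketch (not DeepSpeed's code): 4 processes arranged as a 2 x 2 grid
# of pipeline stages x data-parallel replicas. Run with:
#   torchrun --nproc_per_node=4 group_rank_demo.py
import torch.distributed as dist

def main():
    dist.init_process_group(backend="gloo")  # gloo so this runs on CPU
    rank = dist.get_rank()                   # global rank, 0..3

    pp_size, dp_size = 2, 2
    pp_rank = rank // dp_size                # assumed pipeline stage of this rank
    dp_rank = rank % dp_size                 # assumed data-parallel index

    # Every process must create every group, then keep the one it belongs to.
    # One data-parallel group per pipeline stage.
    dp_groups = [dist.new_group([s * dp_size + d for d in range(dp_size)])
                 for s in range(pp_size)]
    my_dp_group = dp_groups[pp_rank]

    # get_rank(group=...) returns this process's index within the group's
    # rank list, which in general differs from its global rank.
    print(f"global={rank} pp_rank={pp_rank} dp_rank={dp_rank} "
          f"rank_in_dp_group={dist.get_rank(group=my_dp_group)}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

In this particular layout, each rank's group-local rank equals its dp_rank. It would instead equal pp_rank only if the group passed in (here, what the code calls dp_process_group) listed one rank per pipeline stage in stage order, which is why the answer to "why are they equal" lies in how that group was constructed.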