Question about the calculation of train_batch_size
#3876
In the doc https://www.deepspeed.ai/docs/config-json/#batch-size-related-parameters, the batch size is defined as:

train_batch_size = train_micro_batch_size_per_gpu * gradient_accumulation_steps * number of GPUs

However, in pipeline mode, shouldn't it instead be:

train_batch_size = train_micro_batch_size_per_gpu * gradient_accumulation_steps * (number of GPUs / number of pipeline stages)

What is the difference between those?
Replies: 1 comment
Hi @formath, great question 😸. In both cases the last term represents the degree of data parallelism used in training (since each data-parallel replica will have its own data pipeline contributing to the train_batch_size). With pure data parallelism, that's simply the number of GPUs. If we are also training with pipeline (model) parallelism, the resulting degree of data parallelism is the number of GPUs divided by the number of pipeline stages.
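To make the arithmetic concrete, here is a minimal sketch in plain Python (not DeepSpeed API; the function name and parameters are hypothetical, chosen to mirror the config keys) showing how the two formulas relate through the data-parallel degree:

def train_batch_size(micro_batch_per_gpu: int,
                     grad_accum_steps: int,
                     num_gpus: int,
                     pipeline_stages: int = 1) -> int:
    # Data-parallel degree: with pure data parallelism every GPU is its
    # own replica, so it equals num_gpus. With pipeline parallelism, each
    # replica spans pipeline_stages GPUs, so there are fewer replicas.
    # (Assumes num_gpus is evenly divisible by pipeline_stages.)
    data_parallel_degree = num_gpus // pipeline_stages
    return micro_batch_per_gpu * grad_accum_steps * data_parallel_degree

# Pure data parallelism on 8 GPUs: 2 * 4 * 8 = 64
print(train_batch_size(micro_batch_per_gpu=2, grad_accum_steps=4, num_gpus=8))

# Pipeline parallelism with 4 stages: 8 GPUs form 8 // 4 = 2 data-parallel
# replicas, so 2 * 4 * 2 = 16
print(train_batch_size(micro_batch_per_gpu=2, grad_accum_steps=4,
                       num_gpus=8, pipeline_stages=4))

In other words, both formulas are the same rule (micro batch size * gradient accumulation steps * data-parallel degree); only the way the data-parallel degree is derived from the GPU count changes.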