This repository has been archived by the owner on May 4, 2020. It is now read-only.

pytorch MPI raw speed #73

Open
martinjaggi opened this issue Sep 20, 2018 · 1 comment
@martinjaggi
Member

We found this benchmark:
https://github.com/diux-dev/cluster/tree/master/pytorch_distributed_benchmark

It would be interesting to check whether we observe similar speeds, and the code is probably useful too.
Note that their benchmark measures only raw all-reduce communication, with no learning. This matters when training is communication-bound, a scenario we will likely hit soon when training linear models.
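A raw all-reduce timing loop in the spirit of the linked benchmark can be sketched as below. This is a minimal, hedged sketch: it uses a single-process `gloo` group purely so it runs standalone, whereas a real communication benchmark launches one process per node/GPU (e.g. with NCCL); the function name `bench_allreduce` and the warmup/iteration counts are my own choices, not from the linked code.

```python
import os
import time
import torch
import torch.distributed as dist

# Single-process gloo group just to make the sketch runnable;
# the real benchmark runs one process per worker over a network.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

def bench_allreduce(numel, warmup=5, iters=20):
    """Time raw all-reduce (sum) on a tensor of `numel` float32 elements."""
    t = torch.ones(numel)
    for _ in range(warmup):          # warm up buffers / connections
        dist.all_reduce(t)
    start = time.perf_counter()
    for _ in range(iters):
        dist.all_reduce(t)
    elapsed = time.perf_counter() - start
    # bytes moved per iteration: 4 bytes per float32 element
    return numel * 4 * iters / elapsed  # rough throughput, bytes/sec

bw = bench_allreduce(1 << 20)  # 1M floats = 4 MB per all-reduce
dist.destroy_process_group()
```

With a multi-process launch the same loop measures actual inter-worker bandwidth, which is exactly the communication-bound regime discussed above.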

@martinjaggi
Member Author

BTW, PyTorch 1.0 has a new distributed backend called c10d, which we should try. It is used both by the torch.distributed package and by torch.nn.parallel.DistributedDataParallel.
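A minimal DistributedDataParallel example, which exercises the c10d backend under the hood, might look like the sketch below. Assumptions for illustration only: a single-process `gloo` group on CPU (so it runs standalone; real training launches multiple workers) and a toy linear-regression model.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# One-process gloo group so the example is self-contained; real training
# would launch one process per worker.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(8, 1))  # gradient all-reduce handled by c10d
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(16, 8), torch.randn(16, 1)
for _ in range(3):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()  # c10d all-reduces gradients during backward
    opt.step()

dist.destroy_process_group()
```

The point of c10d is that the gradient all-reduce is overlapped with the backward pass, so the same all-reduce cost measured by the raw benchmark above is partially hidden behind computation.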
