Get help for distributed model training on MI250 #30
Hi,
I would like to test a program for distributed LLM training on an 8x MI250 node, and I want to use model parallelism to distribute the parameters across GPUs. Is there a framework I should use to achieve that? I used DeepSpeed (https://github.com/microsoft/DeepSpeed), but its ZeRO stage-3 actually increases memory consumption on all GPUs compared with ZeRO stage-2, which only partitions the optimizer state. Are there any resources, recommendations, or examples specifically for AMD GPUs?
Thank you,
Zifan
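For context, a minimal sketch of the kind of DeepSpeed ZeRO stage-3 configuration being compared here. The specific knobs and values below (bucket sizes, parameter offload, batch size) are illustrative assumptions, not settings taken from this thread:

```python
# Hypothetical sketch: initializing DeepSpeed with a ZeRO stage-3 config.
# All hyperparameter values below are illustrative assumptions.
import deepspeed
import torch

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                            # 2 = partition optimizer state + gradients, 3 = also parameters
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_max_live_parameters": 1e8,     # lower to reduce per-GPU peak memory
        "stage3_prefetch_bucket_size": 5e7,
        "offload_param": {"device": "cpu"},    # optional: push parameter shards to host memory
    },
}

def build_engine(model: torch.nn.Module):
    # deepspeed.initialize wraps the model and returns a distributed engine
    engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,
    )
    return engine, optimizer
```

Whether stage 3 actually lowers per-GPU memory depends heavily on these bucket/prefetch limits and on whether offloading is enabled; with loose limits, the temporary all-gather buffers can make peak usage higher than stage 2.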
Comments

Hey Zifan, I'm not personally very familiar with all the different options for model-parallel distributed training, but you might check the ROCm blogs, since they have a lot of LLM examples on AMD GPUs: https://rocm.blogs.amd.com/blog/category/applications-models.html -Tom
Hi Tom, thank you. I found what I need on the ROCm blog and will get back to you if that doesn't work. Zifan
Hey @OswaldHe, I know you said you found what you needed, but I figured it wouldn't hurt to share this as well. Here is a blog post that describes how AMD trained a small language model (SLM) on AMD GPUs with distributed training: https://www.amd.com/en/developer/resources/technical-articles/introducing-amd-first-slm-135m-model-fuels-ai-advancements.html
At the bottom, in the Call to Actions section, there is a link to the GitHub repository where you can reproduce the model yourself. I don't think you necessarily want to do that, but it should provide an example of using PyTorch FSDP for multi-node distributed training. -Tom
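As a rough illustration of the approach Tom mentions (not code from the AMD repository), a multi-node PyTorch FSDP run generally looks like the sketch below; the model, launcher flags, and sharding settings are placeholder assumptions:

```python
# Hypothetical sketch of multi-node training with PyTorch FSDP.
# Assumes the process-group env vars (RANK, WORLD_SIZE, MASTER_ADDR, ...)
# are set by the launcher, e.g.:
#   torchrun --nnodes=2 --nproc_per_node=8 train.py
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardingStrategy

def main():
    dist.init_process_group(backend="nccl")           # ROCm builds route "nccl" through RCCL
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)                 # HIP devices appear through the torch.cuda API on ROCm

    model = torch.nn.Transformer(d_model=512).cuda()  # placeholder model, purely illustrative
    model = FSDP(
        model,
        sharding_strategy=ShardingStrategy.FULL_SHARD,  # shard params, grads, and optimizer state
        device_id=local_rank,
    )

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    # ... training loop: forward, backward, optimizer.step() as usual ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```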
Hi @OswaldHe
Hi @Alexis-BX, I just installed Hugging Face Accelerate and DeepSpeed in a conda environment: https://huggingface.co/docs/accelerate/en/usage_guides/deepspeed
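A minimal sketch of driving DeepSpeed ZeRO through Accelerate, along the lines of the guide linked above. The stage, offload, model, and launch settings here are assumptions for illustration, not what was actually used in this thread:

```python
# Hypothetical sketch: Hugging Face Accelerate with a DeepSpeed plugin.
# Typically launched with `accelerate launch train.py` after `accelerate config`;
# exact flags depend on your cluster setup.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator, DeepSpeedPlugin

ds_plugin = DeepSpeedPlugin(
    zero_stage=2,                      # stage 2 partitions optimizer state + gradients only
    gradient_accumulation_steps=1,
    offload_optimizer_device="none",   # set to "cpu" to trade speed for GPU memory
)
accelerator = Accelerator(deepspeed_plugin=ds_plugin, mixed_precision="bf16")

model = torch.nn.Linear(4096, 4096)    # placeholder model, purely illustrative
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loader = DataLoader(TensorDataset(torch.randn(64, 4096)), batch_size=8)

# prepare() wraps the model/optimizer/dataloader for the distributed backend
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for (x,) in loader:
    loss = model(x).pow(2).mean()      # dummy objective for the sketch
    accelerator.backward(loss)         # use accelerator.backward() instead of loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```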