GPU Memory Issue #775
Comments
Please provide a YAML file to help us reproduce the issue, thanks.
Thank you for your response; this is my YAML configuration:
These are the federated/llm/model/model_builder.py and adapter_builder.py files that I modified to enable quantized LoRA:
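(For readers without the attachment, here is a minimal sketch of how quantized LoRA is typically enabled with transformers, peft, and bitsandbytes; this is not the reporter's actual patch, and the function name `build_quantized_lora_model` is hypothetical.)

```python
# Sketch only: load the frozen base model in 4-bit and attach a trainable LoRA adapter.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

def build_quantized_lora_model(model_name: str):
    # Quantize the frozen base weights to NF4 to reduce GPU memory.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_name, quantization_config=bnb_config
    )
    model = prepare_model_for_kbit_training(model)

    # Only the small LoRA adapter is trained and exchanged in federated rounds.
    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    return get_peft_model(model, lora_config)
```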
By the way, the "save_to" attribute does not seem to actually save the model when an adapter is used; the save_model function in federated/core/aggregators/aggregator.py only saves the weights:
It is probably better to use self.model.save_pretrained in this case, since it saves the fine-tuned adapter weights together with the configuration files. Thank you very much!
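(A minimal sketch of the suggested change; the signature is assumed and this is not the project's actual aggregator code. `save_pretrained` writes the adapter weights plus the adapter/config files, which a plain `torch.save(state_dict)` does not.)

```python
import os
import torch

def save_model(self, path: str):
    if hasattr(self.model, "save_pretrained"):
        # PEFT/transformers models: writes the adapter weights together with the
        # config files needed to reload the fine-tuned adapter later.
        self.model.save_pretrained(path)
    else:
        # Fallback: weight-only checkpoint, as the current aggregator does.
        torch.save(self.model.state_dict(), os.path.join(path, "model.pt"))
```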
Sorry, I just accidentally clicked the close-issue button...
I encountered the same problem. On the dolly dataset, GPU memory usage increased by about 2.5 GB each time a client trained, so my 24 GB 4090 could only support about 8 clients and could not even finish a single round.
Algorithm 1 in the appendix of your paper indicates that, at the beginning of each training round, the client recovers the latest model by accumulating the (seed, gradient) pairs. However, in the code you provided, the model update is performed by the server after the (seed, gradient) pairs from each client are collected. When a client is about to train, server.model is deep-copied and passed to the client, which then executes "self.model = pulled_model; self.model.to(self.device)". Is this the reason for the excessive GPU memory usage? Is the code version you provided incorrect?
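(To make the contrast concrete, here is a rough sketch of the client-side update path described above, i.e. replaying the accumulated (seed, scalar gradient) pairs on the client's own model copy instead of deep-copying server.model. This is my reading of the described Algorithm 1, not the repository's actual code; `replay_seed_updates` and its arguments are hypothetical.)

```python
import torch

def replay_seed_updates(model, seed_grad_pairs, lr):
    """Reconstruct the latest model by re-generating each random perturbation
    from its seed and applying the corresponding scalar gradient."""
    for seed, scalar_grad in seed_grad_pairs:
        torch.manual_seed(seed)
        for param in model.parameters():
            # Re-generate the same perturbation direction used during training.
            z = torch.randn_like(param)
            param.data.add_(z, alpha=-lr * scalar_grad)
    return model
```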
However, when I set the number of clients to 1 and the sampling rate to 1, the memory usage does not fluctuate much regardless of how many rounds of training I perform.
Yes, especially for larger models. I've tried torch.cuda.empty_cache() and gc.collect(); they helped somewhat but did not really solve the problem. As you mentioned, it might have something to do with the training process. Please let me know if you figure out how to fix it. Thanks! ^^
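(For reference, this is roughly where I call the cleanup; the hook placement and the `release_client_gpu_memory` helper are my own, not part of the project's trainer code.)

```python
import gc
import torch

def release_client_gpu_memory(client):
    # Move the (adapter) weights off the GPU so the CUDA tensors become collectable.
    client.model.to("cpu")
    if getattr(client, "optimizer", None) is not None:
        client.optimizer.state.clear()  # optimizer states often keep CUDA tensors alive
    gc.collect()                 # drop Python-level references first
    torch.cuda.empty_cache()     # then return cached blocks to the driver
```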
I even suspect this is related to PyTorch's memory-release mechanism, or maybe to the machine itself. If you find a good solution, please share it, thank you.
By chance, I ran the code on another 4-GPU 4090 machine and found that the GPU memory problem disappeared: the stale GPU memory was reclaimed in time and the model trained normally. So this problem seems unrelated to the code and may instead be related to hardware settings, driver versions, etc. Good luck.
That's great. But I hit this problem on a 4090 as well. Which driver version are you using? I'll give it a try :)
The GPU memory usage keeps increasing after each round while fine-tuning an LLM with an adapter, and the increment per round is approximately constant. I suspect this is because new clients join in each round, each bringing new model parameters. I have already set share_local_model and llm.adapter.mv_to_cpu to True, which should move the adapter to the CPU after each round, so why does GPU memory still grow? I'd appreciate it if anyone could help me with this issue. Thanks in advance!
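(For context, a hypothetical config fragment with the two options mentioned above; the option names come from the issue text, but the exact nesting and the other keys are assumptions rather than the reporter's actual file.)

```yaml
federate:
  share_local_model: True      # reuse one model instance across simulated clients
llm:
  adapter:
    use: True                  # assumed key enabling the adapter
    mv_to_cpu: True            # move the adapter back to CPU after each round
```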