CUDA out of memory when saving checkpoint #1199

Answered by wangruohui
TonJ24 asked this question in How-to

Sorry for the late reply; we have been busy with some heavy development.

As a simple workaround, you can effectively disable evaluation by setting the evaluation interval to a value larger than the total number of training iterations, for example:

evaluation = dict(interval=5000, save_image=False, gpu_collect=True)
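Here is a minimal sketch of the same workaround in a fuller config. It assumes an MMEditing-style, iteration-based config where `total_iters` holds the training length and `checkpoint_config` controls checkpoint saving. Both names and the `300000` value are assumptions for illustration and do not come from the original question. Deriving the interval from `total_iters` keeps it larger than the training length even if the schedule changes later.

```python
# Assumed MMEditing-style (iteration-based) config fragment.
total_iters = 300000  # hypothetical training length

# Checkpoints are still written every 5000 iterations.
checkpoint_config = dict(interval=5000, save_optimizer=True, by_epoch=False)

# An interval beyond the last iteration means (iter + 1) % interval is never 0
# during training, so the evaluation hook should never fire.
evaluation = dict(interval=total_iters + 1, save_image=False, gpu_collect=True)
```

Only the evaluation interval changes here. Checkpoint saving is left as it was, which makes the offline analysis described below possible.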

Since checkpoint saving still works, you can run the evaluation after training finishes, or offline on another GPU.
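To evaluate offline, you first need the most recent checkpoint. The sketch below is hypothetical. It assumes the runner writes files named `iter_<N>.pth` into the work directory and that the repository provides a `tools/test.py CONFIG CHECKPOINT` entry point. `latest_checkpoint`, `build_eval_command`, and `eval_env` are illustrative helpers, not part of any library. Restricting the job to another card via `CUDA_VISIBLE_DEVICES` keeps it off the GPU used for training.

```python
import os
import re
from typing import Dict, Iterable, List, Optional

# Assumed checkpoint naming scheme: iter_<N>.pth
_CKPT_PATTERN = re.compile(r"^iter_(\d+)\.pth$")


def latest_checkpoint(filenames: Iterable[str]) -> Optional[str]:
    """Return the iter_<N>.pth name with the largest N (numeric, not lexical)."""
    best_name, best_iter = None, -1
    for name in filenames:
        match = _CKPT_PATTERN.match(name)
        if match and int(match.group(1)) > best_iter:
            best_name, best_iter = name, int(match.group(1))
    return best_name


def build_eval_command(config: str, checkpoint: str) -> List[str]:
    """Hypothetical command line for an offline test script."""
    return ["python", "tools/test.py", config, checkpoint]


def eval_env(gpu_id: int) -> Dict[str, str]:
    """Copy of the current environment that exposes only one GPU."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env
```

For example, `subprocess.run(build_eval_command(cfg, os.path.join(work_dir, ckpt)), env=eval_env(1), check=True)`, with `ckpt = latest_checkpoint(os.listdir(work_dir))`, would run the evaluation on GPU 1 while training continues on GPU 0. Comparing iteration numbers as integers matters because a plain string sort would rank `iter_5000.pth` above `iter_20000.pth`.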

Answer selected by zengyh1900

This discussion was converted from issue #911 on October 09, 2022 10:04.