Describe the bug
Enabling regularization causes CUDNN_STATUS_MAPPING_ERROR in the DeepFM example (it runs fine without regularization). Also, passing the regularization parameter as the keyword argument lambda causes a syntax error, because lambda is a reserved word in Python (this can be avoided by passing **{"lambda": 1e-3} as an argument instead).
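For reference, the syntax error can be reproduced in plain Python without HugeCTR. This is only a sketch: make_loss_layer is a hypothetical stand-in that echoes its keyword arguments, not a HugeCTR function, and the argument names are the ones used in this report.

```python
def make_loss_layer(**kwargs):
    """Hypothetical stand-in for a layer constructor; it only returns its keyword arguments."""
    return kwargs


# `lambda` is a reserved word, so Python rejects it as a literal keyword
# argument at parse time, before any library code runs.
try:
    compile("make_loss_layer(lambda=1e-3)", "<example>", "eval")
    direct_call_parses = True
except SyntaxError:
    direct_call_parses = False

# Unpacking a dict passes the same name as a string key, which the parser accepts.
layer_kwargs = make_loss_layer(use_regularization=True, **{"lambda": 1e-3})

print(direct_call_parses)
print(layer_kwargs)
```

The dict-unpacking form works because the name lambda never appears as an identifier in the source; it only exists as a string key at runtime.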
To Reproduce
Steps to reproduce the behavior:
Follow the instructions for the DeepFM sample here
Add the keyword argument use_regularization=True to the hugectr.Layer_t.BinaryCrossEntropyLoss layer and run the code to generate CUDNN_STATUS_MAPPING_ERROR.
(For the syntax error only) Pass the regularization parameter as the keyword argument lambda and attempt to rerun.
Expected behavior
The model should train with regularization, and the lambda keyword argument should not cause a syntax error.
Hi @klmentzer, thanks for trying this out. There is a bug when the regularizer is used together with solver.use_cuda_graph=True. We will fix it in the upcoming release. Could you please disable CUDA graph as a workaround?
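A minimal sketch of that workaround, assuming the solver is created with hugectr.CreateSolver as in the DeepFM sample. The batch sizes, learning rate, and GPU list below are illustrative placeholders rather than values from this thread, and create_solver_without_cuda_graph is a hypothetical helper name; the only setting that matters here is use_cuda_graph=False.

```python
# Illustrative solver settings; only use_cuda_graph=False is the workaround itself.
solver_kwargs = {
    "batchsize": 16384,        # placeholder value
    "batchsize_eval": 16384,   # placeholder value
    "max_eval_batches": 300,   # placeholder value
    "lr": 0.001,               # placeholder value
    "vvgpu": [[0]],            # single node, GPU 0 (assumption)
    "repeat_dataset": True,
    "use_cuda_graph": False,   # disable CUDA graph while the regularizer is enabled
}


def create_solver_without_cuda_graph():
    """Hypothetical helper: build the solver with CUDA graph disabled.

    hugectr is imported here so this file can still be loaded on a
    machine where HugeCTR is not installed.
    """
    import hugectr

    return hugectr.CreateSolver(**solver_kwargs)
```

Calling create_solver_without_cuda_graph() in place of the sample's own solver creation should leave the rest of the training script unchanged.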
Is there any solution to this? I am getting the same issue when trying to run DLRM training v3.1 benchmarking on a DGX H100. I have tried the versions following v23.08.00 of NVIDIA-Merlin/HugeCTR, such as v23.09.00 and the latest one, but the same error persists. Can you please tell me how to fix it? @JacoCheung
Thanks for your help!