-
-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Float16 CUDA conv
broken on 5D tensors
#505
Comments
Can you set |
julia> conv(rand(Float16, 16, 16, 16, 1, 1) |> gpu, rand(Float16, 3, 3, 3, 1, 1) |> gpu)
┌ Warning: No valid algorithm found, probably bad params for convolution.
└ @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/convolution.jl:276
┌ Debug: cuBLAS (v11.8) function cublasStatus_t cublasGetVersion_v2(cublasHandle_t, int*) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x23907a00)
│ version: type=int; val=POINTER (IN HEX:0x0x7ffcee3e1fbc)
│ Time: 2023-02-16T23:53:39 elapsed from start 0.883333 minutes or 53.000000 seconds
│ Process=2461157; Thread=22746666410368; GPU=0; Handle=POINTER (IN HEX:0x0x23907a00); StreamId=POINTER (IN HEX:0x0x4db8580); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/cublas/CUBLAS.jl:224
ERROR: ┌ Debug: cuBLAS (v11.8) function cublasStatus_t cublasGetVersion_v2(cublasHandle_t, int*) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x23907a00)
│ version: type=int; val=POINTER (IN HEX:0x0x7ffcee3e1fbc)
│ Time: 2023-02-16T23:53:39 elapsed from start 0.883333 minutes or 53.000000 seconds
│ Process=2461157; Thread=22746666410368; GPU=0; Handle=POINTER (IN HEX:0x0x23907a00); StreamId=POINTER (IN HEX:0x0x4db8580); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/cublas/CUBLAS.jl:224
CUDNNError: ┌ Debug: cuBLAS (v11.8) function cublasStatus_t cublasGetVersion_v2(cublasHandle_t, int*) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x23907a00)
│ version: type=int; val=POINTER (IN HEX:0x0x7ffcee3e1fbc)
│ Time: 2023-02-16T23:53:39 elapsed from start 0.883333 minutes or 53.000000 seconds
│ Process=2461157; Thread=22746666410368; GPU=0; Handle=POINTER (IN HEX:0x0x23907a00); StreamId=POINTER (IN HEX:0x0x4db8580); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/cublas/CUBLAS.jl:224
CUDNN_STATUS_NOT_SUPPORTED┌ Debug: cuBLAS (v11.8) function cublasStatus_t cublasGetVersion_v2(cublasHandle_t, int*) called:
│ handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x23907a00)
│ version: type=int; val=POINTER (IN HEX:0x0x7ffcee3e1fbc)
│ Time: 2023-02-16T23:53:39 elapsed from start 0.883333 minutes or 53.000000 seconds
│ Process=2461157; Thread=22746666410368; GPU=0; Handle=POINTER (IN HEX:0x0x23907a00); StreamId=POINTER (IN HEX:0x0x4db8580); MathMode=CUBLAS_DEFAULT_MATH
│ COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
│
└ @ CUDA.CUBLAS /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/cublas/CUBLAS.jl:224
(code 9)
Stacktrace:
[1] throw_api_error(res::cuDNN.cudnnStatus_t)
@ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/libcudnn.jl:11
[2] macro expansion
@ /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/libcudnn.jl:24 [inlined]
[3] cudnnConvolutionForward(handle::Ptr{cuDNN.cudnnContext}, alpha::Base.RefValue{Float32}, xDesc::cuDNN.cudnnTensorDescriptor, x::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, wDesc::cuDNN.cudnnFilterDescriptor, w::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, convDesc::cuDNN.cudnnConvolutionDescriptor, algo::cuDNN.cudnnConvolutionFwdAlgo_t, workSpace::CUDA.CuArray{UInt8, 1, CUDA.Mem.DeviceBuffer}, workSpaceSizeInBytes::Int64, beta::Base.RefValue{Float32}, yDesc::cuDNN.cudnnTensorDescriptor, y::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer})
@ cuDNN /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:26
[4] (::cuDNN.var"#1153#1155"{CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cuDNN.cudnnActivationMode_t, cuDNN.cudnnConvolutionDescriptor, cuDNN.cudnnFilterDescriptor, cuDNN.cudnnTensorDescriptor, cuDNN.cudnnTensorDescriptor, Base.RefValue{Float32}, Base.RefValue{Float32}, CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cuDNN.cudnnConvolutionFwdAlgoPerfStruct})(workspace::CUDA.CuArray{UInt8, 1, CUDA.Mem.DeviceBuffer})
@ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/convolution.jl:105
[5] with_workspace(f::cuDNN.var"#1153#1155"{CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cuDNN.cudnnActivationMode_t, cuDNN.cudnnConvolutionDescriptor, cuDNN.cudnnFilterDescriptor, cuDNN.cudnnTensorDescriptor, cuDNN.cudnnTensorDescriptor, Base.RefValue{Float32}, Base.RefValue{Float32}, CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cuDNN.cudnnConvolutionFwdAlgoPerfStruct}, eltyp::Type{UInt8}, size::CUDA.APIUtils.var"#2#3"{UInt64}, fallback::Nothing; keep::Bool)
@ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:77
[6] with_workspace(f::Function, eltyp::Type{UInt8}, size::CUDA.APIUtils.var"#2#3"{UInt64}, fallback::Nothing)
@ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:56
[7] #with_workspace#1
@ /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:53 [inlined]
[8] with_workspace(f::Function, size::UInt64, fallback::Nothing) (repeats 2 times)
@ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:53
[9] cudnnConvolutionForwardAD(w::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, x::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, bias::Nothing, z::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}; y::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, activation::cuDNN.cudnnActivationMode_t, convDesc::cuDNN.cudnnConvolutionDescriptor, wDesc::cuDNN.cudnnFilterDescriptor, xDesc::cuDNN.cudnnTensorDescriptor, yDesc::cuDNN.cudnnTensorDescriptor, zDesc::cuDNN.cudnnTensorDescriptor, biasDesc::Nothing, alpha::Base.RefValue{Float32}, beta::Base.RefValue{Float32}, dw::Base.RefValue{Any}, dx::Base.RefValue{Any}, dz::Base.RefValue{Any}, dbias::Base.RefValue{Any}, dready::Base.RefValue{Bool})
@ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/convolution.jl:103
[10] cudnnConvolutionForwardWithDefaults(w::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, x::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}; padding::Int64, stride::Int64, dilation::Int64, mode::cuDNN.cudnnConvolutionMode_t, mathType::cuDNN.cudnnMathType_t, reorderType::cuDNN.cudnnReorderType_t, group::Int64, format::cuDNN.cudnnTensorFormat_t, convDesc::cuDNN.cudnnConvolutionDescriptor, xDesc::cuDNN.cudnnTensorDescriptor, wDesc::cuDNN.cudnnFilterDescriptor, y::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, yDesc::cuDNN.cudnnTensorDescriptor, alpha::Int64, beta::Int64, bias::Nothing, z::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, biasDesc::Nothing, zDesc::cuDNN.cudnnTensorDescriptor, activation::cuDNN.cudnnActivationMode_t, dw::Base.RefValue{Any}, dx::Base.RefValue{Any}, dz::Base.RefValue{Any}, dbias::Base.RefValue{Any})
@ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/convolution.jl:96
[11] #cudnnConvolutionForward!#1150
@ /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/convolution.jl:53 [inlined]
[12] conv!(y::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, x::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, w::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{3, 3, 3, 6, 3}; alpha::Int64, beta::Int64, algo::Int64)
@ NNlibCUDA /scratch/npj226/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/conv.jl:67
[13] conv!
@ /scratch/npj226/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/conv.jl:58 [inlined]
[14] #conv#233
@ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:88 [inlined]
[15] conv
@ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:83 [inlined]
[16] #conv#231
@ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:56 [inlined]
[17] conv(x::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, w::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer})
@ NNlib /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:50
[18] top-level scope
@ REPL[3]:1
[19] top-level scope
@ /scratch/npj226/.julia/packages/CUDA/ZdCxS/src/initialization.jl:155 |
There is a similar error for gradients with julia> w = rand(Float16, 3, 1, 1) |> gpu;
julia> gradient(x->sum(conv(x, w)), rand(Float16, 16, 1, 1) |> gpu)
┌ Warning: CuDNN (v8600) function cudnnGetConvolutionForwardAlgorithmMaxCount() called:
│ Info: Traceback contains 44 message(s)
│ Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: false == cudnn::cnn::isForwardSupported(handle, xDesc, wDesc, cDesc, yDesc, algo)
│ Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: T_ENGINEMAP::isLegacyAlgoSupported(handle, xDesc, wDesc, cDesc, yDesc, algo)
[...]
│ Time: 2023-02-16T23:53:39.684290 (0d+0h+0m+48s since start)
│ Process=2461157; Thread=2461157; GPU=NULL; Handle=NULL; StreamId=NULL.
└ @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/cuDNN.jl:151
ERROR: CUDNNError: CUDNN_STATUS_BAD_PARAM (code 3)
Stacktrace:
[1] throw_api_error(res::cuDNN.cudnnStatus_t)
@ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/libcudnn.jl:11
[2] macro expansion
@ /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/libcudnn.jl:24 [inlined]
[3] cudnnConvolutionBackwardFilter(handle::Ptr{cuDNN.cudnnContext}, alpha::Base.RefValue{Float32}, xDesc::cuDNN.cudnnTensorDescriptor, x::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, dyDesc::cuDNN.cudnnTensorDescriptor, dy::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, convDesc::cuDNN.cudnnConvolutionDescriptor, algo::cuDNN.cudnnConvolutionBwdFilterAlgo_t, workSpace::CUDA.CuArray{UInt8, 1, CUDA.Mem.DeviceBuffer}, workSpaceSizeInBytes::Int64, beta::Base.RefValue{Float32}, dwDesc::cuDNN.cudnnFilterDescriptor, dw::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer})
@ cuDNN /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:26
[4] FluxML/NNlibCUDA.jl#36
@ /scratch/npj226/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/conv.jl:120 [inlined]
[5] with_workspace(f::NNlibCUDA.var"#36#38"{Base.RefValue{Float32}, Base.RefValue{Float32}, CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, cuDNN.cudnnConvolutionBwdFilterAlgoPerfStruct, cuDNN.cudnnFilterDescriptor, cuDNN.cudnnTensorDescriptor, cuDNN.cudnnTensorDescriptor, cuDNN.cudnnConvolutionDescriptor}, eltyp::Type{UInt8}, size::CUDA.APIUtils.var"#2#3"{UInt64}, fallback::Nothing; keep::Bool)
@ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:77
[6] with_workspace(f::Function, eltyp::Type{UInt8}, size::CUDA.APIUtils.var"#2#3"{UInt64}, fallback::Nothing)
@ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:56
[7] #with_workspace#1
@ /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:53 [inlined]
[8] with_workspace(f::Function, size::UInt64, fallback::Nothing) (repeats 2 times)
@ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:53
[9] ∇conv_filter!(dw::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, x::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, dy::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{1, 1, 1, 2, 1}; alpha::Int64, beta::Int64, algo::Int64)
@ NNlibCUDA /scratch/npj226/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/conv.jl:119
[10] ∇conv_filter!
@ /scratch/npj226/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/conv.jl:107 [inlined]
[11] #∇conv_filter#237
@ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:112 [inlined]
[12] ∇conv_filter
@ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:107 [inlined]
[13] #375
@ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:351 [inlined]
[14] unthunk
@ /scratch/npj226/.julia/packages/ChainRulesCore/a4mIA/src/tangent_types/thunks.jl:204 [inlined]
[15] wrap_chainrules_output
@ /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/chainrules.jl:110 [inlined]
[16] map
@ ./tuple.jl:223 [inlined]
[17] map
@ ./tuple.jl:224 [inlined]
[18] wrap_chainrules_output
@ /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/chainrules.jl:111 [inlined]
[19] ZBack
@ /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/chainrules.jl:211 [inlined]
[20] Pullback
@ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:56 [inlined]
[21] (::typeof(∂(#conv#231)))(Δ::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer})
@ Zygote /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/interface2.jl:0
[22] Pullback
@ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:50 [inlined]
[23] Pullback
@ ./REPL[27]:1 [inlined]
[24] (::Zygote.var"#60#61"{typeof(∂(#30))})(Δ::Float16)
@ Zygote /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/interface.jl:45
[25] gradient(::Function, ::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, ::Vararg{CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}})
@ Zygote /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/interface.jl:97
[26] top-level scope
@ REPL[27]:1
[27] top-level scope
@ /scratch/npj226/.julia/packages/CUDA/ZdCxS/src/initialization.jl:155 |
Ok, that means this might be easier to solve then if it's not dimension specific. I also forgot that cuDNN functionality had been spun off into its own package, sorry. Do you mind rerunning the test with |
Ok, julia> conv(rand(Float16, 16, 16, 16, 1, 1) |> gpu, rand(Float16, 3, 3, 3, 1, 1) |> gpu)
┌ Warning: No valid algorithm found, probably bad params for convolution.
└ @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/convolution.jl:276
┌ Debug: CuDNN (v8600) function cudnnCreateConvolutionDescriptor() called:
│ convDesc: location=host; addr=0x1526f7168c80;
│ Time: 2023-02-17T00:25:16.053149 (0d+0h+0m+40s since start)
│ Process=2464303; Thread=2464303; GPU=NULL; Handle=NULL; StreamId=NULL.
└ @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/cuDNN.jl:149
ERROR: CUDNNError: CUDNN_STATUS_NOT_SUPPORTED┌ Debug: CuDNN (v8600) function cudnnSetConvolutionNdDescriptor() called:
│ convDesc: location=host; addr=0x75faa90;
│ arrayLength: type=int; val=2;
│ padA: type=int; val=[0,0];
│ strideA: type=int; val=[1,1];
│ dilationA: type=int; val=[1,1];
│ mode: type=cudnnConvolutionMode_t; val=CUDNN_CONVOLUTION (0);
│ dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
│ Time: 2023-02-17T00:25:16.118505 (0d+0h+0m+40s since start)
│ Process=2464303; Thread=2464303; GPU=NULL; Handle=NULL; StreamId=NULL.
└ @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/cuDNN.jl:149
(code 9)
Stacktrace:
[1] throw_api_error(res::cuDNN.cudnnStatus_t)
@ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/libcudnn.jl:11
[2] macro expansion
@ /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/libcudnn.jl:24 [inlined]
[3] cudnnConvolutionForward(handle::Ptr{cuDNN.cudnnContext}, alpha::Base.RefValue{Float32}, xDesc::cuDNN.cudnnTensorDescriptor, x::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, wDesc::cuDNN.cudnnFilterDescriptor, w::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, convDesc::cuDNN.cudnnConvolutionDescriptor, algo::cuDNN.cudnnConvolutionFwdAlgo_t, workSpace::CUDA.CuArray{UInt8, 1, CUDA.Mem.DeviceBuffer}, workSpaceSizeInBytes::Int64, beta::Base.RefValue{Float32}, yDesc::cuDNN.cudnnTensorDescriptor, y::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer})
@ cuDNN /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:26
[4] (::cuDNN.var"#1153#1155"{CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cuDNN.cudnnActivationMode_t, cuDNN.cudnnConvolutionDescriptor, cuDNN.cudnnFilterDescriptor, cuDNN.cudnnTensorDescriptor, cuDNN.cudnnTensorDescriptor, Base.RefValue{Float32}, Base.RefValue{Float32}, CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cuDNN.cudnnConvolutionFwdAlgoPerfStruct})(workspace::CUDA.CuArray{UInt8, 1, CUDA.Mem.DeviceBuffer})
@ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/convolution.jl:105
[5] with_workspace(f::cuDNN.var"#1153#1155"{CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cuDNN.cudnnActivationMode_t, cuDNN.cudnnConvolutionDescriptor, cuDNN.cudnnFilterDescriptor, cuDNN.cudnnTensorDescriptor, cuDNN.cudnnTensorDescriptor, Base.RefValue{Float32}, Base.RefValue{Float32}, CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cuDNN.cudnnConvolutionFwdAlgoPerfStruct}, eltyp::Type{UInt8}, size::CUDA.APIUtils.var"#2#3"{UInt64}, fallback::Nothing; keep::Bool)
@ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:77
[6] with_workspace(f::Function, eltyp::Type{UInt8}, size::CUDA.APIUtils.var"#2#3"{UInt64}, fallback::Nothing)
@ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:56
[7] #with_workspace#1
@ /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:53 [inlined]
[8] with_workspace(f::Function, size::UInt64, fallback::Nothing) (repeats 2 times)
@ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:53
[9] cudnnConvolutionForwardAD(w::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, x::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, bias::Nothing, z::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}; y::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, activation::cuDNN.cudnnActivationMode_t, convDesc::cuDNN.cudnnConvolutionDescriptor, wDesc::cuDNN.cudnnFilterDescriptor, xDesc::cuDNN.cudnnTensorDescriptor, yDesc::cuDNN.cudnnTensorDescriptor, zDesc::cuDNN.cudnnTensorDescriptor, biasDesc::Nothing, alpha::Base.RefValue{Float32}, beta::Base.RefValue{Float32}, dw::Base.RefValue{Any}, dx::Base.RefValue{Any}, dz::Base.RefValue{Any}, dbias::Base.RefValue{Any}, dready::Base.RefValue{Bool})
@ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/convolution.jl:103
[10] cudnnConvolutionForwardWithDefaults(w::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, x::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}; padding::Int64, stride::Int64, dilation::Int64, mode::cuDNN.cudnnConvolutionMode_t, mathType::cuDNN.cudnnMathType_t, reorderType::cuDNN.cudnnReorderType_t, group::Int64, format::cuDNN.cudnnTensorFormat_t, convDesc::cuDNN.cudnnConvolutionDescriptor, xDesc::cuDNN.cudnnTensorDescriptor, wDesc::cuDNN.cudnnFilterDescriptor, y::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, yDesc::cuDNN.cudnnTensorDescriptor, alpha::Int64, beta::Int64, bias::Nothing, z::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, biasDesc::Nothing, zDesc::cuDNN.cudnnTensorDescriptor, activation::cuDNN.cudnnActivationMode_t, dw::Base.RefValue{Any}, dx::Base.RefValue{Any}, dz::Base.RefValue{Any}, dbias::Base.RefValue{Any})
@ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/convolution.jl:96
[11] #cudnnConvolutionForward!#1150
@ /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/convolution.jl:53 [inlined]
[12] conv!(y::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, x::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, w::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{3, 3, 3, 6, 3}; alpha::Int64, beta::Int64, algo::Int64)
@ NNlibCUDA /scratch/npj226/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/conv.jl:67
[13] conv!
@ /scratch/npj226/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/conv.jl:58 [inlined]
[14] #conv#233
@ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:88 [inlined]
[15] conv
@ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:83 [inlined]
[16] #conv#231
@ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:56 [inlined]
[17] conv(x::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, w::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer})
@ NNlib /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:50
[18] top-level scope
@ REPL[4]:1
[19] top-level scope
@ /scratch/npj226/.julia/packages/CUDA/ZdCxS/src/initialization.jl:155
julia> w = rand(Float16, 3, 1, 1) |> gpu;
julia> gradient(x->sum(conv(x, w)), rand(Float16, 16, 1, 1) |> gpu)
┌ Debug: CuDNN (v8600) function cudnnSetConvolutionMathType() called:
│ convDesc: location=host; addr=0x75faa90;
│ mathType: type=cudnnMathType_t; val=CUDNN_TENSOR_OP_MATH (1);
│ Time: 2023-02-17T00:25:16.118532 (0d+0h+0m+40s since start)
│ Process=2464303; Thread=2464303; GPU=NULL; Handle=NULL; StreamId=NULL.
└ @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/cuDNN.jl:149
ERROR: ┌ Debug: CuDNN (v8600) function cudnnCreateTensorDescriptor() called:
│ Time: 2023-02-17T00:25:16.237211 (0d+0h+0m+40s since start)
│ Process=2464303; Thread=2464303; GPU=NULL; Handle=NULL; StreamId=NULL.
└ @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/cuDNN.jl:149
CUDNNError: ┌ Debug: CuDNN (v8600) function cudnnSetTensorNdDescriptorEx() called:
│ format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NCHW (0);
│ dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
│ nbDims: type=int; val=4;
│ dimA: type=int; val=[1,1,16,16];
│ Time: 2023-02-17T00:25:16.252704 (0d+0h+0m+40s since start)
│ Process=2464303; Thread=2464303; GPU=NULL; Handle=NULL; StreamId=NULL.
└ @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/cuDNN.jl:149
CUDNN_STATUS_BAD_PARAM (code 3)
Stacktrace:
[1] throw_api_error(res::cuDNN.cudnnStatus_t)
@ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/libcudnn.jl:11
[2] macro expansion
@ /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/libcudnn.jl:24 [inlined]
[3] cudnnConvolutionBackwardFilter(handle::Ptr{cuDNN.cudnnContext}, alpha::Base.RefValue{Float32}, xDesc::cuDNN.cudnnTensorDescriptor, x::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, dyDesc::cuDNN.cudnnTensorDescriptor, dy::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, convDesc::cuDNN.cudnnConvolutionDescriptor, algo::cuDNN.cudnnConvolutionBwdFilterAlgo_t, workSpace::CUDA.CuArray{UInt8, 1, CUDA.Mem.DeviceBuffer}, workSpaceSizeInBytes::Int64, beta::Base.RefValue{Float32}, dwDesc::cuDNN.cudnnFilterDescriptor, dw::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer})
@ cuDNN /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:26
[4] FluxML/NNlibCUDA.jl#36
@ /scratch/npj226/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/conv.jl:120 [inlined]
[5] with_workspace(f::NNlibCUDA.var"#36#38"{Base.RefValue{Float32}, Base.RefValue{Float32}, CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, cuDNN.cudnnConvolutionBwdFilterAlgoPerfStruct, cuDNN.cudnnFilterDescriptor, cuDNN.cudnnTensorDescriptor, cuDNN.cudnnTensorDescriptor, cuDNN.cudnnConvolutionDescriptor}, eltyp::Type{UInt8}, size::CUDA.APIUtils.var"#2#3"{UInt64}, fallback::Nothing; keep::Bool)
@ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:77
[6] with_workspace(f::Function, eltyp::Type{UInt8}, size::CUDA.APIUtils.var"#2#3"{UInt64}, fallback::Nothing)
@ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:56
[7] #with_workspace#1
@ /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:53 [inlined]
[8] with_workspace(f::Function, size::UInt64, fallback::Nothing) (repeats 2 times)
@ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:53
[9] ∇conv_filter!(dw::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, x::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, dy::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{1, 1, 1, 2, 1}; alpha::Int64, beta::Int64, algo::Int64)
@ NNlibCUDA /scratch/npj226/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/conv.jl:119
[10] ∇conv_filter!
@ /scratch/npj226/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/conv.jl:107 [inlined]
[11] #∇conv_filter#237
@ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:112 [inlined]
[12] ∇conv_filter
@ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:107 [inlined]
[13] #375
@ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:351 [inlined]
[14] unthunk
@ /scratch/npj226/.julia/packages/ChainRulesCore/a4mIA/src/tangent_types/thunks.jl:204 [inlined]
[15] wrap_chainrules_output
@ /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/chainrules.jl:110 [inlined]
[16] map
@ ./tuple.jl:223 [inlined]
[17] map
@ ./tuple.jl:224 [inlined]
[18] wrap_chainrules_output
@ /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/chainrules.jl:111 [inlined]
[19] ZBack
@ /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/chainrules.jl:211 [inlined]
[20] Pullback
@ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:56 [inlined]
[21] (::typeof(∂(#conv#231)))(Δ::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer})
@ Zygote /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/interface2.jl:0
[22] Pullback
@ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:50 [inlined]
[23] Pullback
@ ./REPL[6]:1 [inlined]
[24] (::Zygote.var"#60#61"{typeof(∂(#3))})(Δ::Float16)
@ Zygote /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/interface.jl:45
[25] gradient(f::Function, args::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer})
@ Zygote /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/interface.jl:97
[26] top-level scope
@ REPL[6]:1
[27] top-level scope
@ /scratch/npj226/.julia/packages/CUDA/ZdCxS/src/initialization.jl:155 |
I've been looking into this but haven't found anything conclusive yet. Can you test with NNlibCUDA v0.2.6 and see if it has the same issue? Verifying whether it's a CUDA lib version issue should help us narrow down the possibilities significantly. Edit: just tested myself and same issue. This is strange, because when I log all the descriptors everything looks fine, but for whatever reason the algo search at https://github.com/JuliaGPU/CUDA.jl/blob/a70c83e2cbe978873a7aa74f2493838b509aa42c/lib/cudnn/src/convolution.jl#L193 is returning Edit2: right after I posted the last edit, I realized that Table 30 under https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnConvolutionForward notes that 3D convs only support P.S. @mcabbott you may be interested in Tables like no. 25 and 26 in https://docs.nvidia.com/deeplearning/cudnn/api/index.html. We were wondering what mixtures of datatypes people might use in the wild and I think such tables provide a pretty exhaustive list. |
@ToucheSir I'm back to being able to help (busy semester). Do you still want a test with NNlibCUDA v0.2.6? |
No, per the edits in the above post I think I've reproduced it. Re-reading the CUDA.jl -> NNlibCUDA integration code, I think https://github.com/FluxML/NNlibCUDA.jl/blob/82ba6cb4ef6c6ed11d93c6bd7e72a8eb3cb2234a/src/cudnn/conv.jl#L46-L56would nave to be special-cased for 3D convs + Float16 inputs. Two main driving questions there: is it fine to do this silently without warning users or letting them opt for an error, and what is the least tedious way to do this (I don't want to hard-code all the valid configurations in Tables 26-30 unless absolutely necessary)? |
Float16 CUDA
conv
seems to be broken for 5D tensors, but not 3D or 4D tensors. FluxML/Flux.jl#2184(using Julia 1.8.3 on a A100 GPU.)
The text was updated successfully, but these errors were encountered: