
Getting a "!!Error occurred: CUFFT plan creation failed" #27

Open
walidabualafia opened this issue Sep 16, 2024 · 25 comments

Comments

@walidabualafia

Hi all,

Thank you so much for your work on this package.

I just pulled the package and installed, and I am trying to run the following line, but it keeps running into an error:

cudasirecon   --input-file ~/simtest/src/ --output-file simtest --otf-file ~/simtest/psf/488_xscan_SIMPSF__total_PSF_V2Hex.tif --nphases 5 --ndirs 1 --bessel --zoomfact 1.5 --ls .8 --na 1.0 --nimm 1.33 --angle0 -1.57 --otfRA --wiener 0.001 --background 100 --xyres 0.108 --zres 0.3 --besselNA 0.45 --deskew 32.8 --besselExWave .488 --gammaApo .001  

Whenever I run this line, execution halts when the code reaches the cuFFT section. My teammate traced it with GDB and found a segmentation fault.

The error that gets written to stdout is:

...
zdistcutoff[1]=147
zdistcutoff[2]=147
moving centerband
Before fftplan3d 56059MB free
Error code: 700
ptr_: 23422402822144
Error code: 700
ptr_: 23427368878080
Error code: 700
ptr_: 23401632628736
Error code: 700
ptr_: 23399418036224
Error code: 700
ptr_: 23397203443712
Error code: 700
ptr_: 23394988851200
Error code: 700
ptr_: 23392774258688
Error code: 700
ptr_: 23437960544256
Error code: 700
ptr_: 23437960806400
Error code: 700
ptr_: 23437959757824
 
!!Error occurred: CUFFT plan creation failed

Has anyone seen this error? Could the code be trying to access illegal memory regions?

Any help is appreciated! :)

Thank you,
Walid
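
For reference, CUDA runtime error code 700 is `cudaErrorIllegalAddress` (an illegal memory access), which fits the segmentation fault seen under GDB. Below is a minimal Python sketch for tallying the codes in a log like the one above. The function name `summarize_cuda_errors` and the small lookup table are illustrative only and are not part of cudasirecon.

```python
import re
from collections import Counter

# Small, hand-picked subset of the CUDA runtime cudaError_t enum.
CUDA_ERROR_NAMES = {
    0: "cudaSuccess",
    1: "cudaErrorInvalidValue",
    2: "cudaErrorMemoryAllocation",
    700: "cudaErrorIllegalAddress",
}


def summarize_cuda_errors(log_text):
    """Count 'Error code: N' lines and attach the enum name when known."""
    codes = Counter(int(m) for m in re.findall(r"Error code:\s*(\d+)", log_text))
    return {
        code: (CUDA_ERROR_NAMES.get(code, "unknown"), count)
        for code, count in codes.items()
    }


if __name__ == "__main__":
    sample = (
        "Before fftplan3d 56059MB free\n"
        "Error code: 700\n"
        "ptr_: 23422402822144\n"
        "Error code: 700\n"
        "ptr_: 23427368878080\n"
    )
    print(summarize_cuda_errors(sample))
```

An illegal-address error is sticky for the CUDA context, so later calls (here, the cuFFT plan creation) can fail even though the faulty access happened earlier.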

@tlambert03
Member

How big is your image volume and how much GPU ram do you have available?

@walidabualafia
Author

My --input-file directory is 7.7G and my --otf-file is 43M.

I have 80GB VRAM (running on A100).

Thank you!

@tlambert03
Member

oh ok, should definitely be more than sufficient.
This sort of thing can be pretty hard to debug unfortunately. Could you try reconstructing the test data in https://github.com/scopetools/cudasirecon/tree/main/test_data (see config files in the same directory) just to ensure that the package itself is installed and working ok? If so, we can try to determine what might be different about the data you're reconstructing

@walidabualafia
Author

Thanks, @tlambert03. We can confirm that the test data is working ok.

The problem might be with the data construction. I will talk to the data owners to see what we can share about the data reconstruction.

Thank you! :)

@dan-alford

dan-alford commented Sep 30, 2024

I am @walidabualafia's colleague. The test data did run successfully, but the resulting image does not look correct.
Left is the raw image from the test data set; right is the processed image.

Screenshot 2024-09-30 at 1 43 57 PM

This was run using the following command:

cudasirecon --config cudasirecon/test_data/config-tiff --otf-file cudasirecon/test_data/otf.tif --input-file cudasirecon/test_data/ --output-file raw

Cudasirecon was installed via conda.

@linshaova
Collaborator

Hi dan,

Did you scroll to Z slices 4 or 5 and see if it makes more sense? Slice #1 usually doesn't show anything useful because it's out of focus, especially when the contrast is stretched between min and max intensities.

-lin

@dan-alford

@linshaova Here are screenshots from 4 & 5
Screenshot 2024-09-30 at 3 19 49 PM
Screenshot 2024-09-30 at 3 20 03 PM

@linshaova
Collaborator

I could duplicate what you got, @dan-alford.

@tlambert03, sorry, I never tested this test_data before. In cudasirecon's log printout, the "modamp" numbers are suspiciously large (~8 for order 2, and ~18 for order 1, except for dir 0). See attached log. Has anything changed?

-lin
cudasirecon_log.txt

@linshaova
Collaborator

Ah, I know what's going on now. You should change the fastSI line in the `config-tiff` file to fastSI=0. This option describes how a 3D SIM stack is organized: fastSI=1 means all 15 images are taken at one Z slice before moving to the next slice, whereas fastSI=0 means a full Z stack is taken for one SIM orientation, then for the next orientation. The test data is organized in the latter way, so the flag should be set to 0.

@tlambert03 could you make that change in the git? thanks!
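
The two orderings described above can be sketched as an index calculation. This is an illustration only, not code from cudasirecon: the function name `frame_index` is hypothetical, and it assumes phase is the fastest-varying axis in both layouts, which the comment above does not state.

```python
def frame_index(direction, z, phase, *, nz, ndirs, nphases, fast_si):
    """Return the position of one raw frame in the acquired stack.

    fast_si=True : all ndirs * nphases frames of one Z slice are stored
                   together before the next slice starts.
    fast_si=False: a full Z stack is stored for one direction, then the
                   full Z stack of the next direction.
    Assumption: phase varies fastest in both layouts.
    """
    if fast_si:
        return (z * ndirs + direction) * nphases + phase
    return (direction * nz + z) * nphases + phase


if __name__ == "__main__":
    shape = dict(nz=4, ndirs=3, nphases=5)
    # First frame of Z slice 1, direction 0:
    print(frame_index(0, 1, 0, fast_si=True, **shape))   # 15
    print(frame_index(0, 1, 0, fast_si=False, **shape))  # 5
```

Reading a stack with the wrong setting therefore mixes frames from different slices and orientations, which is one plausible reason for an output that looks wrong even though the run finishes without errors.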

@tlambert03
Member

tlambert03 commented Oct 2, 2024

Argh, will have to check the git history and repeat locally

@linshaova
Collaborator

See my last comment, which I posted just as yours came out. I just created a PR for the updated config-tiff.

@tlambert03
Member

truly baffled as to how that was wrong, but I don't have time to sleuth it out at the moment. can you confirm that switching to fastSI=0 gives the expected output?

@linshaova
Collaborator

Yes (sorry I forgot to mention that!)

@dan-alford

That solved one of our issues. We are still seeing a problem when we reconstruct a known good file: it works with an older version of cudasirecon (cuda9) but not with the new code.
The image on the left is the src image, middle image was converted using older version of cudasirecon, and right is the new version.
Screenshot 2024-10-02 at 4 52 06 PM

The older version was run using Omero with the following properties:

helper.set_zoom(1.5)
helper.set_zres(0.2)
helper.set_nimm(1.33)
helper.set_deskew(32.8)
helper.add_gamma(0.7)
helper.add_ls(0.5015)
helper.add_bex(0.488)
helper.add_wiener(0.001)
helper.add_background(150)
helper.set_bessel_na(0.511)
helper.setOtf("488OTFTHEORY.tif")

This was run on the new version with the following parameters:

cudasirecon --input-file ~/unit/ --output-file unit --nphases 5 --ndirs 1 --bessel --zoomfact 1.5 --ls 0.5015 --nimm 1.33 --wiener 0.001 --background 150 --zres 0.2 --besselNA 0.511 --deskew 32.8 --besselExWave 0.488 --gammaApo .07 --otf-file 488OTFTHEORY.tif

Is there a parameter that has changed or needs to be added?
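
One way to compare the two parameter sets is to parse the command line and diff it against the values passed to the Omero helper. The sketch below is illustrative: `parse_flags` and `diff_params` are hypothetical helpers, and the mapping from helper calls to flags (for example `add_gamma` to `--gammaApo`, `add_bex` to `--besselExWave`) is an assumption.

```python
import shlex


def parse_flags(command):
    """Collect '--flag value' pairs; flags without a value map to True."""
    tokens = shlex.split(command)[1:]  # drop the program name
    flags = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("--"):
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt is not None and not nxt.startswith("--"):
                flags[tok[2:]] = nxt
                i += 2
                continue
            flags[tok[2:]] = True
        i += 1
    return flags


def diff_params(expected, flags, tol=1e-9):
    """Return {name: (expected, given)} for missing or differing values."""
    mismatches = {}
    for name, want in expected.items():
        got = flags.get(name)
        if got is None or got is True or abs(float(got) - want) > tol:
            mismatches[name] = (want, got)
    return mismatches


# Values from the Omero helper calls (flag names are assumed equivalents).
EXPECTED = {
    "zoomfact": 1.5, "zres": 0.2, "nimm": 1.33, "deskew": 32.8,
    "gammaApo": 0.7, "ls": 0.5015, "besselExWave": 0.488,
    "wiener": 0.001, "background": 150, "besselNA": 0.511,
}

COMMAND = (
    "cudasirecon --input-file ~/unit/ --output-file unit --nphases 5 "
    "--ndirs 1 --bessel --zoomfact 1.5 --ls 0.5015 --nimm 1.33 "
    "--wiener 0.001 --background 150 --zres 0.2 --besselNA 0.511 "
    "--deskew 32.8 --besselExWave 0.488 --gammaApo .07 "
    "--otf-file 488OTFTHEORY.tif"
)

if __name__ == "__main__":
    print(diff_params(EXPECTED, parse_flags(COMMAND)))
    # {'gammaApo': (0.7, '.07')}
```

For the command as posted, the only numeric difference is `--gammaApo .07` against `add_gamma(0.7)`.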

@linshaova
Collaborator

  1. --ndirs 1 is suspicious
  2. Is there no need for --k0angles?

@tlambert03
Member

looks like lattice SIM data probably right?

sorry to hear there's been a breaking change in there @dan-alford, that wasn't the intention of course. Can you help narrow down exactly what the older version was? can you run cudasirecon --version?

@linshaova
Collaborator

linshaova commented Oct 3, 2024

Hi @dan-alford, when you say "old version", how old are you talking about? If it was really old (~10 years), note that there was a major change about 8 years ago in how an OTF TIFF file is organized (related to how complex numbers are represented). If you are using an OTF created a long time ago, that may be why it doesn't work with the latest version. In that case, you would need to regenerate the OTF file.

@dan-alford

the --version flag was not recognized. the --help gave the following.

--input-file arg input file (or data folder in TIFF mode)
--output-file arg output file (or filename pattern in TIFF mode)
--otf-file arg OTF file
--ndirs arg (=3) number of directions
--nphases arg (=5) number of phases per direction
--nordersout arg (=0) number of output orders; must be <= norders
--angle0 arg (=1.648) angle of the first direction in radians
--ls arg (=0.172000006) line spacing of SIM pattern in microns
--na arg (=1.20000005) Detection numerical aperture
--nimm arg (=1.33000004) refractive index of immersion medium
--zoomfact arg (=2) lateral zoom factor
--explodefact arg (=1) artificially exploding the reciprocal-space distance between orders by this factor
--zzoom arg (=1) axial zoom factor
--nofilteroverlaps [=arg(=0)] do not filter the overlaping region between bands usually used in trouble shooting
--background arg (=0) camera readout background
--wiener arg (=0.00999999978) Wiener constant
--wienerInr arg (=0.00999999978) Wiener constant increment
--forcemodamp arg modamps forced to these values
--k0angles arg user given pattern vector k0 angles for all directions
--otfRA [=arg(=1)] using rotationally averaged OTF
--k0searchAll [=arg(=0)] search for k0 at all time points
--equalizez [=arg(=1)] bleach correcting for z
--equalizet [=arg(=1)] bleach correcting for time
--dampenOrder0 [=arg(=1)] dampen order-0 in final assembly
--nosuppress [=arg(=0)] do not suppress DC singularity in final assembly (good idea for 2D/TIRF data)
--nokz0 [=arg(=1)] do not use kz=0 plane of the 0th order in the final assembly
--gammaApo arg (=1) output apodization gamma; 1.0 means triangular apo
--saveprefiltered arg save separated bands (half Fourier space) into a file and exit
--savealignedraw arg save drift-fixed raw data (half Fourier space) into a file and exit
--saveoverlaps arg save overlap0 and overlap1 (real-space complex data) into a file and exit
-c [ --config ] arg name of a file of a configuration.
--2lenses [=arg(=1)] I5S data
--bessel [=arg(=1)] bessel-SIM data
--besselExWave arg (=0.488000005) Bessel SIM excitation wavelength in microns
--besselNA arg (=0.143999994) Bessel SIM excitation NA)
--deskew arg (=0) Deskew angle; if not 0.0 then perform deskewing before processing
--deskewshift arg (=0) If deskewed, the output image's extra shift in X (positive->left)
--noRecon No reconstruction will be performed; useful when combined with --deskew
--cropXY arg (=0) Crop the X-Y dimension to this number; 0 means no cropping
--xyres arg (=0.100000001) x-y pixel size (only used for TIFF files)
--zres arg (=0.143999994) z pixel size (only used for TIFF files)
--wavelength arg (=530) emission wavelength (only used for TIFF files)
-h [ --help ] produce help message

@tlambert03
Member

tlambert03 commented Oct 3, 2024

thanks that helps. (it's newer than may 2020 #4 but older than june 2021 #11)

@linshaova
Collaborator

At least in the past, even in LLSM-SIM mode --k0angles is needed, such as --k0angles 1.57.

Could you share the printout cudasirecon produced when processing the above dataset you showed?

@dan-alford

@linshaova - which version ? the old or the new? or both?

@linshaova
Collaborator

both please (with the command line included)

@dan-alford

new version

[omero@nodegpu234 ~]$ cudasirecon --input-file ~/unit/ --output-file unit --otf-file 488OTFTHEORY.tif --ls 0.501500 --ndirs 1 --bessel --zoomfact 1.500000 --na 1.100000 --nimm 1.330000 --angle0 -1.570000 --otfRA --wiener 0.001000 --background 150.000000 --xyres 0.104000 --zres 0.200000 --besselNA 0.511000 --besselExWave 0.488000 --gammaApo 0.700000 --deskew 32.800000 --nphases 5
wiener=0.001
gamma = 0.7
nphases=5, ndirs=1
nx_raw=512, ny=512, nz=85
nx=648, ny=512, nz=85, nz0 = 85, nwaves=1
dxy=0.104000, dz=0.108342 um
nphases=5, norders=3, ndirs=1
nzotf=32, dkzotf=0.208333, nxotf=33, nyotf=1, dkrotf=0.150240
In makematrix.
Separation matrix:
1.00000 1.00000 1.00000 1.00000 1.00000
1.00000 0.30902 -0.80902 -0.80902 0.30902
0.00000 0.95106 0.58779 -0.58779 -0.95106
1.00000 -0.80902 0.30902 0.30902 -0.80902
0.00000 0.58779 -0.95106 0.95106 -0.58779

k0guess[direction 0] = (0.053502, -53.088722) pixels
Initial guess by findk0() of k0[direction 0] = (-0.042217,0.116336) pixels
before fitk0andmodamp
In getmodamp: angle=1.850034, mag=0.002273, amp=0.027115, phase=-2.773549
In getmodamp: angle=1.851034, mag=0.002273, amp=0.027115, phase=-2.773578
In getmodamp: angle=1.849033, mag=0.002273, amp=0.027115, phase=-2.773520
In getmodamp: angle=1.848033, mag=0.002273, amp=0.027115, phase=-2.773492
In getmodamp: angle=1.847033, mag=0.002273, amp=0.027115, phase=-2.773463
In getmodamp: angle=1.846033, mag=0.002273, amp=0.027115, phase=-2.773433
In getmodamp: angle=1.845033, mag=0.002273, amp=0.027115, phase=-2.773405
In getmodamp: angle=1.844033, mag=0.002273, amp=0.027115, phase=-2.773375
In getmodamp: angle=1.843033, mag=0.002273, amp=0.027115, phase=-2.773346
In getmodamp: angle=1.842033, mag=0.002273, amp=0.027115, phase=-2.773316
In getmodamp: angle=1.841033, mag=0.002273, amp=0.027115, phase=-2.773286
In getmodamp: angle=1.840033, mag=0.002273, amp=0.027115, phase=-2.773257
In getmodamp: angle=1.839033, mag=0.002273, amp=0.027115, phase=-2.773227
In getmodamp: angle=1.838033, mag=0.002273, amp=0.027115, phase=-2.773197
In getmodamp: angle=1.837033, mag=0.002273, amp=0.027115, phase=-2.773166
In getmodamp: angle=1.836033, mag=0.002273, amp=0.027115, phase=-2.773136
In getmodamp: angle=1.835033, mag=0.002273, amp=0.027115, phase=-2.773106
In getmodamp: angle=1.834033, mag=0.002273, amp=0.027115, phase=-2.773076
In getmodamp: angle=1.833033, mag=0.002273, amp=0.027115, phase=-2.773045
In getmodamp: angle=1.832033, mag=0.002273, amp=0.027115, phase=-2.773015
In getmodamp: angle=1.831033, mag=0.002273, amp=0.027115, phase=-2.772984
In getmodamp: angle=1.830033, mag=0.002273, amp=0.027115, phase=-2.772953
In getmodamp: angle=1.829033, mag=0.002273, amp=0.027115, phase=-2.772922
In getmodamp: angle=1.828032, mag=0.002273, amp=0.027115, phase=-2.772891
In getmodamp: angle=1.827032, mag=0.002273, amp=0.027115, phase=-2.772860
In getmodamp: angle=1.826032, mag=0.002273, amp=0.027115, phase=-2.772828
In getmodamp: angle=1.825032, mag=0.002273, amp=0.027116, phase=-2.772797
In getmodamp: angle=1.824032, mag=0.002273, amp=0.027116, phase=-2.772766
In getmodamp: angle=1.823032, mag=0.002273, amp=0.027116, phase=-2.772734
In getmodamp: angle=1.822032, mag=0.002273, amp=0.027116, phase=-2.772702
In getmodamp: angle=1.821032, mag=0.002273, amp=0.027116, phase=-2.772671
In getmodamp: angle=1.820032, mag=0.002273, amp=0.027116, phase=-2.772638
In getmodamp: angle=1.819032, mag=0.002273, amp=0.027116, phase=-2.772606
In getmodamp: angle=1.818032, mag=0.002273, amp=0.027116, phase=-2.772574
In getmodamp: angle=1.817032, mag=0.002273, amp=0.027116, phase=-2.772542
In getmodamp: angle=1.816032, mag=0.002273, amp=0.027116, phase=-2.772509
In getmodamp: angle=1.815032, mag=0.002273, amp=0.027116, phase=-2.772477
In getmodamp: angle=1.814032, mag=0.002273, amp=0.027116, phase=-2.772444
In getmodamp: angle=1.815199, mag=0.002273, amp=0.027116, phase=-2.772482
In getmodamp: angle=1.815199, mag=0.003757, amp=0.027140, phase=-2.843364
In getmodamp: angle=1.815199, mag=0.005241, amp=0.026993, phase=-2.912066
Optimum modulation amplitude:
In getmodamp: angle=1.815199, mag=0.003225, amp=0.027151, phase=-2.818154
Reverse modamp is: amp=1.222096, phase=-2.818154
Combined modamp is: amp=0.028061, phase=-2.818154
Correlation coefficient is: 0.149053
Optimum k0 angle=1.815199, length=0.003225, spacing=310.092732 um
In getmodamp: angle=1.815199, mag=0.003225, amp=0.004585, phase=-2.745981
Reverse modamp is: amp=0.752656, phase=-2.745981
Combined modamp is: amp=0.004601, phase=-2.745981
Correlation coefficient is: 0.078054
WARNING: best fit for k0 is 50.157% from expected value.
norders=3, zdistcutoff[0]=29
zdistcutoff[1]=30
zdistcutoff[2]=30
moving centerband
Before fftplan3d 79523MB free
After fftplan 79037MB free
re-transforming centerband
inserting centerband
centerband assembly completed
moving order 1
order 1 sideband assembly completed
moving order 2
order 2 sideband assembly completed
Output: /home/omero/unit/GPUsirecon/unit_proc.tif
amin, amax took: 3.947984 s
Time point 0, wave 0 done

Old Version

Calling Sim Reconstruction with

cudaSireconDriver --input-file "/tmp/sld_temp_adieowvow_81912903/" --output-file "image_0_Obj_Scan_Single_Channel_Ch1_-_3_P0_T0000_C00.ome" --otf-file "/research/applications/omero/omero_production/ManagedRepository/PSFs/sking2_4/2021-04/01/14-48-06.260/488OTFTHEORY.tif"  --ls 0.501500 --ndirs 1 --bessel --zoomfact 1.500000 --na 1.100000 --nimm 1.330000 --angle0 -1.570000 --otfRA --wiener 0.001000 --background 150.000000 --xyres 0.104000 --zres 0.200000 --besselNA 0.511000 --besselExWave 0.488000 --gammaApo 0.700000 --deskew 32.800000 --nphases 5

wiener=0.001
gamma=0.7
nphases=5, ndirs=1
nx=512, ny=512, nz=85, nz0 = 85, nwaves=1, ntimes=1
nzotf=64, dkzotf=0.156250, nxotf=33, nyotf=1, dkrotf=0.135281
In makematrix.
Separation matrix:
1.00000 1.00000 1.00000 1.00000 1.00000
1.00000 0.30902 -0.80902 -0.80902 0.30902
0.00000 0.95106 0.58779 -0.58779 -0.95106
1.00000 -0.80902 0.30902 0.30902 -0.80902
0.00000 0.58779 -0.95106 0.95106 -0.58779

deskew_GPU(): no error
intensity_overall=1.117126e-08
****** total memory 11178M; free memory 10597M
k0guess[direction 0] = (0.042273, -53.088718)
krscale=0.138822 kzscale=0.703243
order2=1, rdistcutoff=221, zdistcutoff=29.000000
makeoverlaps() line 596:no error
makeoverlaps() line 662:no error
****** total memory 11178M; free memory 10235M
makeoverlaps() line 670:no error
makeoverlaps() line 677:no error
Initial guess by findk0() of k0[direction 0] = (-0.101093,0.080322)
before fitk0andmodamp
krscale=0.138822 kzscale=0.703243
order2=1, rdistcutoff=221, zdistcutoff=30.000000
makeoverlaps() line 596:no error
makeoverlaps() line 662:no error
****** total memory 11178M; free memory 10235M
makeoverlaps() line 670:no error
makeoverlaps() line 677:no error
In getmodamp: angle=2.470194, mag=0.129118, amp=0.019913, phase=-2.722648
In getmodamp: angle=2.471194, mag=0.129118, amp=0.019913, phase=-2.722605
In getmodamp: angle=2.469194, mag=0.129118, amp=0.019913, phase=-2.722691
In getmodamp: angle=2.468194, mag=0.129118, amp=0.019913, phase=-2.722734
In getmodamp: angle=2.468791, mag=0.129118, amp=0.019913, phase=-2.722708
In getmodamp: angle=2.468791, mag=0.229118, amp=0.019939, phase=-2.762047
In getmodamp: angle=2.468791, mag=0.329118, amp=0.019948, phase=-2.801625
In getmodamp: angle=2.468791, mag=0.429118, amp=0.019940, phase=-2.841616
Optimum modulation amplitude:
In getmodamp: angle=2.468791, mag=0.330796, amp=0.019948, phase=-2.802293
Reverse modamp is: amp=1.275741, phase=-2.802293
Combined modamp is: amp=0.020460, phase=-2.802293
Correlation coefficient is: 0.125044
Optimum k0 angle=2.468791, length=0.330796, spacing=160.969199 microns
krscale=0.138822 kzscale=0.703243
order2=2, rdistcutoff=221, zdistcutoff=30.000000
makeoverlaps() line 596:no error
makeoverlaps() line 662:no error
****** total memory 11178M; free memory 10235M
makeoverlaps() line 670:no error
makeoverlaps() line 677:no error
In getmodamp: angle=2.468791, mag=0.330796, amp=0.003143, phase=-2.781246
Reverse modamp is: amp=0.776559, phase=-2.781246
Combined modamp is: amp=0.003150, phase=-2.781246
Correlation coefficient is: 0.063614
WARNING: best fit for k0 is 53.295715 pixels from expected value.
norders=3, zdistcutoff[0]=29
zdistcutoff[1]=30
zdistcutoff[2]=30
moving centerband
assemblerealspacebands() line 1863:no error
assemblerealspacebands() line 1869:no error
re-transforming centerband
inserting centerband
assemblerealspacebands() line 1897:no error
centerband assembly completed
moving order 1
assemblerealspacebands() line 1910:no error
assemblerealspacebands() line 1928:no error
assemblerealspacebands() line 1933:no error
order 1 sideband assembly completed
moving order 2
assemblerealspacebands() line 1910:no error
assemblerealspacebands() line 1928:no error
assemblerealspacebands() line 1933:no error
order 2 sideband assembly completed
assemblerealspacebands() line 1943:no error
HERE
Output: /tmp/sld_temp_adieowvow_81912903/GPUsirecon/image_0_Obj_Scan_Single_Channel_Ch1_-_3_P0_T0000_C00.ome_proc.tif
amin, amax took: 1.000000 s
Time point 0, wave 0 done
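
As a sanity check, some of the geometry values printed in the logs can be reproduced from the command-line parameters. The relations below are inferred from the printed numbers, not taken from the cudasirecon source, so treat them as assumptions; the function names are hypothetical.

```python
import math


def deskewed_dz(zres, deskew_deg):
    """Axial step after deskewing, assumed to be zres * sin(deskew angle)."""
    return zres * math.sin(math.radians(deskew_deg))


def k0_guess_pixels(ls, angle0, nx, ny, dxy, norders=3):
    """Expected first-order pattern vector in Fourier-space pixels.

    Assumes the line spacing `ls` (microns) belongs to the highest order,
    so the first order sits at 1 / (ls * (norders - 1)) cycles per micron,
    and that one Fourier pixel spans 1 / (n * dxy) cycles per micron.
    """
    k = 1.0 / (ls * (norders - 1))
    return (k * math.cos(angle0) * nx * dxy, k * math.sin(angle0) * ny * dxy)


if __name__ == "__main__":
    # Parameters from the commands above: --zres 0.2 --deskew 32.8
    # --ls 0.5015 --angle0 -1.57 --xyres 0.104
    print(deskewed_dz(0.2, 32.8))                         # log: dz=0.108342
    print(k0_guess_pixels(0.5015, -1.57, 648, 512, 0.104))  # new log k0guess
    print(k0_guess_pixels(0.5015, -1.57, 512, 512, 0.104))  # old log k0guess
```

Under these assumptions both versions start from essentially the same k0 guess; the small difference in the x component matches the width change from nx=512 to nx=648 after deskewing in the new version.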

@linshaova
Collaborator

Hi @dan-alford (sorry for the late response), these two instances both seem to be examples of failure: neither angle is anywhere close to ±1.57, and both amp values are very low (< 0.05). They also seem to have been applied to different input data. I'd say this comparison doesn't tell us much. I was hoping to see a comparison where a dataset was successfully processed by your older version but not by the latest version. Is that possible to provide?
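
The two criteria in this comment (fitted angle near ±1.57, amp not below about 0.05) can be checked automatically against a log. This is a minimal sketch: `check_k0_fit` is a hypothetical helper, the regular expressions assume the log wording shown in this thread, and the 0.1 rad angle tolerance is an arbitrary choice.

```python
import re


def check_k0_fit(log_text, expected_angle=1.57, angle_tol=0.1, min_amp=0.05):
    """Flag a k0 fit whose angle or modulation amplitudes look wrong."""
    match = re.search(r"Optimum k0 angle=(-?[\d.]+)", log_text)
    if match is None:
        raise ValueError("no 'Optimum k0 angle=' line found in log")
    angle = float(match.group(1))
    amps = [
        float(a)
        for a in re.findall(r"Combined modamp is: amp=(-?[\d.]+)", log_text)
    ]
    return {
        "angle": angle,
        # Compare magnitudes so that +1.57 and -1.57 both pass.
        "angle_ok": abs(abs(angle) - expected_angle) <= angle_tol,
        "amps": amps,
        "amps_ok": all(a >= min_amp for a in amps),
    }


if __name__ == "__main__":
    new_log = (
        "Combined modamp is: amp=0.028061, phase=-2.818154\n"
        "Optimum k0 angle=1.815199, length=0.003225, spacing=310.092732 um\n"
        "Combined modamp is: amp=0.004601, phase=-2.745981\n"
    )
    print(check_k0_fit(new_log))
```

Run on the lines quoted from the new-version log, both checks come back false, in line with the assessment above.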

@dan-alford

@linshaova apologies for the delays. I can provide them, but I can't seem to attach them as TIFFs.
