Error when executing "run_ml_surrogate_15_stage.py" #773

Closed
lwan86 opened this issue Dec 17, 2024 · 26 comments · Fixed by #621
Labels: backend: openmp, bug: affects latest release, bug, component: tests

Comments


lwan86 commented Dec 17, 2024

I got the following error:

Traceback (most recent call last):
  File "....../run_ml_surrogate_15_stage.py", line 358, in <module>
    sim.track_particles()
    ^^^^^^^^^^^^^^^^^^^
AttributeError: 'impactx.impactx_pybind.ImpactX' object has no attribute 'track_particles'. Did you mean: 'add_particles'?

To install PyTorch, I had to downgrade Python from 3.13 to 3.12 in the conda environment, which also downgraded a number of other packages. This may be the cause of the issue. These are all the packages in the conda environment after the downgrade:

_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
adios2 2.10.1 nompi_py312he8c16a3_101 conda-forge
aiohappyeyeballs 2.4.4 pyhd8ed1ab_1 conda-forge
aiohttp 3.10.5 py312h5eee18b_0
aiosignal 1.3.2 pyhd8ed1ab_0 conda-forge
amrex 24.10 nompi_hd58f592_100 conda-forge
attrs 24.3.0 pyh71513ae_0 conda-forge
blas 1.0 mkl
bottleneck 1.4.2 py312ha883a20_0
brotli-python 1.0.9 py312h6a678d5_8
bzip2 1.0.8 h4bc722e_7 conda-forge
c-ares 1.34.4 hb9d3cd8_0 conda-forge
c-blosc2 2.14.4 hb4ffafa_1 conda-forge
ca-certificates 2024.12.14 hbcca054_0 conda-forge
certifi 2024.8.30 py312h06a4308_0
charset-normalizer 3.3.2 pyhd3eb1b0_0
cpuonly 2.0 0 pytorch
ffmpeg 4.3 hf484d3e_0 pytorch
fftw 3.3.10 nompi_hf1063bd_110 conda-forge
filelock 3.13.1 py312h06a4308_0
freetype 2.12.1 h4a9f257_0
frozenlist 1.5.0 py312h5eee18b_0
giflib 5.2.2 h5eee18b_0
gmp 6.2.1 h295c915_3
gnutls 3.6.15 he1e5248_0
hdf5 1.14.3 nompi_hdf9ad27_105 conda-forge
idna 3.10 pyhd8ed1ab_1 conda-forge
impactx 24.10 py312ha6bd4de_0 conda-forge
intel-openmp 2023.1.0 hdb19cb5_46306
jinja2 3.1.4 py312h06a4308_1
jpeg 9e h5eee18b_3
keyutils 1.6.1 h166bdaf_0 conda-forge
krb5 1.20.1 h81ceb04_0 conda-forge
lame 3.100 h7b6447c_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.43 h712a8e2_2 conda-forge
lerc 3.0 h295c915_0
libadios2 2.10.1 nompi_hfc72546_101 conda-forge
libaec 1.1.3 h59595ed_0 conda-forge
libblas 3.9.0 1_h86c2bf4_netlib conda-forge
libcblas 3.9.0 8_h3b12eaf_netlib conda-forge
libcurl 8.9.1 h251f7ec_0
libdeflate 1.17 h5eee18b_1
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 hd590300_2 conda-forge
libexpat 2.6.4 h5888daf_0 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc 14.2.0 h77fa898_1 conda-forge
libgcc-ng 14.2.0 h69a702a_1 conda-forge
libgfortran 14.2.0 h69a702a_1 conda-forge
libgfortran-ng 14.2.0 h69a702a_1 conda-forge
libgfortran5 14.2.0 hd5240d6_1 conda-forge
libgomp 14.2.0 h77fa898_1 conda-forge
libiconv 1.16 h5eee18b_3
libidn2 2.3.4 h5eee18b_0
libjpeg-turbo 2.0.0 h9bf148f_0 pytorch
liblapack 3.9.0 8_h3b12eaf_netlib conda-forge
liblzma 5.6.3 hb9d3cd8_1 conda-forge
libmpdec 4.0.0 h4bc722e_0 conda-forge
libnghttp2 1.57.0 h2d74bed_0
libnsl 2.0.1 hd590300_0 conda-forge
libopenblas 0.3.28 pthreads_h94d23a6_1 conda-forge
libpng 1.6.43 h2797004_0 conda-forge
libsodium 1.0.18 h7b6447c_0
libsqlite 3.46.0 hde9e2c9_0 conda-forge
libssh2 1.11.1 h251f7ec_0
libstdcxx 14.2.0 hc0a3c3a_1 conda-forge
libstdcxx-ng 14.2.0 h4852527_1 conda-forge
libtasn1 4.19.0 h5eee18b_0
libtiff 4.5.1 h6a678d5_0
libunistring 0.9.10 h27cfd23_0
libuuid 2.38.1 h0b41bf4_0 conda-forge
libwebp 1.3.2 h11a3e52_0
libwebp-base 1.3.2 h5eee18b_1
libxcrypt 4.4.36 hd590300_1 conda-forge
libzlib 1.2.13 h4ab18f5_6 conda-forge
llvm-openmp 14.0.6 h9e868ea_0
lz4-c 1.9.4 h6a678d5_1
markupsafe 2.1.3 py312h5eee18b_0
mkl 2023.1.0 h213fc3f_46344
mkl-service 2.4.0 py312h5eee18b_1
mkl_fft 1.3.11 py312h5eee18b_0
mkl_random 1.2.8 py312h526ad5a_0
more-itertools 10.5.0 pyhd8ed1ab_1 conda-forge
mpmath 1.3.0 py312h06a4308_0
msgpack-python 1.0.3 py312hdb19cb5_0
multidict 6.1.0 py312h5eee18b_0
ncurses 6.5 he02047a_1 conda-forge
nettle 3.7.3 hbbd107a_1
networkx 3.3 py312h06a4308_0
numexpr 2.10.1 py312h3c60e43_0
numpy 1.26.4 py312hc5e2394_0
numpy-base 1.26.4 py312h0da6c21_0
openh264 2.1.1 h4ff587b_0
openjpeg 2.5.2 he7f1fd0_0
openpmd-api 0.16.0 nompi_py312h7395de0_100 conda-forge
openssl 3.4.0 hb9d3cd8_0 conda-forge
packaging 24.2 pyhd8ed1ab_2 conda-forge
pandas 2.2.3 py312h6a678d5_0
pillow 11.0.0 py312hfdbf927_0
pip 24.2 py312h06a4308_0
plotly 5.24.1 pyhd8ed1ab_1 conda-forge
propcache 0.2.0 py312h5eee18b_0
pyamrex 24.10 nompi_py312h8ddaebe_100 conda-forge
pybind11-abi 4 hd8ed1ab_3 conda-forge
pysocks 1.7.1 py312h06a4308_0
python 3.12.3 hab00c5b_0_cpython conda-forge
python-dateutil 2.9.0.post0 pyhff2d567_1 conda-forge
python-tzdata 2024.2 pyhd8ed1ab_1 conda-forge
python_abi 3.12 5_cp312 conda-forge
pytorch 2.5.1 py3.12_cpu_0 pytorch
pytorch-mutex 1.0 cpu pytorch
pytz 2024.1 pyhd8ed1ab_0 conda-forge
pyyaml 6.0.2 py312h5eee18b_0
quantiphy 2.20 pyhd8ed1ab_1 conda-forge
readline 8.2 h8228510_1 conda-forge
requests 2.32.3 py312h06a4308_1
scipy 1.14.1 py312hc5e2394_0
setuptools 72.1.0 py312h06a4308_0
six 1.17.0 pyhd8ed1ab_0 conda-forge
sympy 1.13.2 py312h06a4308_0
tbb 2021.8.0 hdb19cb5_0
tenacity 9.0.0 pyhd8ed1ab_1 conda-forge
termcolor 2.5.0 pyhd8ed1ab_1 conda-forge
tk 8.6.13 noxft_h4845f30_101 conda-forge
torchaudio 2.5.1 py312_cpu pytorch
torchvision 0.20.1 py312_cpu pytorch
trame 3.7.1 pyhd8ed1ab_1 conda-forge
trame-client 3.5.0 pyhd8ed1ab_1 conda-forge
trame-matplotlib 2.0.2 pyhd8ed1ab_0 conda-forge
trame-plotly 3.0.2 pyhd8ed1ab_0 conda-forge
trame-router 2.3.0 pyhd8ed1ab_0 conda-forge
trame-server 3.2.3 pyhd8ed1ab_2 conda-forge
trame-vuetify 2.7.2 pyhd8ed1ab_1 conda-forge
trame-xterm 0.2.1 pyhd8ed1ab_0 conda-forge
typing_extensions 4.11.0 py312h06a4308_0
tzdata 2024b hc8b5060_0 conda-forge
urllib3 2.2.3 py312h06a4308_0
wheel 0.44.0 py312h06a4308_0
wslink 2.2.1 pyhd8ed1ab_1 conda-forge
xz 5.4.6 h5eee18b_1
yaml 0.2.5 h7b6447c_0
yarl 1.18.0 py312h5eee18b_0
zeromq 4.3.5 h6a678d5_0
zfp 0.5.5 h9c3ff4c_8 conda-forge
zlib 1.2.13 h4ab18f5_6 conda-forge
zlib-ng 2.0.7 h5eee18b_0
zstd 1.5.6 ha6fb4c9_0 conda-forge

ax3l added the question and component: tests labels on Jan 7, 2025
Member

ax3l commented Jan 7, 2025

Hi @lwan86,

Thank you for the issue! I think you took the example from the development branch, but ran the last stable release from conda (24.11).

In the prior releases (24.11), the function in question is still called .evolve() (see #741).

We'll have a new release on conda with 25.01 that will use the new naming.

Best,
Axel

ax3l self-assigned this on Jan 7, 2025
Author

lwan86 commented Jan 7, 2025

Thanks, @ax3l ! 👍

Member

ax3l commented Jan 7, 2025

You are welcome!

I am still working on adding this example to CI coverage in #621. Please let me know if you encounter any other issues :)

Author

lwan86 commented Jan 11, 2025

@ax3l I switched to 24.10 (I couldn't find release 24.11 on GitHub) and executed the python script again. I got the following error:

device set to default, cpu
Downloading trained models from Zenodo.org - this might take a minute...
.../ImpactX/release/impactx-24.10/examples/pytorch_surrogate_model/surrogate_model_definitions.py:109: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
model_dict = torch.load(model_file, map_location="cpu")
Initializing AMReX (24.10)...
OMP initialized with 4 OMP threads
AMReX (24.10) initialized

Grids Summary:
Level 0 1 grids 512 cells 100 % of domain

Diagnostics: 1
Space Charge effects: 0
CSR effects: 0
++++ Starting step=1 slice_step=0
[ADIOS2 backend] WARNING: Parameter adios2.engine.usesteps is deprecated since use of steps is now always enabled.

**** WARNINGS ******************************************************************

  • GLOBAL warning list after [ FIRST STEP ]
  • No recorded warnings.

++++ Starting step=2 slice_step=0
0::Assertion `depth == 1 || MFIter::allow_multiple_mfiters' failed, file "/home/conda/feedstock_root/build_artifacts/amrex_1727812980255/work/Src/Base/AMReX_MFIter.cpp", line 281, Msg: "Nested or multiple active MFIters is not supported by default. This can be changed by calling MFIter::allowMultipleMFIters(true)". !!!
SIGABRT
0::Assertion `depth == 1 || MFIter::allow_multiple_mfiters' failed, file "/home/conda/feedstock_root/build_artifacts/amrex_1727812980255/work/Src/Base/AMReX_MFIter.cpp", line 281, Msg: "Nested or multiple active MFIters is not supported by default. This can be changed by calling MFIter::allowMultipleMFIters(true)". !!!
0::Assertion `depth == 1 || MFIter::allow_multiple_mfiters' failed, file "/home/conda/feedstock_root/build_artifacts/amrex_1727812980255/work/Src/Base/AMReX_MFIter.cpp", line 281, Msg: "Nested or multiple active MFIters is not supported by default. This can be changed by calling MFIter::allowMultipleMFIters(true)". !!!
See Backtrace.0.0 file for details

ax3l added the bug and bug: affects latest release labels and removed the question label on Jan 13, 2025
Member

ax3l commented Jan 13, 2025

> I switched to 24.10 (I couldn't find release 24.11 on GitHub) and executed the python script again.

Ok perfect. The 25.01 release is now out and also has the new function.

> I got the following error: ... depth == 1 || MFIter::allow_multiple_mfiters

Oh, that is a bug, and the example might be outdated. I think we fixed that in the revision of the @RTSandberg et al. PASC paper; let me check whether we forgot to update the example here. We'll add a bug fix in #621.

ax3l mentioned this issue on Jan 13, 2025
Member

ax3l commented Jan 13, 2025

Ah @RTSandberg it looks like you forgot to check in the staged_lpa_impactx_surrogate directory to the Zenodo archive, which you need to run your notebook.

On GDrive I found those files that could be the latest version:
07_optimize_improve_stage0_from_warpx-20250113T180047Z-001.zip

Member

ax3l commented Jan 13, 2025

Interestingly, the issue occurs when we append the Drift between stages. I have to remind myself how we solved this last time.

Member

ax3l commented Jan 13, 2025

OK, the problem is avoided by setting OMP_NUM_THREADS=1 (no threading). I have to dig into why this is an issue.
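
A minimal sketch of applying this workaround from inside the Python script, assuming the variable is set before impactx, amrex, or torch are imported (OpenMP runtimes typically read OMP_NUM_THREADS once, when they initialize). The helper name `limit_openmp_threads` is hypothetical and not part of ImpactX:

```python
import os


def limit_openmp_threads(num_threads: int = 1) -> str:
    """Set OMP_NUM_THREADS for this process and return the value that was set.

    Hypothetical helper: call it at the very top of the script, before any
    library that starts an OpenMP runtime is imported.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    value = str(num_threads)
    os.environ["OMP_NUM_THREADS"] = value
    return value


limit_openmp_threads(1)
# Import impactx / torch only after this point.
```

The same effect can be had from the shell by prefixing the command, e.g. `OMP_NUM_THREADS=1 python3 run_ml_surrogate_15_stage.py`.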

Member

ax3l commented Jan 13, 2025

Found it: there is some bad interplay between our threading and torch threading, so torch.set_num_threads(1) does the trick as well.
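
A hedged sketch of the PyTorch-side workaround. `torch.set_num_threads` and `torch.get_num_threads` are existing PyTorch calls; the wrapper `pin_torch_to_one_thread` is a hypothetical helper, and the ImportError guard is only there so the snippet also runs where PyTorch is not installed:

```python
from typing import Optional


def pin_torch_to_one_thread() -> Optional[int]:
    """Limit PyTorch's intra-op parallelism to a single thread.

    Returns the thread count PyTorch reports afterwards, or None if
    PyTorch is not available in this environment.
    """
    try:
        import torch
    except ImportError:
        return None
    torch.set_num_threads(1)
    return torch.get_num_threads()


# Call this before the surrogate models are evaluated.
threads = pin_torch_to_one_thread()
```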

There is an issue when mixing MPI+OpenMP in our app with PyTorch threading: AMReX-Codes/pyamrex#322
The issue can be avoided by:

  • using GPUs instead
  • on CPU, either
    • disabling MPI, or
    • disabling OpenMP or setting OMP_NUM_THREADS=1

One could double check that this is not just a confusion between using different OMP libraries for PyTorch and ImpactX/AMReX, which happens quickly when one pulls in PyTorch from, e.g., pip instead of using a consistent package manager (Spack, Conda-Forge) for the whole software stack.
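
One way to run that double check is to look at which OpenMP runtime libraries the Python process has actually loaded. The sketch below is an assumption-laden illustration, not part of ImpactX: it is Linux-only (it reads /proc/self/maps), and it assumes the common library names libgomp (GNU), libiomp5 (Intel), and libomp (LLVM). More than one name in the result suggests mixed OpenMP runtimes:

```python
from pathlib import Path

# Assumed file-name prefixes of common OpenMP runtimes.
OPENMP_RUNTIMES = ("libgomp", "libiomp5", "libomp")


def find_openmp_runtimes(maps_text: str) -> set[str]:
    """Return the OpenMP runtimes named in /proc/<pid>/maps-style text."""
    found = set()
    for line in maps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The mapped file path, if any, is the last field of the line.
        name = Path(fields[-1]).name
        for runtime in OPENMP_RUNTIMES:
            # Match "libgomp.so.1" and hashed copies like "libgomp-1a2b.so.1".
            if name.startswith((runtime + ".", runtime + "-")):
                found.add(runtime)
    return found


def loaded_openmp_runtimes() -> set[str]:
    """Inspect the current process; returns an empty set off Linux."""
    maps = Path("/proc/self/maps")
    if not maps.exists():
        return set()
    return find_openmp_runtimes(maps.read_text())
```

Calling `loaded_openmp_runtimes()` after importing both torch and impactx gives the most useful picture, since libraries are only mapped once they are loaded.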

Member

ax3l commented Jan 13, 2025

@lwan86 #621 will fix it, but please feel free to chat more if you see other issues with it or some of the current limitations do not work for you.

ax3l added the backend: openmp label on Jan 13, 2025
Author

lwan86 commented Jan 14, 2025

@ax3l It's a little confusing. Could you let me know which version of the code can run properly?

ax3l closed this as completed in #621 on Jan 14, 2025
Member

ax3l commented Jan 14, 2025

The latest development branch (as of now) has the bug fix, and the upcoming 25.02 release (beginning of February) will include it.

ax3l reopened this on Jan 14, 2025
Author

lwan86 commented Jan 15, 2025

I tried the latest development branch, but got the following error:

python3 run_ml_surrogate_15_stage.py
device set to default, cpu
Downloading trained models from Zenodo.org - this might take a minute...
Initializing AMReX (24.10)...
OMP initialized with 4 OMP threads
AMReX (24.10) initialized

Grids Summary:
Level 0 1 grids 512 cells 100 % of domain

Traceback (most recent call last):
  File ".../github/ImpactX/development/impactx/examples/pytorch_surrogate_model/run_ml_surrogate_15_stage.py", line 360, in <module>
    sim.track_particles()
    ^^^^^^^^^^^^^^^^^^^
AttributeError: 'impactx.impactx_pybind.ImpactX' object has no attribute 'track_particles'. Did you mean: 'add_particles'?

Member

ax3l commented Jan 15, 2025

Please update your ImpactX version to 25.01 or change

sim.track_particles()

to

sim.evolve()

for the 24.10 version that you are using.
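
For a script that has to run against both versions, a small compatibility shim is one option. This is a minimal sketch, not code from the ImpactX repository: `run_tracking` is a hypothetical helper that only relies on the two method names mentioned in this thread, and it assumes both methods can be called without arguments, as in the example script:

```python
def run_tracking(sim):
    """Start tracking under whichever method name this ImpactX version has.

    Prefers track_particles() (25.01 and later) and falls back to
    evolve() (24.10 and earlier).
    """
    track = getattr(sim, "track_particles", None)
    if track is None:
        track = sim.evolve
    return track()
```

In the example script, the call `sim.track_particles()` would then become `run_tracking(sim)`.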

Author

lwan86 commented Jan 15, 2025

Never mind. I forgot to update the conda environment.

Member

ax3l commented Jan 15, 2025

Excellent - does it work now? :)

Author

lwan86 commented Jan 15, 2025

I think now it is working. Thanks! @ax3l

Member

ax3l commented Jan 15, 2025

Great! Let me know when you have more details, progress, or results that you would like to discuss (here or by email).

Member

ax3l commented Jan 15, 2025

Shall we close this issue for now?

Author

lwan86 commented Jan 15, 2025

The next thing I plan to try is to let the code output the intermediate results of the surrogate model run through openpmd/adios2. Is it possible to enable adios2's staging engine here?

Author

lwan86 commented Jan 15, 2025

I think we can close this one for now. We can continue the discussion by email.

Member

ax3l commented Jan 15, 2025

I have not tried SST with ImpactX yet, but this should work or not be hard to enable. Using sst as the file ending/backend for the BeamMonitor element should work. If something in the code blocks this, the logic resides here:
https://github.com/ECP-WarpX/impactx/blob/25.01/src/particles/elements/diagnostics/openPMD.cpp#L173-L210

Member

ax3l commented Jan 15, 2025

Let's start a separate issue regarding SST work?

ax3l closed this as completed on Jan 15, 2025
Author

lwan86 commented Jan 15, 2025

One more minor issue: the Python script tries to download the model file from Zenodo even if it has already been downloaded.

Author

lwan86 commented Jan 15, 2025

Yes. Let's open a separate one.

Member

ax3l commented Jan 15, 2025

> One more minor issue: the python script tries to download the model file from Zenodo even if it has already been downloaded.

Yes, please feel free to patch this out. This is just for the unit test.
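
One possible way to patch this out is to skip the download when the file is already on disk. The sketch below is an illustration under assumptions, not the code in the example script: `fetch_once` is a hypothetical helper, and the URL and target path passed to it would be whatever the script already uses for its Zenodo download:

```python
from pathlib import Path
from urllib.request import urlretrieve


def fetch_once(url: str, target: Path, download=urlretrieve) -> bool:
    """Download url to target unless a non-empty target file already exists.

    Returns True if a download happened and False if it was skipped.
    The download argument exists so the function can be tested offline.
    """
    target = Path(target)
    if target.exists() and target.stat().st_size > 0:
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    download(url, target)
    return True
```

The size check guards against an empty file left behind by an interrupted download; it does not verify that the file contents are complete or uncorrupted.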
