Generation of workflow fails when code is built using CMake approach #1168

Open
natalie-perlin opened this issue Dec 17, 2024 · 2 comments · May be fixed by #1171
Labels
bug Something isn't working

Comments

@natalie-perlin
Collaborator

Expected behavior

  • The SRW code is built using the CMake approach (not with devbuild.sh);
  • The conda environment is built using devbuild.sh --platform=<platform> conda_only;
  • ./ufs-srweather-app/modulefiles/wflow_ is loaded and the srw_app conda environment is activated;
  • ./ufs-srweather-app/ush/config.yaml is created and edited;
  • A workflow is generated:
    cd ./ufs-srweather-app/ush
    ./generate_FV3LAM_wflow.py
  • The workflow is generated successfully.

Current behavior

The SRW code builds successfully with CMake, and the conda environment is built and activated successfully, but workflow generation then fails with an error.

One issue could be manually solved:
cp ./ufs-srweather-app/build/build_settings.yaml ./ufs-srweather-app/exec/.

After that fix, workflow generation still fails, now with a different error.
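
As a rough illustration of the manual workaround above, the copy step could be scripted along these lines. This is only a sketch: the helper name `stage_build_settings` is hypothetical and not part of the SRW App, and the `build/` and `exec/` locations are taken from the paths quoted in this issue.

```python
import shutil
from pathlib import Path


def stage_build_settings(srw_root):
    """Copy build/build_settings.yaml into exec/ if it is not already there.

    Hypothetical helper for illustration; not part of the SRW App.
    """
    root = Path(srw_root)
    src = root / "build" / "build_settings.yaml"
    dst = root / "exec" / "build_settings.yaml"
    if dst.exists():
        return dst  # already staged, nothing to do
    if not src.exists():
        raise FileNotFoundError(f"expected CMake output at {src}")
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # keep timestamps, like `cp -p`
    return dst
```

Note that this only addresses the missing file; it does not change the contents of build_settings.yaml.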

Machines affected

Any machine is likely affected when the SRW code is built using the CMake approach.

Tested on MacOS and noaacloud. On MacOS, building with the devbuild.sh script does not work, so the CMake approach is the only option.

Steps To Reproduce

(on NOAA-AWS cloud)

  1. git clone https://github.com/ufs-community/ufs-srweather-app.git ufs-srw-intel
  2. cd ufs-srw-intel/
  3. ./manage_externals/checkout_externals
  4. module use $PWD/modulefiles
  5. module load build_noaacloud_intel
  6. ./devbuild.sh -v --platform=noaacloud conda_only
  7. mkdir build && cd build
  8. cmake .. -DCMAKE_INSTALL_PREFIX=.. -DCMAKE_INSTALL_BINDIR=exec ..
  9. make -j 4 VERBOSE=1 2>&1 | tee log.build.001
  10. cd ../ush
  11. cp -pv config.community.yaml config.yaml; vim config.yaml
  12. ./generate_FV3LAM_wflow.py

Workflow generation then fails with the following output:


 ========================================================================
 Starting experiment generation...
 ========================================================================

 ========================================================================
 Starting function setup() in "setup.py"...
 ========================================================================
Found and allowing key metatask_run_ensemble

*********************************************************************
FATAL ERROR:
Experiment generation failed. See the error message(s) printed below.
For more detailed information, check the log file from the workflow
generation script: /contrib/Natalie.Perlin/SRW/ufs-srw-intel/ush/log.generate_FV3LAM_wflow
*********************************************************************

Traceback (most recent call last):
 File "/contrib/Natalie.Perlin/SRW/ufs-srw-intel/ush/./generate_FV3LAM_wflow.py", line 867, in <module>
   expt_dir = generate_FV3LAM_wflow(USHdir, pargs.config, wflow_logfile, pargs.debug)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/contrib/Natalie.Perlin/SRW/ufs-srw-intel/ush/./generate_FV3LAM_wflow.py", line 75, in generate_FV3LAM_wflow
   expt_config = setup(ushdir,user_config_fn=config,debug=debug)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/contrib/Natalie.Perlin/SRW/ufs-srw-intel/ush/setup.py", line 467, in setup
   build_config = load_config_file(build_config_fp)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/contrib/Natalie.Perlin/SRW/ufs-srw-intel/ush/python_utils/config_parser.py", line 662, in load_config_file
   return load_yaml_config(file_name)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/contrib/Natalie.Perlin/SRW/ufs-srw-intel/ush/python_utils/config_parser.py", line 56, in load_yaml_config
   with open(config_file, "r") as f:
        ^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/contrib/Natalie.Perlin/SRW/ufs-srw-intel/exec/build_settings.yaml'    

Detailed Description of Fix (optional)

One apparent issue could be manually solved:
cp ./ufs-srweather-app/build/build_settings.yaml ./ufs-srweather-app/exec/.

Errors are still produced after that fix:

  ========================================================================
  Starting experiment generation...
  ========================================================================

  ========================================================================
  Starting function setup() in "setup.py"...
  ========================================================================
Found and allowing key metatask_run_ensemble

*********************************************************************
FATAL ERROR:
Experiment generation failed. See the error message(s) printed below.
For more detailed information, check the log file from the workflow
generation script: /contrib/Natalie.Perlin/SRW/ufs-srw-intel/ush/log.generate_FV3LAM_wflow
*********************************************************************

Traceback (most recent call last):
  File "/contrib/Natalie.Perlin/SRW/ufs-srw-intel/ush/./generate_FV3LAM_wflow.py", line 867, in <module>
    expt_dir = generate_FV3LAM_wflow(USHdir, pargs.config, wflow_logfile, pargs.debug)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/contrib/Natalie.Perlin/SRW/ufs-srw-intel/ush/./generate_FV3LAM_wflow.py", line 75, in generate_FV3LAM_wflow
    expt_config = setup(ushdir,user_config_fn=config,debug=debug)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/contrib/Natalie.Perlin/SRW/ufs-srw-intel/ush/setup.py", line 471, in setup
    if build_config["Machine"].upper() != expt_config["user"]["MACHINE"]:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'upper'
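
The second traceback suggests that the `Machine` entry loaded from build_settings.yaml is empty (`None`), so calling `.upper()` on it fails before the comparison is made. A minimal sketch of a guarded version of that comparison is shown below. The function name `check_build_machine` is hypothetical; only the `"Machine"` and `["user"]["MACHINE"]` keys come from the traceback, and this is not the actual code in setup.py.

```python
def check_build_machine(build_config, expt_config):
    """Return an error message, or None when the build and experiment machines match.

    Hypothetical helper for illustration; not the SRW App implementation.
    """
    built_for = build_config.get("Machine")
    if not built_for:
        # This is the case hit in the traceback: the key is missing or null.
        return "build_settings.yaml has no Machine entry"
    wanted = expt_config["user"]["MACHINE"]
    if built_for.upper() != wanted:
        return f"built for {built_for.upper()}, but the experiment expects {wanted}"
    return None
```

With a check of this kind, an empty `Machine` value would surface as a readable message rather than an `AttributeError`.
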
@natalie-perlin natalie-perlin added the bug Something isn't working label Dec 17, 2024
@MichaelLueken
Collaborator

@natalie-perlin While you noted during the SRW code management meeting that you were able to make changes to allow the devbuild.sh script to work with MacOS, I figured I'd still outline the two methods to get the CMake approach to work with the current develop branch:

  1. The easiest method is to use the CMake approach to build the SRW App. While creating the conda environment, once the environment has been created, if you choose option C (continue), then the build_settings.yaml file will include the proper machine name and will be moved automatically to the exec directory.
  2. Alternatively, you can add -DBUILD_MACHINE=<machine> (replacing <machine> with the machine that you are working on) to the default CMake approach given in the documentation. You will need to manually move the build_settings.yaml file before you can run an experiment, but the file should have the required entries to allow your experiment to run.
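
For the second method, the configure command from step 8 of the reproduction steps would simply gain one extra flag. The sketch below assembles that command as an argument list; the function name `cmake_configure_args` is hypothetical, and `noaacloud` is only an example machine name taken from this issue.

```python
import shlex


def cmake_configure_args(machine):
    """Build the CMake configure command with -DBUILD_MACHINE added.

    Hypothetical helper for illustration; the flags are the ones quoted in this issue.
    """
    if not machine:
        raise ValueError("a machine name is required")
    return [
        "cmake", "..",
        "-DCMAKE_INSTALL_PREFIX=..",
        "-DCMAKE_INSTALL_BINDIR=exec",
        f"-DBUILD_MACHINE={machine}",
    ]


# Print the command for review; it is meant to be run from the build/ directory.
print(shlex.join(cmake_configure_args("noaacloud")))
```

As noted in the second method, build_settings.yaml would still need to be moved into the exec directory by hand afterwards.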

@natalie-perlin
Collaborator Author

natalie-perlin commented Dec 20, 2024

@MichaelLueken -
thank you for your comments.

Yes, adding -DBUILD_MACHINE=<machine> to the CMake method indeed resolves the issue. The documentation could be updated to mention this option. After building with CMake and then moving build_settings.yaml to the ./exec directory, a workflow was successfully generated using the usual steps.

There seem to be a few changes to the software stack and conda configurations that affect the behavior of devbuild.sh on MacOS (and possibly on other generic machines, though this has not been tested yet), so the script exits with an error even when run for the conda_only target.
These changes include:

  1. The MPI_ROOT variable is not defined if MPI (openmpi) is built by the spack-stack.
  2. The HISTFILE variable needs to be defined in the bash shell environment, as conda requires it, but this is not always the case.

After fixing these two variables in the devbuild.sh script, it could be used to build the SRW for the MacOS platform.
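
To make the idea concrete, here is a hedged sketch of supplying fallback values for the two variables before invoking devbuild.sh. It is not the change planned for the script itself, which is a bash script. The function name `env_with_fallbacks` is hypothetical, and both fallbacks are assumptions: `~/.bash_history` is bash's usual default history file, and MPI_ROOT is guessed as the directory two levels above an `mpicc` wrapper found on PATH.

```python
import os
import shutil
from pathlib import Path


def env_with_fallbacks(env):
    """Return a copy of env with HISTFILE and MPI_ROOT filled in when unset.

    Hypothetical helper for illustration; the fallback values are assumptions.
    """
    new_env = dict(env)
    if not new_env.get("HISTFILE"):
        # Assumption: use bash's usual default history file.
        home = new_env.get("HOME", str(Path.home()))
        new_env["HISTFILE"] = os.path.join(home, ".bash_history")
    if not new_env.get("MPI_ROOT"):
        # Assumption: the MPI install root is <root> in <root>/bin/mpicc.
        mpicc = shutil.which("mpicc", path=new_env.get("PATH"))
        if mpicc:
            new_env["MPI_ROOT"] = str(Path(mpicc).resolve().parent.parent)
    return new_env
```

The resulting mapping could then be passed as the `env` argument of `subprocess.run` when calling devbuild.sh with the conda_only target. Variables that are already set are left untouched.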

I'm doing a bit more testing to ensure only minimal changes would be introduced to make things work, and will make a PR into the SRW repository with the suggested changes.

2 participants