Updates to the authoritative develop branch since the SRW v2.2 release (10/31/2023) #981

Open
MichaelLueken opened this issue Nov 30, 2023 · 69 comments

Comments

@MichaelLueken
Collaborator

PR #938 - Build conda and environments in SRW. Merged on 11/29/2023:
Modifies devbuild.sh to add the option to install miniforge (a version of miniconda that manages channels more strictly) in a user-specified location, defaulting to inside the user's clone. It also installs two environments needed for SRW: srw_app, which is similar to the old workflow_tools environment, and srw_graphics, which is sufficient to support the plotting scripts in SRW. If the SRW builds the AQM, then a third environment, srw_aqm, is also installed to support AQM.

@MichaelLueken
Collaborator Author

PR #924 - Add job cards for wrappers for individual machines. Merged on 12/05/2023:
Added job cards for individual tasks, similar to wrapper scripts, but tailored for each job scheduler used on NOAA HPCs (PBS Pro and Slurm).

@MichaelLueken
Collaborator Author

PR #977 - Fixing bug: moved placing fix_lam tests' directories from common place (ufs-srweather-app) to each tests' run directory. Merged on 12/14/2023:

  • Fixing bug that placed fix_lam tests' directories in a common place, causing content to be overwritten randomly.
  • Switch custom grid WE2E tests for coverage.hera.gnu.com and coverage.hera.intel.nco to allow the latter to run successfully.

@MichaelLueken
Collaborator Author

PR #994 - Integrate UW CLI tool for templater and remove external dependency. Merged on 01/11/2024:
The workflow-tools package was initially integrated with SRW as an external repository under ush/python_utils. Since then, we have packaged the code as a conda package and it is now installed automatically on most platforms (WCOSS excluded, but with workarounds in place).

The prior integration is removed in this update, while leaning on the UW command line tools available from the conda package. For now, this involves calling the command line tools in a subprocess from Python code. The UW team have an API under development that will replace this in the near future, so this will not likely be the final result for the Python-based scripts you see here.
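
The subprocess pattern described above can be sketched as follows. This is a minimal illustration rather than the code from the PR: `run_cli` is a hypothetical helper, and the Python interpreter stands in for the UW command line tool so that the example stays self-contained.

```python
import subprocess
import sys


def run_cli(command):
    """Run a command line tool and return its standard output.

    `run_cli` is a hypothetical helper for illustration only. With
    check=True, a non-zero exit status raises CalledProcessError.
    """
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout


# Stand-in command (assumption): the current Python interpreter plays the
# role of the external tool so that the sketch runs anywhere.
output = run_cli([sys.executable, "-c", "print('rendered')"])
```

Passing the command as a list avoids shell quoting issues, and `check=True` makes a failed tool invocation surface as a Python exception instead of passing silently.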

@MichaelLueken
Collaborator Author

PR #997 - Fixing several issues, including 966 (bash octal issue); add new winter weather verification test with staged data. Merged on 01/11/2024:
New test

  • The new test MET_ensemble_verification_winter_wx is added. This test will exercise a number of yet-untested capabilities in the workflow, including a 10-member ensemble, snowfall verification with staged data (so can be run on all platforms, not just Jet and Hera), and several SPP settings.
  • As part of this new test, snowfall observations will now be staged on all tier-1 platforms, as well as netCDF GFS data and other observation types, all for the date 2022020300

Resolved issues

  • Incorrect octal notation causing ensemble vx to fail (#966), resolved: in several locations, ENSMEM_INDX is explicitly converted to a base-10 integer to avoid problems with bash interpreting numbers with a leading zero as octal.
  • Should "EXPT_SUBDIR" be a mandatory variable? (#978), resolved: per discussion in a recent SRW code management meeting, EXPT_SUBDIR now has the default value "experiment" to avoid unnecessary complications and work for users. Additionally, the default behavior when an experiment directory already exists is changed to "quit" rather than "delete"
  • Issue mentioned in this discussion; the setting fhzero=6 is removed from the weather model namelist for CCPP suite FV3_GFS_v17_p8, which allows precipitation and other accumulations to be made every hour rather than 6 hours (SRW output is always hourly, so this makes sense). Also, update diag_table.FV3_GFS_v17_p8 so that all output files will be hourly
  • Per discussion in "[release/public-v2.2.0] Fix crontab bug for Cheyenne and Derecho, update PR template for new platforms" (#939), remove an unnecessary special case in get_crontab_contents.py for Derecho
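
The octal problem from #966 can be illustrated with a short Python analogue. This is a sketch under stated assumptions: `parse_like_bash` is a hypothetical, simplified imitation of how bash arithmetic reads a leading zero (it ignores hexadecimal and other prefixes), and `parse_base10` mirrors the explicit base-10 conversion. In bash itself such a conversion is commonly written with the `10#` prefix inside arithmetic expansion; whether the PR uses exactly that form is not stated here.

```python
def parse_like_bash(text):
    """Mimic bash arithmetic: a leading zero marks an octal literal.

    Hypothetical simplification for illustration only.
    """
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)  # "008" is rejected because 8 is not an octal digit
    return int(text, 10)


def parse_base10(text):
    """Force base-10 interpretation, so leading zeros are harmless."""
    return int(text, 10)
```

With a zero-padded member index such as "008", `parse_like_bash` raises a ValueError while `parse_base10` returns 8. An index such as "010" is worse in a way, because the octal reading silently yields 8 instead of 10.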

Other fixes

  • Some old files for unsupported/removed CCPP suites are removed
  • Add some missing task dependencies for retrieving verification obs

General improvements

  • Many improvements to verification obs-pulling task
    • NDAS observations are now retrieved for forecast hour zero, and a better obs file is retrieved for major obs times (00z, 06z, 12z, 18z) per EMC guidance
    • Better in-line comments/documentation
    • Standardize order and messaging for file-on-disk checks across all observation types
  • Added explanatory comments for reflectivity field in diag_table files
  • Update diag_table.FV3_GFS_v17_p8 so that all output files will be hourly
  • Simplify task dependencies that rely on staged verification observations; these "get_obs" tasks should always be run (they check that the data exists before trying to retrieve it), so no need to make the dependency conditional
  • Add a check in monitor_jobs.py to ensure the yaml file does not contain duplicate experiment directories
  • Make sure the key in the experiment dictionary used by is unique by appending the current date/time to the exptdir name; additionally, set this key as the WORKFLOW_ID variable (so that it could be used in the workflow if necessary).
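
The last two items in the list above (the duplicate-directory check and the unique experiment key) can be sketched as follows. The function names and the timestamp format are assumptions made for illustration, not the code in monitor_jobs.py.

```python
from collections import Counter
from datetime import datetime


def find_duplicate_dirs(expt_dirs):
    """Return experiment directories that appear more than once."""
    counts = Counter(expt_dirs)
    return sorted(d for d, n in counts.items() if n > 1)


def make_workflow_id(exptdir_name, now=None):
    """Append a date/time stamp to the experiment directory name.

    The "%Y%m%d%H%M%S" format is an assumption for this sketch.
    """
    now = now or datetime.now()
    return f"{exptdir_name}_{now.strftime('%Y%m%d%H%M%S')}"
```

A caller could refuse to proceed when `find_duplicate_dirs` returns a non-empty list, and use the value from `make_workflow_id` as the dictionary key.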

@MichaelLueken
Collaborator Author

PR #963 - Merge relevant release documentation updates into develop. Merged on 01/12/2024:
A variety of updates to the release v2.2.0 documentation are relevant to the develop branch and are being incorporated via this PR.

@MichaelLueken
Collaborator Author

PR #973 - Verification upgrades and bug fixes. Merged on 01/16/2024:
This update cleans up and simplifies the verification tasks in the SRW App. Main changes:

  • For each METplus tool that is run on APCP, combine APCP01h and APCPgt01h METplus configuration (conf) files into one so that different behavior/code/scripts are not required for 01h vs. >01h.
  • For APCP01h verification, use the NetCDF files generated by the PcpCombine_[fcst|obs] tasks instead of the original GRIB2 files. Note that for 01h accumulation, all that the PcpCombine_[fcst|obs] tasks do is convert from GRIB2 to NetCDF format (unlike for >01h, for which the 01h accumulations must be summed to obtain 03h, 06h, etc. accumulations). This change is needed because currently, even though the NetCDF files for APCP01h are created by the PcpCombine_[fcst|obs] tasks, they are not actually used by downstream tasks such as GridStat.
  • Change thresholds in METplus conf files to use "ge", "gt", etc as opposed to "<=", "<", etc. This is so that thresholds can also easily be used in file and variable names.
  • For clarity, change SFC and UPA verification field group names into ADPSFC and ADPUPA, respectively.
  • Change behavior of ASNOW verification to be more similar to that of APCP.
  • Bug fix in GridStat_ensprob_ASNOW.conf. There is an inadvertent shift in the threshold values used in the forecast field array names with respect to the threshold values specified for the observations. Fix to make thresholds for forecast and obs match.
  • Add METplus logging level control in the main SRW App config file.
  • For clarity, rename some verification variables as needed (in the main SRW App config file, in the rocoto workflow xml, in ex-scripts, etc).
  • Clean up comments in METplus config files and make these files more similar to each other where possible.
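
The threshold change in the list above (symbolic comparison operators replaced by names such as "ge" and "gt") can be sketched as a small conversion function. The function and table names are hypothetical and are not taken from the METplus conf files; the sketch only shows why the named form is convenient in file and variable names.

```python
import re

# Two-character operators are listed first so that ">=" is not read as ">".
_OPERATOR_NAMES = {">=": "ge", "<=": "le", "==": "eq", "!=": "ne", ">": "gt", "<": "lt"}
_PATTERN = re.compile(r"^(>=|<=|==|!=|>|<)\s*(.+)$")


def to_named_threshold(threshold):
    """Convert a threshold such as ">=0.254" to the named form "ge0.254"."""
    text = threshold.strip()
    match = _PATTERN.match(text)
    if match is None:
        return text  # already in named form, or not recognized
    operator, value = match.groups()
    return _OPERATOR_NAMES[operator] + value
```

A string such as "ge0.254" contains no characters like ">" or "<" that the shell would treat as redirection, which is what makes it easier to embed in names.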

@MichaelLueken
Collaborator Author

PR #1012 - Add -n 1 to allow the use of the service partition. Merged on 02/09/2024:
Following the Slurm update on Hera and Jet, the service partition is no longer usable within the SRW App. The necessary changes to allow the service partition to once again function properly have been made, by adding -n 1 to the SCHED_NATIVE_CMD_HPSS variable in the Hera and Jet machine yaml files, and updating the native entry in the parm/wflow/verify_pre.yaml and parm/wflow/aqm_prep.yaml files.

@MichaelLueken
Collaborator Author

PR #1014 - Quarterly Documentation Update (PI11). Merged on 02/15/2024:
Updates include:

  • Edits to ConfigWorkflow.rst to align it with the current config_defaults.yaml
  • Addition of new FAQs
  • Improve flow of the container chapter by moving extra info to an appendix rather than including it in the main container instructions.
  • Add a make linkcheck function to documentation Makefile to check for any problems with links and fix issues w/links

@MichaelLueken
Collaborator Author

PR #969 - Update SRW with spack-stack version 1.5.0 (from 1.4.1). Merged on 02/15/2024:

  • Update SRW with spack-stack 1.5.0
  • Remove Gaea-C4 (gaea) from SRW App
  • Update run_vx.local.lua on Orion to allow verifications tests to run successfully
  • Temporarily comment out Gaea-C5 (gaeac5) from SRW App

@MichaelLueken
Collaborator Author

PR #917 - Enable UPP 2d decomposition. Merged on 02/21/2024:
Changes to enable 2d decomposition include:

  • parm/model_configure - Added itasks to the model_configure file (values greater than 1 enable 2d decomposition in inline post).
  • scripts/exregional_run_post.sh - Added numx to the end of the &NAMPGB namelist options (values of numx greater than 1 enable 2d decomposition in offline post).
  • ush/create_model_configure_file.py - Added itasks to the list of variables to be added to the model_configure file.
  • tests/WE2E/test_configs/grids_extrn_mdls_suites_community/config.grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2.yaml - Added ITASKS: 2 to enable inline post 2d decomposition.
  • tests/WE2E/test_configs/grids_extrn_mdls_suites_community/config.grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta.yaml - Added NUMX: 2 to enable offline post 2d decomposition.

The ufs-weather-model (020e783), UPP (fae617b), and UFS_UTILS (dc0e4a6) hashes have been updated in this work.

@MichaelLueken
Collaborator Author

PR #1018 - Update doc requirements and add logo. Merged on 02/21/2024:

  • Specify a minimum version of Sphinx in requirements.in
  • Update requirements.txt based on requirements.in for a full list of dependencies. This will ensure uniform documentation builds across platforms.
  • Add UFS logo to documentation.
  • Rename docs directory to doc for NCO compliance.
  • Slightly darken blue sidebar background to meet WCAG AA text/background contrast standards for large text.
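
For reference, the WCAG AA check mentioned in the last item requires a contrast ratio of at least 3:1 for large text. The sketch below computes that ratio from two sRGB colors using the WCAG 2.x relative luminance formula. The function names are hypothetical, and no actual sidebar color from the documentation theme is assumed.

```python
def _linearize(channel):
    """Convert an 8-bit sRGB channel to its linear-light value (WCAG 2.x)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b):
    """WCAG contrast ratio between two colors, from 1.0 up to 21.0."""
    la, lb = relative_luminance(rgb_a), relative_luminance(rgb_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa_large_text(rgb_a, rgb_b):
    """True when the pair satisfies the 3:1 AA threshold for large text."""
    return contrast_ratio(rgb_a, rgb_b) >= 3.0
```

Darkening a background behind light text lowers its luminance and therefore raises the ratio, which is the effect the change relies on.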

@MichaelLueken
Collaborator Author

PR #1041 - Changes for Rocky8 on Hera. Merged on 02/26/2024:
Hera is switching from CentOS to Rocky OS.

  • One of the changes needed for SRW to run on both Rocky and CentOS is to update the cmake version from 3.20.1 to 3.23.1
  • Some HPSS batch jobs need more than 2GB
  • Fix in the bash script scripts/exregional_make_lbcs.sh

@MichaelLueken
Collaborator Author

PR #1043 - Add three UFS Case Studies to WE2E testing process. Merged on 02/28/2024:
Adding three additional UFS Case Studies (2019 Hurricane Barry, 2019 Halloween Storm, and 2020 July CAPE) to the workflow end-to-end testing process.

  • The UFS Case studies test yamls were created and added to the gaea-c5, derecho, and hera.gnu.com coverage suite files as well as the comprehensive suite files.
  • Removed Cheyenne related logic from current WE2E UFS Case Study (2020_CAD) and extended fcst wall time for it.

@MichaelLueken
Collaborator Author

PR #1047 - Update for Gaea-c5. Merged on 02/29/2024:

  • Enable SRW to run on Gaea-c5, use spack-stack v1.5.0, and SRW-built conda environment
  • Update code to rename "gaea-c5" platform to "gaea". The name for Jenkins still needs to remain "gaeac5" at the moment.

A solution to the library conflict for libstdc++.so.6 was to preload a specific library at runtime, as specified in ./modulefiles/wflow_gaea.lua and ./modulefiles/tasks/gaea/python_srw.lua:

setenv("LD_PRELOAD", "/opt/cray/pe/gcc/12.2.0/snos/lib64/libstdc++.so.6")

@MichaelLueken
Collaborator Author

PR #1046 - Add Contributor's Guide to documentation. Merged on 03/01/2024:
This PR adds a Contributor's Guide to the docs alongside the User's Guide.

  • The Contributor's Guide includes general information on use of Git submodules in the UFS. This information can be adapted in a future PR to be more SRW-specific based on user needs/requests and any training we provide.
  • This PR also configures the docs so that Technical Documentation can be easily added at a later date.

@MichaelLueken
Collaborator Author

PR #1040 - Fix sample script and WE2E test for AQM. Merged on 03/05/2024:

  • Fixes the failure on the sample script ush/config.aqm.community.yaml.
  • Fixes the failure on the WE2E test tests/WE2E/test_configs/aqm/config.aqm_grid_AQM_NA13km_suite_GFS_v16.yaml for AQM.

@MichaelLueken
Collaborator Author

PR #1042 - Add integration test job. Merged on 03/08/2024:
This update adds a test job to the workflow. It was originally written with pytest but because of some file naming issues, the python package unittest was used instead. The test checks for the existence of netcdf files from the weather model.

The necessary scripts were added or modified to incorporate the integration job into the workflow. A wrapper script was also added.
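
A file-existence check of the kind described above can be sketched with unittest as follows. This is not the test added in the PR: the helper `missing_files`, the class name, and the file names are hypothetical, and a temporary directory stands in for a real run directory.

```python
import tempfile
import unittest
from pathlib import Path


def missing_files(run_dir, expected_names):
    """Return the expected file names that are absent from run_dir."""
    return [name for name in expected_names if not (Path(run_dir) / name).is_file()]


class TestModelOutputExists(unittest.TestCase):
    # Hypothetical output file names, for illustration only.
    expected = ["dynf000.nc", "phyf000.nc"]

    def test_all_files_present(self):
        with tempfile.TemporaryDirectory() as run_dir:
            for name in self.expected:
                (Path(run_dir) / name).touch()
            self.assertEqual(missing_files(run_dir, self.expected), [])

    def test_missing_file_is_reported(self):
        with tempfile.TemporaryDirectory() as run_dir:
            (Path(run_dir) / self.expected[0]).touch()
            self.assertEqual(missing_files(run_dir, self.expected), [self.expected[1]])
```

Saved to a file, the tests can be run with `python -m unittest <module name>`.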

@MichaelLueken
Collaborator Author

PR #1045 - Jet switch from CentOS to Rocky. Merged on 03/13/2024:
Jet has migrated from CentOS to Rocky8 following the system maintenance on 03/12/2024.

This work sets the updated Rocky8 spack-stack as default in the build_jet_intel.lua modulefile and modifies the Jet machine file to use PARTITION_FCST: xjet.

@MichaelLueken
Collaborator Author

PR #1048 - Expand forecast fields for metric test. Merged on 03/14/2024:
This PR expands the number of forecast fields for the Skill Score metric test. The forecast length in the metric WE2E test was extended to 12 hours so that the RMSE metric can be calculated for these additional forecast fields:

  • Specific humidity for the full column
  • Temperature for the full column
  • Wind for the full column
  • Dew point, pressure, temperature, and wind at the surface level for forecast hour 12.

Adding these forecast fields makes the skill score metric test more thorough and therefore a more inclusive test to compare against.
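
For readers unfamiliar with the metric, RMSE is the square root of the mean squared difference between forecast and observed values. The sketch below shows that arithmetic only; in the SRW App the statistic comes from the verification tools, and the function here is a hypothetical stand-in.

```python
import math


def rmse(forecast, observed):
    """Root-mean-square error between paired forecast and observed values."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A perfect forecast gives an RMSE of zero, and larger errors are penalized more heavily than small ones because the differences are squared before averaging.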

Also, a change was made to the .cicd/scripts/srw_metric_example.sh script to reflect the new conda environment.

@MichaelLueken
Collaborator Author

PR #1055 - Update GFS v17 p8 suite to address cold bias. Merged on 03/15/2024:
An SRW App user reported (#1004) that, with the FV3_GFS_v17_p8 physics suite, surface temperatures were dropping unrealistically throughout the forecast. This PR addresses that issue by updating the FV3_GFS_v17_p8 physics suite in the parm/FV3.input.yml file.

This issue was discovered in the SRW App v2.2.0, but since the FV3_GFS_v17_p8 physics suite is not officially supported for the release, the change will only go into the develop branch.

@MichaelLueken
Collaborator Author

PR #1054 - Use uwtools instead of set_namelist. Merged on 03/20/2024:
Continues the integration of the uwtools package. In this PR, I've done the following:

  • Call the UW config tool instead of set_namelist using the uwtools CLI in bash scripts and API in Python scripts
  • Lint the ush/set_fv3nml*.py files
  • Update uwtools to the latest release version

@MichaelLueken
Collaborator Author

PR #1060 - Update AQM task scripts with those of production/aqm_dev branch. Merged on 03/27/2024:

  • Update the AQM task scripts with those of the production or aqm_dev branch.
  • Set the nco environment variable in the J-job scripts directly (AQM tasks only).
  • Change the vertical directory structure of the AQM task scripts to meet the NCO implementation standards.
  • Remove the nco tests from the we2e tests and nco sample scripts.
  • Change the file names of J-job and ex-scripts (AQM tasks only).

@MichaelLueken
Collaborator Author

PR #1050 - Update weather model, UPP, and UFS_UTILS hashes. Merged on 03/27/2024:
Updating the ufs-weather-model hash to 8518c2c (March 1), the UPP hash to 945cb2c (January 23), and the UFS_UTILS hash to 57bd832 (February 6).

This work also required several modifications to allow the updated weather model and UFS_UTILS hashes to work in the SRW:

  • Update spack-stack to v1.5.1
  • Rename NEMS/nems to UFS/ufs
  • Remove ush/set_ozone_param.py (ozphys scheme in SDFs were removed in the weather model)
  • Update path to noahmptable.tbl
  • Add two new fields to INPS (MASK_ONLY and MERGE_FILE) for make_orog task
  • Make changes to allow for the updated method of finding CRES in chgres_cube

@MichaelLueken
Collaborator Author

PR #1065 - Fix failure on warm start option of SRW-AQM. Merged on 04/04/2024:

  • Fix failure on the warm start option of SRW-AQM.
  • Change the sample script config.aqm.yaml for running a warm start.
  • Change cpreq to cp because cpreq does not work correctly on machines other than WCOSS2.
  • Add missing exclusion to .gitignore.

@MichaelLueken
Collaborator Author

PR #1067 - Port SRW-AQM to Orion and Hercules. Merged on 04/08/2024:

  • Port SRW-AQM to Orion and Hercules

@MichaelLueken
Collaborator Author

PR #1058 - Feature/cicd metrics adds methods to collect resource usage data from major stages of the SRW pipeline build job. Merged on 04/15/2024:

Updates the SRW Jenkinsfile with some run-time stats collection and adds a final stage that triggers the ufs-srw-metrics stats collection job for reporting metrics.

The SRW pipeline job that uses this Jenkinsfile will now use the 'time' command when executing major stages: init, build, test. This will collect CPU, Memory, and DiskUsage measurements that can be later used in trend plots on a metrics dashboard.

Additionally, it adds options to the pipeline job that allow the operator to select just a single test, or no test suite (default is still 'coverage' suite), and allows an option to select the depth of wrapper script tasks to execute during functional testing (default is still all 9 scripts).
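
The idea of timing each major stage can be sketched in Python as shown below. This is a simplified analogue, not the Jenkinsfile logic: the pipeline uses the 'time' command, whereas this sketch records wall-clock time only (no memory or disk usage), and `timed_stage` and the stand-in command are hypothetical.

```python
import subprocess
import sys
import time


def timed_stage(name, command):
    """Run one stage as a subprocess and record its wall-clock duration."""
    start = time.perf_counter()
    completed = subprocess.run(command)
    elapsed = time.perf_counter() - start
    return {"stage": name, "seconds": elapsed, "returncode": completed.returncode}


# Stand-in command (assumption): a no-op Python process plays the "init" stage.
stats = timed_stage("init", [sys.executable, "-c", "pass"])
```

Records of this shape, one per stage, are the kind of data that can later be plotted as trends on a dashboard.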

@MichaelLueken
Collaborator Author

PR #1068 - Update weather model hash and correct behavior in Functional WorkflowTaskTests Jenkins stage. Merged on 04/15/2024:

  • The ufs-weather-model hash has been updated to 1411b90 (April 1, 2024).
  • Updated build_hera_gnu.lua file to allow it to work with updates to the ufs-weather-model.
  • Updated behavior of the Functional WorkflowTaskTests Jenkins stage to allow the test to properly finish, rather than waiting in queue for all jobs associated with the EPIC role account to finish first (modification to .cicd/scripts/wrapper_srw_ftest.sh).
  • Corrected the hang encountered while running the Functional WorkflowTaskTests stage on Gaea.
  • Applied Mike Kavulich's modification to ush/bash_utils/create_symlink_to_file.sh and converted calls to the create_symlink_to_file function from using named arguments to positional arguments (Issue create_symlink_to_file.sh is unreasonably slow #1066).

@MichaelLueken
Collaborator Author

PR #1077 - Update nco version. Merged on 04/23/2024:

Hera with the Intel compiler was using the system-installed nco library (version 4.9.3). This went unnoticed until system administrators removed read permissions for version 4.9.3 and installed a new version (5.1.6).

Hera Intel will now use the spack-stack-installed nco (version 5.0.6), like all other machines/compilers.

@MichaelLueken
Collaborator Author

PR #1079 - Feature cicd scorecard metrics. Merged on 04/25/2024:

  • Update CI/CD scripts to include skill-score metric output so that follow-on metrics collection can display it on metrics Dashboard.
  • Update Jenkinsfile to fix the post() section that calls the follow-on metrics collection job so that it is only called once at the end, regardless of whether any platform's builds or tests fail independently.
  • Update the Jenkinsfile to skip platform Nodes that appear to be offline, rather than put them in the launch queue. This also means we can re-add the NOAAcloud platforms to the list of possible Nodes to attempt. They will be skipped if they are not online.
  • Update Jenkinsfile to include timeout limits on Build stage and Test stage, so they don't run forever.
  • Update Jenkinsfile to allow seeing timestamps in the Jenkins console log.

@MichaelLueken
Collaborator Author

PR #1103 - Update requests and certifi in requirements.txt. Merged on 07/15/2024:

  • The Dependabot PR Bump certifi from 2024.2.2 to 2024.7.4 in /doc #1101 identified the need to update the certifi version, but requests should also be updated from the current (yanked) version in the requirements file.
  • The README.md and doc/README files have also been updated.

@MichaelLueken
Collaborator Author

PR #1098 - Transition the var_defns bash file to YAML. Merged on 07/26/2024:

Use YAML for the configuration language at run time.

@MichaelLueken
Collaborator Author

PR #1091 - Fixes for PW Jenkins Nightly Builds. Merged on 07/30/2024:

  • Adds logic to handle GCP's default conda env, which conflicts with the SRW App's conda env. Fixes a Parallel Works naming convention bug in the script.
  • It also addresses a known issue with a Ruby warning on PW instances that prevents the run_WE2E_tests.py from exiting gracefully. The solution we use in our bootstrap for /contrib doesn't seem to work for the /lustre directory, which is why the warning is hardcoded into the monitor_jobs.py script.
  • The new spack-stack build on Azure is missing a gnu library, so added the path to this missing library to the proper run scripts and cleaned up the wflow noaacloud lua file.
  • Removed log and error files from the qsub wrapper script so that qsub can generate these files with the job id in the file names. Also fixed a typo in the wrapper script.

@MichaelLueken
Collaborator Author

PR #1104 - S3 doc updates. Merged on 08/01/2024:

As part of the data governance initiative, all S3 buckets will need some form of version control. To meet this need, the AWS S3 bucket was reorganized, with the develop data stored under a 'develop-date' folder and the verification sample case and the document case (current_release_data) moved under a new folder called 'experiment-user-cases'.

@MichaelLueken
Collaborator Author

PR #1096 - Update ufs-weather-model hash and further clean the machines tested in PULL_REQUEST_TEMPLATE. Merged on 08/12/2024:

  • Update ufs-weather-model hash to b5a1976 (July 30)
  • Add hera.gnu, remove cheyenne.intel, cheyenne.gnu, and gaeac5.intel, and alphabetize the machines in the TESTS CONDUCTED section of the PULL_REQUEST_TEMPLATE
  • Correct behavior of Jenkins Functional WorkflowTaskTests. Currently, TASK_DEPTH is set to null, resulting in no tests being run during the Functional WorkflowTaskTests stage. Replaced env with params in Jenkinsfile for setting TASK_DEPTH. Testing shows that this will correctly set TASK_DEPTH to the default value of 9 and allow the tests to run
  • Removed extraneous entries from the verification scripts to remove KeyError messages in the associated verification log files
  • Reapplied necessary modification to modulefiles/tasks/noaacloud/plot_allvars.local.lua to allow plotting tasks to run on NOAA cloud platforms

@MichaelLueken
Collaborator Author

PR #1100 - Updates to devclean.sh script and plotting scripts and tasks. Merged on 08/23/2024:

  • The ./devclean.sh script that cleans SRW builds is updated; all the cleaning tasks are done for the directories under the main SRW tree
  • Documentation updated for the devclean.sh script changes
  • Plotting scripts updated to have geographical data visible over the colored fields
  • Plotting task updated to allow graphics output for individual ensemble members
  • Use python3 to check out external submodules in a checkout_externals script; python3 is the default for other scripts, and some systems such as macOS no longer come with python2

@MichaelLueken
Collaborator Author

PR #1115 - Fix for SonarQube forked repo renaming failure. Merged on 09/04/2024:

The SonarQube job fails to find a user's repository if they rename it when creating a fork. This change to the Jenkinsfile passes the user's URL to the SonarQube job so that it doesn't have to form the URL itself. It also passes the change ID (PR number) so that information on the SonarQube job can be archived to S3 and properly aligned with the corresponding PR.

@MichaelLueken
Collaborator Author

PR #1089 - Added an option for RRFS external model files used as ICs and LBCs. Merged on 09/12/2024:

  • Added an option to use RRFS model output (control) files as initial and lateral boundary conditions (ICs and LBCs).
    RRFS_a data for the test were retrieved from the NODD website (https://registry.opendata.aws/noaa-rrfs/): pressure-level grib2 files from the control directory, with RRFS forecasts interpolated onto a 3-km regular grid.
  • A new test, grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta, has been added with RRFS input files for the 06/05/2024 event, when tornadoes were reported in Maryland.

@MichaelLueken
Collaborator Author

PR #1133 - Updated ConfigWorkflow.rst to reflect changes to config_defaults.yaml (PI13). Merged on 09/16/2024:

Updated ConfigWorkflow.rst to reflect recent changes to config_defaults.yaml in order to keep documentation up to date.

@MichaelLueken
Collaborator Author

PR #1117 - Update ufs-weather-model hash and remove machine/modulefiles/WE2E test suites for decommissioned machines. Merged on 09/20/2024:

  • Update ufs-weather-model hash to a1143cc (August 12)
  • Remove machine files, modulefiles, and WE2E test suites for decommissioned machine, Cheyenne
  • Update build_jet_intel.lua modulefile to point to the /contrib spack-stack location, rather than /lfs4. At the end of September, /lfs4 is scheduled to be unmounted
  • Update ush/machine/jet.yaml machine file to point to /lfs5 instead of /lfs4 for staged data on Jet. At the end of September, /lfs4 is scheduled to be unmounted
  • Updated ContainerQuickstart.rst and RunSRW.rst to note the new container location on Jet (/lfs5 instead of /lfs4) and the location of the staged data (/lfs5 instead of /lfs4)
  • Uncommented four WE2E tests that had previously failed in make_sfc_climo in the comprehensive WE2E suite for Derecho (comprehensive.derecho)

@MichaelLueken
Collaborator Author

PR #1129 - Modulefile updates for NOAA Cloud Rocky 8 platforms. Merged on 09/25/2024:

Updated modulefiles to use NOAA Cloud Rocky8 platforms

Files changed:

  • modulefiles/build_noaacloud_intel.lua (loading spack-stack procedure change)
  • modulefiles/wflow_noaacloud.lua

@MichaelLueken
Collaborator Author

PR #1131 - Update python docstrings and generate preliminary technical documentation. Merged on 10/02/2024:

  • Initial implementation of SRW App technical documentation. Adds a section called "Technical Documentation" to the SRW App docs.
  • This update also removes outdated/redundant text files in doc (i.e., RUNTIME and INSTALL).

@MichaelLueken
Collaborator Author

PR #1124 - Adding in the tutorial for the Halloween Storm. Merged on 10/10/2024:

This update adds a tutorial to the Tutorials chapter of the SRW App User's Guide. It is based off the Halloween Storm UFS case study.

@MichaelLueken
Collaborator Author

PR #1142 - Fix get_crontab_contents.py. Merged on 10/21/2024:

Fixes bug described in issue #1141: get_crontab_contents.py fails when run as a script

@MichaelLueken
Collaborator Author

PR #1136 - Update ufs-weather-model hash and UPP hash and use upp-addon-env spack-stack environment. Merged on 11/01/2024:

  • Update ufs-weather-model hash to 38a29a6 (September 19)
  • Update UPP hash to 81b38a8 (August 13)
  • All Tier-1 modulefiles/build_*.lua files have been updated to use the upp-addon-env spack-stack environment
  • srw_common.lua was updated to use g2/3.5.1 and g2tmpl/1.13.0 (these are required for UPP)
  • .cicd/JENKINSFILE was updated to replace cheyenne entries with derecho.
  • The doc/tables/Tests.csv table had nco-mode WE2E tests removed
  • The doc/UsersGuide/CustomizingTheWorkflow/ConfigWorkflow.rst documentation was updated to reflect the updated ush/config_defaults.yaml file.
  • The .github/CODEOWNERS file was updated to add Bruce Kropp to the list of reviewers
  • The exregional_plot_allvars.py and exregional_plot_allvars_diff.py scripts were updated to address changes made to the postxconfig-NT-fv3lam.txt file.
  • Updated ush/config_defaults.yaml to update PE_MEMBER01 calculation and documentation for OMP_NUM_THREADS_RUN_FCST to allow for the run_fcst task to properly run on Tier-1 platforms after updates to allow threading to function properly.
  • The ush/machine/*.yaml files were updated to allow for the run_fcst task to properly run on Tier-1 platforms after updates to allow threading to function properly.
  • There are not enough resources on Jet to run the high resolution WE2E tests (136 (ReqNodeNotAvail)). Commented out the tests in the comprehensive.jet test suite and removed one test from the coverage.jet test suite.
  • The ufs-case-studies WE2E tests are currently failing on Derecho. The tests fail in the get_extrn_ics/lbcs tasks, which report that the files aren't present even though the file in question is named correctly and is available. Commented out these tests in comprehensive.derecho and moved the tests out of coverage.derecho. Issue "ufs-case-studies WE2E tests fail on Derecho in get_extrn_ics/lbcs" (#1144) was opened to track this issue on Derecho.

@MichaelLueken
Collaborator Author

PR #1152 - Add GitHub Actions to check that Technical Docs are up-to-date. Merged on 11/12/2024:

  • Adds a GitHub Actions workflow & script to check whether the Technical Documentation is up-to-date
  • Turns on the -W flag so that documentation build warnings register as errors
  • Updates the requirements file so that --remove-old flag can be used to check Tech Docs
  • Updates the Contributor's Guide w/information on Technical Documentation and troubleshooting guidelines
  • Fixes a few broken links.

@MichaelLueken
Collaborator Author

PR #1139 - Add Community Fire Behavior Model. Merged on 11/13/2024:

This PR introduces the Community Fire Behavior Module (ufs-community/ufs-weather-model#2220) to the SRW App.

In addition, there are a number of general improvements to the UFS SRW code and workflow:

  • The addition of the build_settings.yaml file that is placed in the exec directory
  • Improve capability to use a different set of vertical levels
  • Flexible configuration file name
  • Additional options for retrieve_data.py
  • Speedup of symbolic linking
  • Various random improvements

@MichaelLueken
Collaborator Author

PR #1137 - Make get_obs tasks day-dependent in workflow; other improvements and bug fixes. Merged on 11/18/2024:

This PR fixes multiple bugs in the verification (vx) and other parts of the SRW App, the main one being that the get_obs tasks as well as some of the vx pre-processing tasks currently do not work for an experiment with multiple cycles if those cycles overlap in time:

  • Changes related to get_obs tasks
  • Changes related to vx pre-processing tasks (PcpCombine_obs and Pb2nc_obs)
  • Small, self-contained bug fixes and improvements
  • New WE2E tests added
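
To illustrate why cycles that overlap in time call for day-dependent get_obs tasks, here is a minimal sketch. It is not the SRW App's implementation: the function names, the cycle times, and the 36-hour forecast length are hypothetical, and the real workflow may treat day boundaries and accumulated fields differently. The sketch only shows that two overlapping cycles need observations for some of the same calendar days, so retrieving observations once per day avoids duplicated or conflicting work.

```python
from datetime import datetime, timedelta


def obs_days_for_cycle(cycle_start, fcst_len_hrs):
    """Return the set of calendar days touched by one forecast cycle.

    Assumption for this sketch: observations are needed for every
    calendar day between the cycle start and the forecast end, inclusive.
    """
    end = cycle_start + timedelta(hours=fcst_len_hrs)
    days = set()
    day = cycle_start.date()
    while day <= end.date():
        days.add(day)
        day += timedelta(days=1)
    return days


def unique_obs_days(cycle_starts, fcst_len_hrs):
    """Return the sorted union of obs days over all cycles."""
    all_days = set()
    for start in cycle_starts:
        all_days |= obs_days_for_cycle(start, fcst_len_hrs)
    return sorted(all_days)


# Two hypothetical cycles, 12 hours apart, each running a 36-hour forecast.
cycles = [datetime(2024, 11, 1, 0), datetime(2024, 11, 1, 12)]
per_cycle_requests = sum(len(obs_days_for_cycle(c, 36)) for c in cycles)
days = unique_obs_days(cycles, 36)

print("per-cycle requests:", per_cycle_requests)
print("unique obs days:", [d.isoformat() for d in days])
```

In this example the two cycles share November 1 and November 2, so organizing retrieval by day means each of those days is handled once rather than once per cycle.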

@MichaelLueken
Collaborator Author

PR #1158 - Update UFS-WM to 11/14 version and UPP hash to 09/30 version. Merged on 11/25/2024:

  • Update ufs-weather-model hash to 6b0f516 (November 14, 2024)
  • Update UPP hash to 6f5dd62 (September 30, 2024)
  • Use the fms-2024.01 spack stack environment (replacing upp-addon-env) for Tier-1 platform modulefiles
  • Replace fms/2023.04 with fms/2024.01 in modulefiles/srw_common.lua
  • Removed PRMSL from parm/metplus/STATAnalysisConfig_skill_score in order to calculate the skill-score (PRMSL was replaced with MSLET in the postxconfig-NT-fv3lam.txt file)
  • The grid_RRFS_AK_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot WE2E test required more than 6 hours to complete the run_fcst task when using GNU executables. Running with a single thread (OMP_NUM_THREADS_RUN_FCST: 1) allows the test to run with both GNU and Intel compilers without issue.

@MichaelLueken
Collaborator Author

PR #1128 - Add SRW-SD chapter to SRW App docs. Merged on 11/25/2024:

Adds documentation for the SRW-SD features.

@MichaelLueken
Collaborator Author

PR #1161 - Fix Jenkins Nightly Build. Merged on 12/02/2024:

The Jenkins nightly builds have been inconsistent or not working at all on the Parallel Works (PW) platforms. Some of the issues are related to the instance's infrastructure on Azure; others stem from a conda conflict between the host machine and the conda built by the SRW App (originally seen on GCP, now observed on all PW platforms). This PR resolves the conda conflict by deactivating the host conda before activating the srw_app environment on all PW platforms. The solution for Azure requires configuration changes, which were made on the backend.

@MichaelLueken
Collaborator Author

PR #1162 - Update weather model hash to November 21 and UPP hash to November 8. Merged on 12/09/2024:

  • Update UFS-WM hash to 33b3c18 (November 21)
  • Update UPP hash to ce5f3b1 (November 8)
  • Add fix/upp/nam_micro_lookup.dat file
  • Add fix/upp/nam_micro_lookup.dat to scripts/exregional_run_fcst.sh and scripts/exregional_run_post.sh
  • Add Ben Koziol to the .github/CODEOWNERS file

@MichaelLueken
Collaborator Author

PR #1157 - Updating ConfigWorkflow.rst file to reflect changes to Config defaults. Merged on 12/10/2024:

Updated ConfigWorkflow.rst to reflect recent changes to config_defaults.yaml in order to keep documentation up to date.

@MichaelLueken
Collaborator Author

PR #1167 - Adding missing Intel variable for PW Azure. Merged on 12/11/2024:

The SRW App still fails on the PW Azure instance. It appears that the compute node needs to be in the same zone as the controller node. To achieve this, the compute node instance type needs to change, which is failing because of a missing Intel variable. This update adds this missing Intel variable when running on Azure.

@MichaelLueken
Collaborator Author

PR #1169 - Correct documentation issue and close out two issues. Merged on 01/07/2025:

The documentation has been failing for the last week due to the https://www.ncep.noaa.gov/ website being unresponsive. This site URL has been replaced with https://www.weather.gov/ncep/.

The following two issues have also been addressed in this PR:

@MichaelLueken
Collaborator Author

PR #1163 - Port SRW to Gaea C6. Merged on 01/13/2025:

Enable SRW for new machine Gaea-C6.
UFS_SRW_data copied to /gpfs/f6/bil-fire8/world-shared/UFS_SRW_data location.

@MichaelLueken
Collaborator Author

PR #1178 - Update ufs-weather-model to 5324d64. Merged on 01/22/2025:

@MichaelLueken
Collaborator Author

PR #1182 - Enable two-way UFS_FIRE coupling. Merged on 01/24/2025:

Update UFS_FIRE to allow two-way (atm --> fire + fire --> atm) coupling/feedback. Previously, the FIRE_ATM_FEEDBACK setting was always 0; setting it to a value greater than 0 now enables two-way feedback between the fire and the atmosphere.

Added documentation for this feature and updated one of the WE2E tests to use this setting (renamed UFS_FIRE_multifire_one-way-coupled --> UFS_FIRE_multifire_two-way-coupled).
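
As a rough illustration of the rule described above (0 means one-way coupling, any value greater than 0 enables two-way feedback), here is a minimal sketch. The `coupling_mode` helper is hypothetical and is not part of the SRW App; rejecting negative values is an assumption of this sketch, not documented behavior.

```python
def coupling_mode(fire_atm_feedback: float) -> str:
    """Classify the coupling implied by a FIRE_ATM_FEEDBACK value.

    Hypothetical helper for illustration only:
      0   -> one-way (atm --> fire)
      > 0 -> two-way (atm --> fire + fire --> atm)
    Negative values are rejected here as an assumption of this sketch.
    """
    if fire_atm_feedback < 0:
        raise ValueError("FIRE_ATM_FEEDBACK is assumed to be non-negative")
    return "two-way" if fire_atm_feedback > 0 else "one-way"


print(coupling_mode(0))    # one-way
print(coupling_mode(1.0))  # two-way
```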
