Run Global Workflow
Sections:
- Obtain restricted data access
- Clone and build global-workflow
- Prepare initial conditions
- Run setup scripts to generate experiment
- Configure your run further
- Start the run
- Monitor your rocoto-based run
- View experiment output
- Common errors, known issues, and their solutions
This section will provide details on cloning, building, linking, setting up, and running the global-workflow with the ROCOTO workflow manager.
Quick clone/build/link instructions (more detailed instructions below):
> git clone https://github.com/NOAA-EMC/global-workflow.git
> cd global-workflow/sorc
> sh checkout.sh
> sh build_all.sh
> sh link_fv3gfs.sh emc [dell|cray|hera]
https method:
git clone https://github.com/NOAA-EMC/global-workflow.git
ssh method (using a password protected SSH key):
git clone git@github.com:NOAA-EMC/global-workflow.git
Check what you just cloned (by default you will have only the develop branch):
> cd global-workflow
> git branch
* develop
You now have a cloned copy of the global-workflow git repository. To checkout a branch or tag in your clone:
git checkout BRANCH_NAME
Note: The branch must already exist. If it does not, you need to make a new branch using the “-b” flag:
git checkout -b BRANCH_NAME
The “checkout” command will checkout BRANCH_NAME and switch your clone to that branch. Example:
> git checkout my_branch ← checkout the ‘my_branch’ branch into clone
> git branch
* my_branch ← now your clone is the ‘my_branch’ branch
develop
Once you have cloned the workflow repository it's time to checkout/clone its components. The components will be checked out under the /sorc folder via a script called checkout.sh. Run the script with no arguments:
> cd sorc
> sh checkout.sh
To run with the CCPP physics, provide the "-c" flag with checkout.sh:
> sh checkout.sh -c
To run with the operational GTG UPP and WAFS (only for select users), provide the -o flag with checkout.sh:
> sh checkout.sh -o
Each component cloned via checkout.sh will have a log (checkout-COMPONENT.log). Check the screen output and logs for clone errors.
Under the /sorc folder is a script to build all components called build_all.sh. After running checkout.sh, run this script to build all component codes:
> sh build_all.sh
If you plan to run with the CCPP physics then provide the -c flag to invoke CCPP build options for the FV3 (you must have run checkout.sh -c or recloned your fv3gfs.fd folder with a ufs-weather-model hash that supports the CCPP):
> sh build_all.sh -c
Alternatively, you can rerun build_fv3.sh after changing your fv3gfs.fd folder contents:
> sh build_fv3.sh YES
(where YES tells it to build with CCPP)
A partial build option is also available via two methods:
a) modify fv3gfs_build.cfg config file to disable/enable particular builds and then rerun build_all.sh
b) run individual build scripts also available in /sorc folder for each component or group of codes
At runtime the global-workflow needs all pieces in place within the main superstructure. To establish this a link script is run to create symlinks from the top level folders down to component files checked out in /sorc folders.
After running the checkout and build scripts run the link script:
sh link_fv3gfs.sh $RUN_ENVIR $MACHINE
...where:
- RUN_ENVIR is either "emc" or "nco". The "nco" option is only used by NCO during installation into production. Users should use the "emc" option otherwise.
- MACHINE is the HPC/platform/machine you're on. Options are: dell, cray, hera
Example:
> sh link_fv3gfs.sh emc hera
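The two arguments can be checked before the link script is called. The following is a minimal sketch, not part of global-workflow: the helper name `link_command` is hypothetical, and the allowed values are simply the ones listed above. It only builds the command; it does not run it.

```python
from typing import List

# Allowed values taken from the description above.
RUN_ENVIRS = ("emc", "nco")
MACHINES = ("dell", "cray", "hera")


def link_command(run_envir: str, machine: str) -> List[str]:
    """Hypothetical helper: validate the arguments and return the command as a list."""
    if run_envir not in RUN_ENVIRS:
        raise ValueError(f"RUN_ENVIR must be one of {RUN_ENVIRS}, got {run_envir!r}")
    if machine not in MACHINES:
        raise ValueError(f"MACHINE must be one of {MACHINES}, got {machine!r}")
    return ["sh", "link_fv3gfs.sh", run_envir, machine]


print(" ".join(link_command("emc", "hera")))
```

Returning a list rather than a single string makes it easy to hand the result to `subprocess.run` from inside the /sorc folder if desired.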
There are two types of initial conditions for the global-workflow:
- Warm start: these ICs are taken directly from either the GFS in production or an experiment "warmed" up (at least one cycle in).
- Cold start: any ICs converted to a new resolution or grid (e.g. GSM-GFS -> FV3GFS). These ICs are often prepared by chgres_cube (change resolution utility).
Most users will initiate their experiments with cold start ICs unless running high resolution (C768 deterministic with C384 EnKF) for a date with warm starts available. It is not recommended to run high resolution unless required or as part of final testing.
Resolutions:
- C48 = 2 degree ≈ 200km
- C96 = 1 degree ≈ 100km
- C192 = 1/2 degree ≈ 50km
- C384 = 1/4 degree ≈ 25km
- C768 = 1/8th degree ≈ 13km
- C1152 ≈ 9km
- C3072 ≈ 3km
Fully tested/supported resolutions in global-workflow: C192, C384, C768
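The approximate spacings in the list above can be reproduced with a little arithmetic. This sketch assumes the nominal spacing is measured along the equator, where four faces of the cubed sphere, each with N cells across for resolution CN, wrap around the globe. The function name and the rounded Earth circumference are illustrative choices, not values taken from the workflow.

```python
EARTH_CIRCUMFERENCE_KM = 40075.0  # approximate equatorial circumference


def approx_spacing(res: int):
    """Return (degrees, km) of nominal grid spacing for cubed-sphere resolution C<res>."""
    cells_around_equator = 4 * res  # four cube faces span the equator
    degrees = 360.0 / cells_around_equator
    km = EARTH_CIRCUMFERENCE_KM / cells_around_equator
    return degrees, km


for res in (48, 96, 192, 384, 768, 1152, 3072):
    deg, km = approx_spacing(res)
    print(f"C{res}: {deg:.3f} degree, about {km:.0f} km")
```

The results differ slightly from the rounded figures in the list (for example, C384 comes out near 26 km rather than 25 km), which is expected for nominal values.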
The following information is for users needing to generate initial conditions for a cycled experiment that will run at a different resolution or number of layers than the operational GFS (C768C384L64).
The new chgres_cube code is available from the UFS_UTILS repository on GitHub (maintained by George Gayno) and can be used to convert GFS ICs to a different resolution or number of layers. The chgres_cube code/scripts currently support the following GFS inputs:
- pre-GFSv14 (GFS-GSM)
- GFSv14 (GFS-GSM)
- GFSv15 (FV3GFS)
git clone --recursive https://github.com/NOAA-EMC/UFS_UTILS.git
sh build_all.sh
cd fix
sh link_fixdirs.sh emc $MACHINE
...where $MACHINE is "cray", "dell", "hera", or "jet".
cd util/gdas_init
vi config
Read the doc block at the top of the config and adjust the variables to meet your needs (e.g. yy, mm, dd, hh for SDATE).
./driver.$MACHINE.sh
...where $MACHINE is currently "dell" or "cray" or "hera". Additional options will be available as support for other machines expands.
90 small jobs will be submitted:
- 9 jobs to pull inputs off HPSS (1 for deterministic and 8 for the EnKF ensemble members)
- 81 jobs to run chgres (1 for deterministic/hires and 80 for the EnKF ensemble, one per member)
The chgres jobs will have a dependency on the data-pull jobs and will wait to run until all data-pull jobs have completed.
In the config you will have defined an output folder called $OUTDIR. The converted/chgres'd output will be found there, including the needed abias and radstat initial condition files. The files will be in the needed directory structure for the global-workflow system, therefore a user can move the contents of their $OUTDIR directly into their $ROTDIR/$COMROT.
This is a preliminary version of the new chgres_cube code/scripts. Please report bugs to George Gayno ([email protected]) and Kate Friedman ([email protected]).
The GFSv15 was implemented into production on June 12th, 2019 at 12z. The GFS was spun up ahead of that cycle and thus production output for the system is available from the 00z cycle (2019061200) and later. Production output tarballs from the prior GFSv14 system are located in the same location on HPSS but have "hps" in the name to indicate that it was run on the Cray, whereas the GFS now runs in production on the Dell and has "dell1" in the tarball name.
See production output in the following location on HPSS:
/NCEPPROD/hpssprod/runhistory/rhYYYY/YYYYMM/YYYYMMDD
Example location:
/NCEPPROD/hpssprod/runhistory/rh2019/201907/20190704
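The directory pattern above can be built from a cycle date. This is a small sketch; the function name `runhistory_dir` is hypothetical, and it assumes the cycle is given in the YYYYMMDDCC form used elsewhere on this page.

```python
HPSS_ROOT = "/NCEPPROD/hpssprod/runhistory"


def runhistory_dir(cdate: str) -> str:
    """Return the HPSS directory for a cycle given as YYYYMMDDCC.

    The cycle hour (CC) is not part of the directory name; it only
    appears in the tarball names inside the directory.
    """
    if len(cdate) != 10 or not cdate.isdigit():
        raise ValueError("expected a cycle date in YYYYMMDDCC form")
    yyyy, yyyymm, yyyymmdd = cdate[:4], cdate[:6], cdate[:8]
    return f"{HPSS_ROOT}/rh{yyyy}/{yyyymm}/{yyyymmdd}"


print(runhistory_dir("2019070400"))
# prints /NCEPPROD/hpssprod/runhistory/rh2019/201907/20190704
```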
Example listing for 2019070400 production tarballs:
[Kate.Friedman@m72a2 ~]$ hpsstar dir /NCEPPROD/hpssprod/runhistory/rh2019/201907/20190704 | grep gfs | grep 20190704_00
[connecting to hpsscore1.fairmont.rdhpcs.noaa.gov/1217]
******************************************************************
* Welcome to the NESCC High Performance Storage System *
* *
* Current HPSS version: 7.4.3 Patch 2 *
* *
* *
* Please Submit Helpdesk Request to *
* [email protected] *
* *
* Announcements: *
******************************************************************
Username: Kate.Friedman UID: 2391 Acct: 2391(2391) Copies: 1 Firewall: off [hsi.5.0.2.p5 Thu Apr 26 13:19:38 UTC 2018]
/NCEPPROD/hpssprod/runhistory/rh2019/201907:
drwxr-xr-x 2 nwprod prod 12800 Jul 10 07:39 20190704
[connecting to hpsscore1.fairmont.rdhpcs.noaa.gov/1217]
-rw-r----- 1 nwprod rstprod 24201632768 Jul 6 10:39 gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190704_00.enkfgdas.tar
-rw-r--r-- 1 nwprod prod 11040 Jul 6 10:39 gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190704_00.enkfgdas.tar.idx
-rw-r----- 1 nwprod rstprod 104316883456 Jul 6 15:20 gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190704_00.enkfgdas_restart_grp1.tar
-rw-r--r-- 1 nwprod prod 246560 Jul 6 15:20 gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190704_00.enkfgdas_restart_grp1.tar.idx
-rw-r----- 1 nwprod rstprod 104316883456 Jul 6 15:39 gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190704_00.enkfgdas_restart_grp2.tar
-rw-r--r-- 1 nwprod prod 246560 Jul 6 15:39 gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190704_00.enkfgdas_restart_grp2.tar.idx
-rw-r----- 1 nwprod rstprod 104316883456 Jul 6 15:57 gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190704_00.enkfgdas_restart_grp3.tar
-rw-r--r-- 1 nwprod prod 246560 Jul 6 15:57 gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190704_00.enkfgdas_restart_grp3.tar.idx
-rw-r----- 1 nwprod rstprod 104316883456 Jul 6 16:17 gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190704_00.enkfgdas_restart_grp4.tar
-rw-r--r-- 1 nwprod prod 246560 Jul 6 16:17 gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190704_00.enkfgdas_restart_grp4.tar.idx
-rw-r----- 1 nwprod rstprod 104316883456 Jul 6 16:38 gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190704_00.enkfgdas_restart_grp5.tar
-rw-r--r-- 1 nwprod prod 246560 Jul 6 16:38 gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190704_00.enkfgdas_restart_grp5.tar.idx
-rw-r----- 1 nwprod rstprod 104316883456 Jul 6 16:58 gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190704_00.enkfgdas_restart_grp6.tar
-rw-r--r-- 1 nwprod prod 246560 Jul 6 16:58 gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190704_00.enkfgdas_restart_grp6.tar.idx
-rw-r----- 1 nwprod rstprod 104316883456 Jul 6 17:17 gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190704_00.enkfgdas_restart_grp7.tar
-rw-r--r-- 1 nwprod prod 246560 Jul 6 17:17 gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190704_00.enkfgdas_restart_grp7.tar.idx
-rw-r----- 1 nwprod rstprod 104316883456 Jul 6 17:36 gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190704_00.enkfgdas_restart_grp8.tar
-rw-r--r-- 1 nwprod prod 246560 Jul 6 17:36 gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190704_00.enkfgdas_restart_grp8.tar.idx
-rw-r----- 1 nwprod rstprod 8213389824 Jul 6 04:57 gpfs_dell1_nco_ops_com_gfs_prod_gdas.20190704_00.gdas.tar
-rw-r--r-- 1 nwprod prod 305440 Jul 6 04:57 gpfs_dell1_nco_ops_com_gfs_prod_gdas.20190704_00.gdas.tar.idx
-rw-r--r-- 1 nwprod prod 760274432 Jul 6 04:57 gpfs_dell1_nco_ops_com_gfs_prod_gdas.20190704_00.gdas_flux.tar
-rw-r--r-- 1 nwprod prod 4896 Jul 6 04:57 gpfs_dell1_nco_ops_com_gfs_prod_gdas.20190704_00.gdas_flux.tar.idx
-rw-r--r-- 1 nwprod prod 95334748160 Jul 6 05:22 gpfs_dell1_nco_ops_com_gfs_prod_gdas.20190704_00.gdas_nemsio.tar
-rw-r--r-- 1 nwprod prod 8480 Jul 6 05:22 gpfs_dell1_nco_ops_com_gfs_prod_gdas.20190704_00.gdas_nemsio.tar.idx
-rw-r--r-- 1 nwprod prod 3623646720 Jul 6 04:57 gpfs_dell1_nco_ops_com_gfs_prod_gdas.20190704_00.gdas_pgrb2.tar
-rw-r--r-- 1 nwprod prod 31520 Jul 6 04:57 gpfs_dell1_nco_ops_com_gfs_prod_gdas.20190704_00.gdas_pgrb2.tar.idx
-rw-r----- 1 nwprod rstprod 40406691840 Jul 6 05:04 gpfs_dell1_nco_ops_com_gfs_prod_gdas.20190704_00.gdas_restart.tar
-rw-r--r-- 1 nwprod prod 26400 Jul 6 05:04 gpfs_dell1_nco_ops_com_gfs_prod_gdas.20190704_00.gdas_restart.tar.idx
-rw-r----- 1 nwprod rstprod 21489377280 Jul 6 05:26 gpfs_dell1_nco_ops_com_gfs_prod_gfs.20190704_00.gfs.tar
-rw-r--r-- 1 nwprod prod 2031392 Jul 6 05:26 gpfs_dell1_nco_ops_com_gfs_prod_gfs.20190704_00.gfs.tar.idx
-rw-r--r-- 1 nwprod prod 46592740864 Jul 6 05:34 gpfs_dell1_nco_ops_com_gfs_prod_gfs.20190704_00.gfs_flux.tar
-rw-r--r-- 1 nwprod prod 214816 Jul 6 05:34 gpfs_dell1_nco_ops_com_gfs_prod_gfs.20190704_00.gfs_flux.tar.idx
-rw-r--r-- 1 nwprod prod 294403269120 Jul 6 07:01 gpfs_dell1_nco_ops_com_gfs_prod_gfs.20190704_00.gfs_nemsioa.tar
-rw-r--r-- 1 nwprod prod 23328 Jul 6 07:01 gpfs_dell1_nco_ops_com_gfs_prod_gfs.20190704_00.gfs_nemsioa.tar.idx
-rw-r--r-- 1 nwprod prod 336908471296 Jul 6 08:05 gpfs_dell1_nco_ops_com_gfs_prod_gfs.20190704_00.gfs_nemsiob.tar
-rw-r--r-- 1 nwprod prod 26912 Jul 6 08:05 gpfs_dell1_nco_ops_com_gfs_prod_gfs.20190704_00.gfs_nemsiob.tar.idx
-rw-r--r-- 1 nwprod prod 63337960960 Jul 6 05:44 gpfs_dell1_nco_ops_com_gfs_prod_gfs.20190704_00.gfs_pgrb2.tar
-rw-r--r-- 1 nwprod prod 400672 Jul 6 05:44 gpfs_dell1_nco_ops_com_gfs_prod_gfs.20190704_00.gfs_pgrb2.tar.idx
-rw-r--r-- 1 nwprod prod 43709473792 Jul 6 05:52 gpfs_dell1_nco_ops_com_gfs_prod_gfs.20190704_00.gfs_pgrb2b.tar
-rw-r--r-- 1 nwprod prod 400160 Jul 6 05:52 gpfs_dell1_nco_ops_com_gfs_prod_gfs.20190704_00.gfs_pgrb2b.tar.idx
-rw-r--r-- 1 nwprod prod 12637940736 Jul 6 05:55 gpfs_dell1_nco_ops_com_gfs_prod_gfs.20190704_00.gfs_restart.tar
-rw-r--r-- 1 nwprod prod 5408 Jul 6 05:55 gpfs_dell1_nco_ops_com_gfs_prod_gfs.20190704_00.gfs_restart.tar.idx
The warm starts and other output from production are at C768 deterministic and C384 EnKF. The warm start files must be converted to your desired resolution(s) using global_chgres if you wish to run a different resolution. If you are running a C768/C384 experiment you can use them as is.
Which tarballs you need depends on the mode you want to run: free-forecast or cycled. In either mode, navigate to the top of your COMROT and pull the entirety of the tarball(s) listed below for your mode. The files within the tarballs are already in the $CDUMP.$PDY/$CYC folder format expected by the system.
For a free-forecast run there are two tarballs to pull:
File #1 (for starting cycle SDATE):
/NCEPPROD/hpssprod/runhistory/rhYYYY/YYYYMM/YYYYMMDD/gpfs_dell1_nco_ops_com_gfs_prod_gfs.YYYYMMDD_CC.gfs_restart.tar
File #2 (for prior cycle GDATE=SDATE-06):
/NCEPPROD/hpssprod/runhistory/rhYYYY/YYYYMM/YYYYMMDD/gpfs_dell1_nco_ops_com_gfs_prod_gdas.YYYYMMDD_CC.gdas_restart.tar
For a cycled run there are 18 tarballs to pull (9 for SDATE and 9 for GDATE (SDATE-06)):
HPSS path: /NCEPPROD/hpssprod/runhistory/rhYYYY/YYYYMM/YYYYMMDD/
Tarballs per cycle:
gpfs_dell1_nco_ops_com_gfs_prod_gdas.YYYYMMDD_CC.gdas_restart.tar
gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.YYYYMMDD_CC.enkfgdas_restart_grp1.tar
gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.YYYYMMDD_CC.enkfgdas_restart_grp2.tar
gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.YYYYMMDD_CC.enkfgdas_restart_grp3.tar
gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.YYYYMMDD_CC.enkfgdas_restart_grp4.tar
gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.YYYYMMDD_CC.enkfgdas_restart_grp5.tar
gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.YYYYMMDD_CC.enkfgdas_restart_grp6.tar
gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.YYYYMMDD_CC.enkfgdas_restart_grp7.tar
gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.YYYYMMDD_CC.enkfgdas_restart_grp8.tar
Go to the top of your COMROT/ROTDIR and pull the contents of all tarballs there. The tarballs already contain the needed directory structure.
Example for SDATE 2019090900 using the hpsstar utility:
cd /scratch1/NCEPDEV/stmp4/Joe.Schmo/comrot/mytest
hpsstar get /NCEPPROD/hpssprod/runhistory/rh2019/201909/20190909/gpfs_dell1_nco_ops_com_gfs_prod_gdas.20190909_00.gdas_restart.tar
hpsstar get /NCEPPROD/hpssprod/runhistory/rh2019/201909/20190909/gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190909_00.enkfgdas_restart_grp1.tar
hpsstar get /NCEPPROD/hpssprod/runhistory/rh2019/201909/20190909/gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190909_00.enkfgdas_restart_grp2.tar
hpsstar get /NCEPPROD/hpssprod/runhistory/rh2019/201909/20190909/gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190909_00.enkfgdas_restart_grp3.tar
hpsstar get /NCEPPROD/hpssprod/runhistory/rh2019/201909/20190909/gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190909_00.enkfgdas_restart_grp4.tar
hpsstar get /NCEPPROD/hpssprod/runhistory/rh2019/201909/20190909/gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190909_00.enkfgdas_restart_grp5.tar
hpsstar get /NCEPPROD/hpssprod/runhistory/rh2019/201909/20190909/gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190909_00.enkfgdas_restart_grp6.tar
hpsstar get /NCEPPROD/hpssprod/runhistory/rh2019/201909/20190909/gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190909_00.enkfgdas_restart_grp7.tar
hpsstar get /NCEPPROD/hpssprod/runhistory/rh2019/201909/20190909/gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190909_00.enkfgdas_restart_grp8.tar
hpsstar get /NCEPPROD/hpssprod/runhistory/rh2019/201909/20190908/gpfs_dell1_nco_ops_com_gfs_prod_gdas.20190908_18.gdas_restart.tar
hpsstar get /NCEPPROD/hpssprod/runhistory/rh2019/201909/20190908/gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190908_18.enkfgdas_restart_grp1.tar
hpsstar get /NCEPPROD/hpssprod/runhistory/rh2019/201909/20190908/gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190908_18.enkfgdas_restart_grp2.tar
hpsstar get /NCEPPROD/hpssprod/runhistory/rh2019/201909/20190908/gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190908_18.enkfgdas_restart_grp3.tar
hpsstar get /NCEPPROD/hpssprod/runhistory/rh2019/201909/20190908/gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190908_18.enkfgdas_restart_grp4.tar
hpsstar get /NCEPPROD/hpssprod/runhistory/rh2019/201909/20190908/gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190908_18.enkfgdas_restart_grp5.tar
hpsstar get /NCEPPROD/hpssprod/runhistory/rh2019/201909/20190908/gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190908_18.enkfgdas_restart_grp6.tar
hpsstar get /NCEPPROD/hpssprod/runhistory/rh2019/201909/20190908/gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190908_18.enkfgdas_restart_grp7.tar
hpsstar get /NCEPPROD/hpssprod/runhistory/rh2019/201909/20190908/gpfs_dell1_nco_ops_com_gfs_prod_enkfgdas.20190908_18.enkfgdas_restart_grp8.tar
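Rather than typing all 18 commands by hand, the list can be generated from SDATE. The sketch below reproduces the example above under the naming pattern shown on this page; the function names are hypothetical, and it only prints the `hpsstar get` commands rather than running them.

```python
from datetime import datetime, timedelta

HPSS_ROOT = "/NCEPPROD/hpssprod/runhistory"
PREFIX = "gpfs_dell1_nco_ops_com_gfs_prod"


def cycle_tarballs(cycle: datetime) -> list:
    """Return the 9 restart tarball paths (1 gdas + 8 enkfgdas groups) for one cycle."""
    ymd = cycle.strftime("%Y%m%d")
    stamp = f"{ymd}_{cycle:%H}"
    base = f"{HPSS_ROOT}/rh{cycle:%Y}/{cycle:%Y%m}/{ymd}"
    names = [f"{PREFIX}_gdas.{stamp}.gdas_restart.tar"]
    names += [
        f"{PREFIX}_enkfgdas.{stamp}.enkfgdas_restart_grp{group}.tar"
        for group in range(1, 9)
    ]
    return [f"{base}/{name}" for name in names]


def cycled_warm_start_tarballs(sdate: str) -> list:
    """Return the tarballs for SDATE followed by those for GDATE (SDATE minus 6 hours)."""
    start = datetime.strptime(sdate, "%Y%m%d%H")
    prior = start - timedelta(hours=6)
    return cycle_tarballs(start) + cycle_tarballs(prior)


for path in cycled_warm_start_tarballs("2019090900"):
    print("hpsstar get", path)
```

Using `datetime` for the six-hour step handles the roll back to the previous day (and month or year) without any special cases.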
Recent pre-implementation parallel series was for GFS v15 (Q2FY19):
- What resolution are warm-starts available for? Warm-start ICs are saved at the resolution the model was run at (C768/C384) and can only be used to run at the same resolution combination. If you need to run a different resolution you will need to make your own cold-start ICs. See cold start section above.
- What dates have warm-start files saved? Unfortunately the frequency changed enough during the runs that it is not easy to provide a definitive list.
- What files? All warm-starts are saved in separate tarballs which include “restart” in the name. You need to pull the entirety of each tarball; all files included in the restart tarballs are needed.
- Where are these tarballs? See below for the location on HPSS for each Q2FY19 pre-implementation parallel.
- What tarballs do I need to grab for my experiment? Tarballs from two cycles are required. The tarballs are listed below, where $CDATE is your starting cycle and $GDATE is one cycle prior.
- Free-forecast:
- ../$CDATE/gfs_restarta.tar
- ../$GDATE/gdas_restartb.tar
- Cycled w/EnKF:
- ../$CDATE/gdas_restarta.tar
- ../$CDATE/enkfgdas_restarta_grp##.tar (where ## is 01 through 08) (note, older tarballs may include a period between enkf and gdas: "enkf.gdas")
- ../$GDATE/gdas_restartb.tar
- ../$GDATE/enkfgdas_restartb_grp##.tar (where ## is 01 through 08) (note, older tarballs may include a period between enkf and gdas: "enkf.gdas")
- Where do I put the warm-start initial conditions? Extraction should occur right inside your COMROT. You may need to rename the enkf folder (enkf.gdas.$PDY -> enkfgdas.$PDY).
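The tarball lists for the pre-implementation parallels can be sketched the same way. This example assumes that $CDATE and $GDATE are written as YYYYMMDDCC and that "one cycle prior" means six hours earlier; the function name is hypothetical, and the returned paths are relative to the archive location of the chosen parallel.

```python
from datetime import datetime, timedelta


def parallel_tarballs(cdate: str, cycled: bool) -> list:
    """Return restart tarball paths, relative to the parallel's HPSS archive folder."""
    start = datetime.strptime(cdate, "%Y%m%d%H")
    gdate = (start - timedelta(hours=6)).strftime("%Y%m%d%H")
    groups = [f"{group:02d}" for group in range(1, 9)]  # 01 through 08

    if not cycled:
        # Free-forecast: one tarball from each of the two cycles.
        return [f"{cdate}/gfs_restarta.tar", f"{gdate}/gdas_restartb.tar"]

    # Cycled w/EnKF: deterministic plus eight ensemble groups per cycle.
    paths = [f"{cdate}/gdas_restarta.tar"]
    paths += [f"{cdate}/enkfgdas_restarta_grp{g}.tar" for g in groups]
    paths += [f"{gdate}/gdas_restartb.tar"]
    paths += [f"{gdate}/enkfgdas_restartb_grp{g}.tar" for g in groups]
    return paths


for path in parallel_tarballs("2018010100", cycled=False):
    print(path)
```

Older archives may use "enkf.gdas" instead of "enkfgdas" in the tarball names, as noted above, so the names produced here may need adjusting.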
Time Period | Parallel Name | Archive Location on HPSS (/NCEPDEV/emc-global/5year/...) |
---|---|---|
Real-time (05/25/2018 ~ 06/12/2019) | prfv3rt1 | .../emc.glopara/WCOSS_C/Q2FY19/prfv3rt1 |
2017/2018 Winter/Spring (11/25/2017 ~ 05/31/2018) | fv3q2fy19retro1 | .../Fanglin.Yang/WCOSS_DELL_P3/Q2FY19/fv3q2fy19retro1 |
2017 Summer/Fall Part 1 (05/25/2017 ~ 08/31/2017) | fv3q2fy19retro2 | .../emc.glopara/WCOSS_C/Q2FY19/fv3q2fy19retro2 |
2017 Summer/Fall Part 2 (08/02/2017 ~ 11/30/2017) | fv3q2fy19retro2 | .../Fanglin.Yang/WCOSS_DELL_P3/Q2FY19/fv3q2fy19retro2 |
2016/2017 Winter/Spring (11/25/2016 ~ 05/31/2017) | fv3q2fy19retro3 | .../Fanglin.Yang/WCOSS_DELL_P3/Q2FY19/fv3q2fy19retro3 |
2016 Summer/Fall Part 1 (5/22/2016 ~ 08/25/2016) | fv3q2fy19retro4 | .../emc.glopara/WCOSS_C/Q2FY19/fv3q2fy19retro4 |
2016 Summer/Fall Part 2 (08/17/2016 ~ 11/30/2016) | fv3q2fy19retro4 | .../emc.glopara/WCOSS_DELL_P3/Q2FY19/fv3q2fy19retro4 |
2015/2016 Winter/Spring (11/25/2015 ~ 05/31/2016) | fv3q2fy19retro5 | .../emc.glopara/WCOSS_DELL_P3/Q2FY19/fv3q2fy19retro5 |
2015 Summer/Fall (5/03/2015 ~ 11/30/2015) | fv3q2fy19retro6 | .../emc.glopara/WCOSS_DELL_P3/Q2FY19/fv3q2fy19retro6 |
If running with Rocoto make sure to have a Rocoto module loaded before running setup scripts:
module load rocoto
Scripts that will be used:
- ush/rocoto/setup_expt_fcstonly.py
- ush/rocoto/setup_workflow_fcstonly.py
NOTE: The following command examples include variables for reference, but users should not use environment variables to submit the commands. Exporting variables like EXPDIR to your environment causes an error when the python scripts run. Please explicitly include the argument inputs when running both setup scripts.
cd ush/rocoto
./setup_expt_fcstonly.py --pslot $PSLOT --configdir $CONFIGDIR --idate $IDATE --edate $EDATE --res $RES --gfs_cyc $GFS_CYC --comrot $COMROT --expdir $EXPDIR
...where:
- $PSLOT is the name of your experiment
- $CONFIGDIR is the path to the /config folder under the copy of the system you're using (i.e. $PATH_TO_CLONE/parm/config/)
- $IDATE is the initial start date of your run (first cycle CDATE, YYYYMMDDCC)
- $EDATE is the ending date of your run (YYYYMMDDCC) and is the last cycle that will complete
- $RES is the resolution of the forecast (i.e. 768 for C768)
- $GFS_CYC is the forecast frequency (0 = none, 1 = 00z only [default], 2 = 00z & 12z, 4 = all cycles)
- $COMROT is the path to your experiment output directory. DO NOT include PSLOT folder at end of path, it’ll be built for you.
- $EXPDIR is the path to your experiment directory where your configs will be placed and where you will find your workflow monitoring files (i.e. rocoto database and xml file). DO NOT include PSLOT folder at end of path, it will be built for you.
Example:
cd ush/rocoto
./setup_expt_fcstonly.py --pslot test --configdir /home/Joe.Schmo/git/global-workflow/parm/config --idate 2020010100 --edate 2020010118 --res 384 --gfs_cyc 4 --comrot /some_large_disk_area/Joe.Schmo/comrot --expdir /some_safe_disk_area/Joe.Schmo/expdir
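One way to follow the advice about passing arguments explicitly is to assemble the command from plain values instead of exported variables. The sketch below is only an illustration: the wrapper function is hypothetical, it uses only the flags shown above, and it prints the command instead of running it.

```python
import shlex


def setup_expt_fcstonly_args(pslot, configdir, idate, edate, res, gfs_cyc, comrot, expdir):
    """Hypothetical helper: return the setup_expt_fcstonly.py command as an argument list."""
    return [
        "./setup_expt_fcstonly.py",
        "--pslot", str(pslot),
        "--configdir", str(configdir),
        "--idate", str(idate),
        "--edate", str(edate),
        "--res", str(res),
        "--gfs_cyc", str(gfs_cyc),
        "--comrot", str(comrot),
        "--expdir", str(expdir),
    ]


args = setup_expt_fcstonly_args(
    pslot="test",
    configdir="/home/Joe.Schmo/git/global-workflow/parm/config",
    idate="2020010100",
    edate="2020010118",
    res=384,
    gfs_cyc=4,
    comrot="/some_large_disk_area/Joe.Schmo/comrot",
    expdir="/some_safe_disk_area/Joe.Schmo/expdir",
)
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in args))
```

Because every value is passed as a literal argument, nothing needs to be exported to the environment before the script is called from ush/rocoto.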
Go to your EXPDIR and check/change the following variables within your config.base now before running the next script:
- ACCOUNT
- HOMEDIR
- STMP
- PTMP
- ARCDIR (location on disk for online archive used by verification system)
- HPSSARCH (YES turns on archival)
- HPSS_PROJECT (project on HPSS if archiving)
- ATARDIR (location on HPSS if archiving)
Some of those variables will be found within a machine-specific if-block so make sure to change the correct ones for the machine you'll be running on.
Now is also the time to change any other variables/settings you wish to change in config.base or other configs. Do that now. Once done making changes to the configs in your EXPDIR go back to your clone to run the second setup script.
./setup_workflow_fcstonly.py --expdir $EXPDIR/$PSLOT
Example:
./setup_workflow_fcstonly.py --expdir /some_safe_disk_area/Joe.Schmo/expdir/test
You will now have a rocoto xml file in your EXPDIR ($PSLOT.xml) and a crontab file generated for your use. If you do not have a crontab file you may not have had the rocoto module loaded. To fix this, load a rocoto module and then rerun the setup_workflow*.py script. Cron is handled differently on WCOSS-Dell, so follow the separate instructions for setting up your rocoto cron on Mars/Venus.
Scripts that will be used:
- ush/rocoto/setup_expt.py
- ush/rocoto/setup_workflow.py
NOTE: The following command examples include variables for reference, but users should not use environment variables to submit the commands. Exporting variables like EXPDIR to your environment causes an error when the python scripts run. Please explicitly include the argument inputs when running both setup scripts.
cd ush/rocoto
./setup_expt.py --pslot $PSLOT --configdir $CONFIGDIR --idate $IDATE --edate $EDATE --comrot $COMROT --expdir $EXPDIR [ --icsdir $ICSDIR --resdet $RESDET --resens $RESENS --nens $NENS --gfs_cyc $GFS_CYC ]
Example:
cd ush/rocoto
./setup_expt.py --pslot test --configdir /home/Joe.Schmo/git/global-workflow/parm/config --idate 2020010100 --edate 2020010118 --comrot /some_large_disk_area/Joe.Schmo/comrot --expdir /some_safe_disk_area/Joe.Schmo/expdir --resdet 384 --resens 192 --nens 80 --gfs_cyc 4
...where:
- $PSLOT is the name of your experiment
- $CONFIGDIR is the path to the /config folder under the copy of the system you're using (i.e. $PATH_TO_CLONE/parm/config/)
- $IDATE is the initial start date of your run (first cycle CDATE, YYYYMMDDCC)
- $EDATE is the ending date of your run (YYYYMMDDCC) and is the last cycle that will complete
- $ICSDIR is the path to the ICs for your run if generated separately.
- $COMROT is the path to your experiment output directory. Do not use noscrub space on Cray for COMROT, use ptmp. DO NOT include PSLOT folder at end of path, it’ll be built for you.
- $EXPDIR is the path to your experiment directory where your configs will be placed and where you will find your workflow monitoring files (i.e. rocoto database and xml file). DO NOT include PSLOT folder at end of path, it will be built for you.
- $RESDET is the resolution of the deterministic forecast (i.e. ‘--resdet 768’, optional, default is C384)
- $RESENS is the resolution of the ensemble (EnKF) forecast (i.e. ‘--resens 384’, optional, default is C192)
- $NENS is the number of ensemble members (optional, default is 20)
- $GFS_CYC is the cycle frequency of the long GFS forecast (0 = none, 1 = 00z only [default], 2 = 00z & 12z, 4 = all cycles)
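To see what a given combination of --idate, --edate and --gfs_cyc implies, the cycles can be listed ahead of time. This is a rough sketch based only on the option descriptions above: it assumes four cycles per day (00z, 06z, 12z, 18z), the function name is hypothetical, and the actual workflow may apply additional rules at the start of a run.

```python
from datetime import datetime, timedelta

# Cycle hours that run the long GFS forecast for each gfs_cyc value.
GFS_HOURS = {0: (), 1: (0,), 2: (0, 12), 4: (0, 6, 12, 18)}


def list_cycles(idate: str, edate: str, gfs_cyc: int = 1) -> list:
    """Return (YYYYMMDDCC, runs_long_gfs_forecast) pairs from idate through edate."""
    if gfs_cyc not in GFS_HOURS:
        raise ValueError("gfs_cyc must be 0, 1, 2 or 4")
    current = datetime.strptime(idate, "%Y%m%d%H")
    end = datetime.strptime(edate, "%Y%m%d%H")
    cycles = []
    while current <= end:
        has_gfs = current.hour in GFS_HOURS[gfs_cyc]
        cycles.append((current.strftime("%Y%m%d%H"), has_gfs))
        current += timedelta(hours=6)
    return cycles


for cdate, has_gfs in list_cycles("2020010100", "2020010118", gfs_cyc=2):
    print(cdate, "gdas + gfs" if has_gfs else "gdas only")
```

With the dates from the example above and gfs_cyc set to 2, this lists four cycles, with the long forecast on the 00z and 12z cycles only.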
Example setup_expt.py on WCOSS_C:
SURGE-slogin1 > ./setup_expt.py --pslot fv3demo --configdir /gpfs/hps3/emc/global/noscrub/Joe.Schmo/git/global-workflow/parm/config --idate 2017073118 --edate 2017080106 --comrot /gpfs/hps2/ptmp/Joe.Schmo --expdir /gpfs/hps3/emc/global/noscrub/Joe.Schmo/para_gfs
SDATE = 2017-07-31 18:00:00
EDATE = 2017-08-01 06:00:00
EDITED: /gpfs/hps3/emc/global/noscrub/Joe.Schmo/para_gfs/fv3demo/config.base as per user input.
DEFAULT: /gpfs/hps3/emc/global/noscrub/Joe.Schmo/para_gfs/fv3demo/config.base.default is for reference only.
Please verify and delete the default file before proceeding.
SURGE-slogin1 >
The message about the config.base.default is telling you that you are free to delete it if you wish but it’s not necessary to remove. Your resulting config.base was generated from config.base.default and the default one is there for your information.
What happens if I run setup_expt.py again for an experiment that already exists:
SURGE-slogin1 > ./setup_expt.py --pslot fv3demo --configdir /gpfs/hps3/emc/global/noscrub/Joe.Schmo/git/global-workflow/parm/config --idate 2017073118 --edate 2017080106 --comrot /gpfs/hps2/ptmp/Joe.Schmo --expdir /gpfs/hps3/emc/global/noscrub/Joe.Schmo/para_gfs
COMROT already exists in /gpfs/hps2/ptmp/Joe.Schmo/fv3demo
Do you wish to over-write COMROT [y/N]: y
EXPDIR already exists in /gpfs/hps3/emc/global/noscrub/Joe.Schmo/para_gfs/fv3demo
Do you wish to over-write EXPDIR [y/N]: y
SDATE = 2017-07-31 18:00:00
EDATE = 2017-08-01 06:00:00
EDITED: /gpfs/hps3/emc/global/noscrub/Joe.Schmo/para_gfs/fv3demo/config.base as per user input.
DEFAULT: /gpfs/hps3/emc/global/noscrub/Joe.Schmo/para_gfs/fv3demo/config.base.default is for reference only.
Please verify and delete the default file before proceeding.
Your COMROT and EXPDIR will be deleted and remade. Be careful with this!
Go to your EXPDIR and check/change the following variables within your config.base now before running the next script:
- ACCOUNT
- HOMEDIR
- STMP
- PTMP
- ARCDIR (location on disk for online archive used by verification system)
- HPSSARCH (YES turns on archival)
- HPSS_PROJECT (project on HPSS if archiving)
- ATARDIR (location on HPSS if archiving)
Some of those variables will be found within a machine-specific if-block so make sure to change the correct ones for the machine you'll be running on.
Now is also the time to change any other variables/settings you wish to change in config.base or other configs. Do that now. Once done making changes to the configs in your EXPDIR go back to your clone to run the second setup script.
./setup_workflow.py --expdir $EXPDIR/$PSLOT
Example:
./setup_workflow.py --expdir /some_safe_disk_area/Joe.Schmo/expdir/test
You will now have a rocoto xml file in your EXPDIR ($PSLOT.xml) and a crontab file generated for your use. If you do not have a crontab file you may not have had the rocoto module loaded. To fix this, load a rocoto module and then rerun the setup_workflow*.py script. Cron is handled differently on WCOSS-Dell, so follow the separate instructions for setting up your rocoto cron on Mars/Venus.
Set RUN_CCPP=YES in your $EXPDIR config.base before starting your run. No job reconfiguration occurs with the CCPP on.
Further instructions on running with the CCPP can be found here: https://docs.google.com/document/d/13cyw96UJGcb6-grhBVaq8lONo6ARDGhS3BZQUbmuEmM/edit
Make sure a rocoto module is loaded:
module load rocoto
If needed check for available rocoto modules on machine:
module avail rocoto
or
module spider rocoto
rocotorun -d $PSLOT.db -w $PSLOT.xml
The first jobs of your run should now be queued or already running (depending on machine traffic). How exciting!
You'll now have a "logs" folder in both your COMROT and EXPDIR. The EXPDIR log folder contains workflow log files (e.g. rocoto command results) and the COMROT log folder will contain logs for each job (previously known as dayfiles).
crontab -e
or
crontab $PSLOT.crontab
(WARNING: The "crontab $PSLOT.crontab" command will overwrite the existing crontab file on your login node. If running multiple crons, edit your crontab file with the "crontab -e" command instead.)
Check your crontab setting:
crontab -l
Crontab uses following format:
*/5 * * * * /path/to/rocotorun -w /path/to/workflow/definition/file -d /path/to/workflow/database/file
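If you maintain several experiments, the crontab entries can be generated so they all follow the same format. A minimal sketch, assuming you supply full paths to rocotorun, the xml file and the database file; the function name and the `every_minutes` parameter are illustrative only.

```python
def rocoto_cron_line(rocotorun: str, xml_file: str, db_file: str, every_minutes: int = 5) -> str:
    """Return one crontab entry that calls rocotorun every `every_minutes` minutes."""
    if not 1 <= every_minutes <= 59:
        raise ValueError("every_minutes must be between 1 and 59")
    return f"*/{every_minutes} * * * * {rocotorun} -w {xml_file} -d {db_file}"


print(rocoto_cron_line(
    "/path/to/rocotorun",
    "/path/to/workflow/definition/file",
    "/path/to/workflow/database/file",
))
```

The printed line can be pasted into the editor opened by "crontab -e" alongside any entries you already have.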
Go to home cron directory
cd ~/cron
Open admin provided mycrontab file:
vi mycrontab
See provided cron example in initial mycrontab file. It is recommended to set up a script to run your rocotorun commands. That script would be what the mycrontab file would run.
#20 * * * * test -f /gpfs/dell2/ptmp/User.Name/cron/mycronscript-2.ksh && /gpfs/dell2/ptmp/User.Name/cron/mycronscript-2.ksh 1>/gpfs/dell2/ptmp/User.Name/cron/mycronscript-2.log 2>&1
Edit, save, and exit file. New or updated crons will begin the next time the time condition is met.
Click here to view full rocoto documentation on GitHub:
https://github.com/christopherwharrop/rocoto/wiki/documentation
Start or continue a run:
rocotorun -d /path/to/workflow/database/file -w /path/to/workflow/xml/file
Check the status of the workflow:
rocotostat -d /path/to/workflow/database/file -w /path/to/workflow/xml/file [-c YYYYMMDDCCmm,[YYYYMMDDCCmm,...]] [-t taskname,[taskname,...]] [-s] [-T]
Note: YYYYMMDDCCmm = YearMonthDayCycleMinute ...where mm/Minute is ’00’ for all cycles currently.
Check the status of a job:
rocotocheck -d /path/to/workflow/database/file -w /path/to/workflow/xml/file -c YYYYMMDDCCmm -t taskname
Force a task to run (ignores dependencies - USE CAREFULLY!):
rocotoboot -d /path/to/workflow/database/file -w /path/to/workflow/xml/file -c YYYYMMDDCCmm -t taskname
Rerun task(s):
rocotorewind -d /path/to/workflow/database/file -w /path/to/workflow/xml/file -c YYYYMMDDCCmm -t taskname
Set a task to complete (overwrites current state):
rocotocomplete -d /path/to/workflow/database/file -w /path/to/workflow/xml/file -c YYYYMMDDCCmm -t taskname
Several dates and task names may be specified in the same command by adding more -c and -t options. However, lists are not allowed.
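The commands above share the same shape, so a small helper can assemble them and append the "00" minute field to each cycle. This is a sketch only: the helper names are hypothetical, "taskname" is a placeholder rather than a real task, and the command is printed instead of executed. Each cycle and task gets its own -c or -t option, as described above.

```python
ROCOTO_TOOLS = (
    "rocotorun", "rocotostat", "rocotocheck",
    "rocotoboot", "rocotorewind", "rocotocomplete",
)


def rocoto_cycle(cdate: str) -> str:
    """Convert YYYYMMDDCC to the YYYYMMDDCCmm form, with minute 00."""
    if len(cdate) != 10 or not cdate.isdigit():
        raise ValueError("expected a cycle date in YYYYMMDDCC form")
    return cdate + "00"


def rocoto_command(tool: str, db_file: str, xml_file: str, cycles=(), tasks=()) -> list:
    """Return a rocoto command as an argument list, one -c/-t option per cycle/task."""
    if tool not in ROCOTO_TOOLS:
        raise ValueError(f"unknown rocoto tool: {tool!r}")
    args = [tool, "-d", db_file, "-w", xml_file]
    for cdate in cycles:
        args += ["-c", rocoto_cycle(cdate)]
    for task in tasks:
        args += ["-t", task]
    return args


print(" ".join(rocoto_command(
    "rocotorewind", "test.db", "test.xml",
    cycles=["2020010100", "2020010106"], tasks=["taskname"],
)))
```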
A GUI was designed to assist with monitoring rocoto experiments. It can be found under the ush/rocoto folder in global-workflow.
./rocoto_viewer.py -d /path/to/workflow/database/file -w /path/to/workflow/xml/file
Note 1: Terminal/window must be wide enough to display all experiment information columns, viewer will complain if not.
Note 2: The viewer requires the full path to the database and xml files if you are not in your EXPDIR when you invoke it.
https://vlab.ncep.noaa.gov/redmine/attachments/download/24069/rocoto_viewer_example.png
- First column: cycle (YYYYMMDDCCmm, YYYY=year, MM=month, DD=day, CC=cycle hour, mm=minute)
- Second column: task name (a "<" symbol indicates a group/meta-task; press "x" while a meta-task is selected to expand/collapse it)
- Third column: job ID from scheduler
- Fourth column: job state (QUEUED, RUNNING, SUCCEEDED, FAILED, or DEAD)
- Fifth column: exit code (0 if all ended well)
- Sixth column: number of tries/attempts to run job (0 when not yet run or just rewound, 1 when run once successfully, 2+ for multiple tries up to max try value where job is considered DEAD)
- Seventh column: job duration in seconds
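The same column layout can be used to pick out problem jobs from text output. The sketch below assumes the status text is whitespace-separated in the column order listed above, and that `rocotostat` output follows that layout. `failed_tasks` is a hypothetical helper name.

```shell
# Print "cycle task" for every row whose fourth column (job state) is DEAD or FAILED.
# Assumption: input rows are whitespace-separated in the column order listed above.
# failed_tasks is a hypothetical helper name.
failed_tasks() {
    awk '$4 == "DEAD" || $4 == "FAILED" { print $1, $2 }'
}

# Example use, assuming rocotostat prints rows in this layout:
# rocotostat -d /path/to/workflow/database/file -w /path/to/workflow/xml/file | failed_tasks
```

Rows in other states, and any header row, are ignored because their fourth field does not match.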
The rocoto viewer accepts both mouse and keyboard inputs. Press “h” for the help menu and more options.
Available viewer commands:
c = get information on selected job
r = rewind (rerun) selected job, group, or cycle
R = run rocotorun
b = boot (forcibly run) selected job or group
-> = right arrow key, advance viewer forward to next cycle
<- = left arrow key, advance viewer backward to previous cycle
Q = quit/exit viewer
Advanced features:
- Select multiple tasks at once
  - Press “Enter” on a task to select it, then click on other tasks or use the up/down arrows to move to other tasks and press “Enter” to select them as well.
  - When you next choose “r” to rewind, the pop-up window will ask whether you are sure you want to rewind all of the selected tasks.
- Rewind entire group or cycle
  - Group: while the group/metatask is collapsed (<), press “r” to rewind the whole group/metatask.
  - Cycle: use the up arrow to move the selector up past the first task until the entire left column is highlighted. Press “r” and the entire cycle will be rewound.
The output from your run will be found in the COMROT/ROTDIR you established. This is also where you placed your initial conditions. Within your COMROT you will have the following directory structure:
gfs.YYYYMMDD/CC/ <- contains deterministic long forecast gfs inputs/outputs
logs/ <- logs for each cycle in the run
vrfyarch/ <- contains files related to verification and archival
enkfgdas.YYYYMMDD/CC/mem###/ <- contains EnKF inputs/outputs for each cycle and each member
gdas.YYYYMMDD/CC/ <- contains deterministic gdas inputs/outputs
gfs.YYYYMMDD/CC/ <- contains deterministic long forecast gfs inputs/outputs, available from the first full cycle on depending on chosen gfs long forecast frequency (gfs_cyc)
logs/ <- logs for each cycle in the run
vrfyarch/ <- contains files related to verification and archival
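The naming pattern above can be captured in a small path helper, which may be handy in post-processing scripts. This is a sketch only. `comrot_dir` is a hypothetical name, and the three-digit zero padding of the member number follows the `mem###` pattern shown above.

```shell
# Compose the output directory for one component, date, and cycle under a COMROT.
# comrot_dir is a hypothetical helper name.
# Usage: comrot_dir COMROT COMPONENT YYYYMMDD CC [MEMBER]
# Pass MEMBER as a plain integer without leading zeros (for example 1, not 001).
comrot_dir() (
    comrot=$1; comp=$2; ymd=$3; cc=$4; mem=${5:-}
    path="$comrot/$comp.$ymd/$cc"
    if [ -n "$mem" ]; then
        # Ensemble members are zero-padded to three digits (mem###).
        path="$path/$(printf 'mem%03d' "$mem")"
    fi
    printf '%s\n' "$path"
)
```

For example, `comrot_dir "$ROTDIR" gdas 20190529 06` gives the deterministic gdas folder for that cycle, and adding a fifth argument of `1` with component `enkfgdas` gives the `mem001` folder.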
Here is an example COMROT for a cycled run as it may look several days in (note the archival steps remove older cycle folders as the run progresses):
-bash-4.2$ ll /scratch1/NCEPDEV/stmp4/Joe.Schmo/comrot/testcyc192
total 88
drwxr-sr-x 4 Joe.Schmo stmp 4096 Oct 22 04:50 enkfgdas.20190529
drwxr-sr-x 4 Joe.Schmo stmp 4096 Oct 22 07:20 enkfgdas.20190530
drwxr-sr-x 6 Joe.Schmo stmp 4096 Oct 22 03:15 gdas.20190529
drwxr-sr-x 4 Joe.Schmo stmp 4096 Oct 22 07:15 gdas.20190530
drwxr-sr-x 6 Joe.Schmo stmp 4096 Oct 22 03:15 gfs.20190529
drwxr-sr-x 4 Joe.Schmo stmp 4096 Oct 22 07:15 gfs.20190530
drwxr-sr-x 120 Joe.Schmo stmp 12288 Oct 22 07:15 logs
drwxr-sr-x 13 Joe.Schmo stmp 4096 Oct 22 07:07 vrfyarch
- Error: "ImportError" message when running setup script
- Error: curses default colors when running viewer
- Issue: Directory name change for EnKF folder in COMROT
- Error: Git ssh variant setting when running checkout_externals
Example of error:
$ ./setup_workflow.py --expdir /path/to/your/experiment/directory
Traceback (most recent call last):
File "./setup_workflow.py", line 32, in <module>
from collections import OrderedDict
ImportError: cannot import name OrderedDict
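Cause: No suitable python is loaded in your environment. The default python that runs instead is likely too old to provide collections.OrderedDict, which was added in Python 2.7.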
Solution: Load a python module ("module load python") and retry setup script.
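Before rerunning the setup script, you can confirm that the python on your PATH provides the import that failed. The helper below is a minimal sketch, and `can_import_ordereddict` is a hypothetical name.

```shell
# Return 0 if the given python executable can import OrderedDict, non-zero otherwise.
# can_import_ordereddict is a hypothetical helper name; it defaults to "python".
can_import_ordereddict() (
    py=${1:-python}
    # Exit with 127 if the executable cannot be found at all.
    command -v "$py" >/dev/null 2>&1 || exit 127
    "$py" -c 'from collections import OrderedDict' >/dev/null 2>&1
)

# Example use:
# can_import_ordereddict python || echo "load a python module and retry"
```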
Example:
$ ./rocoto_viewer.py -d blah.db -w blah.xml
Traceback (most recent call last):
File "./rocoto_viewer.py", line 2376, in <module>
curses.wrapper(main)
File "/contrib/anaconda/anaconda2/4.4.0/lib/python2.7/curses/wrapper.py", line 43, in wrapper
return func(stdscr, *args, **kwds)
File "./rocoto_viewer.py", line 1202, in main
curses.use_default_colors()
_curses.error: use_default_colors() returned ERR
Cause: wrong TERM setting for curses
Solution: set TERM to "xterm" (bash: export TERM=xterm ; csh/tcsh: setenv TERM xterm)
Issue: The EnKF COMROT folders were renamed during the GFS v15 development process to remove the period between "enkf" and "gdas": enkf.gdas.$PDY → enkfgdas.$PDY
Fix: Older tarballs on HPSS will have the older directory name with the period between 'enkf' and 'gdas'. After retrieving one, rename the folder to 'enkfgdas.$PDY'. This only affects the initial cycle.
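The rename can be scripted so that it is safe to rerun. The following is a sketch; `rename_enkf_dir` is a hypothetical helper name, and it assumes the old-style folder sits directly under your COMROT.

```shell
# Rename an old-style enkf.gdas.PDY folder to enkfgdas.PDY inside a COMROT.
# rename_enkf_dir is a hypothetical helper; it does nothing if the old name is absent.
# Usage: rename_enkf_dir COMROT PDY
rename_enkf_dir() (
    comrot=$1; pdy=$2
    old="$comrot/enkf.gdas.$pdy"
    new="$comrot/enkfgdas.$pdy"
    # Nothing to do when there is no old-style folder.
    [ -d "$old" ] || exit 0
    # Refuse to overwrite an existing new-style folder.
    if [ -e "$new" ]; then
        echo "target already exists: $new" >&2
        exit 1
    fi
    mv "$old" "$new"
)
```

Call it once for the initial cycle date, for example `rename_enkf_dir /path/to/comrot 20190529`.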
Error seen on WCOSS-Cray (Luna/Surge).
Example:
$ checkout_externals
Processing externals description file : Externals.cfg
Checking status of externals: nemsfv3gfs, emc_post, ufs_utils, gsi, emc_gfs_wafs, emc_verif-global,
Checking out externals: nemsfv3gfs, ERROR:root:Command '[u'git', u'clone', u'--quiet', u'ssh://vlab.ncep.noaa.gov:29418/NEMSfv3gfs', u'fv3gfs.fd']' returned non-zero exit status 128
ERROR:root:Failed with output:
fatal: ssh variant 'simple' does not support setting port
ERROR: In directory
/gpfs/hps3/emc/global/noscrub/Joe.Schmo/git/feature-manage_externals/sorc
Process did not run successfully; returned status 128:
git clone --quiet ssh://vlab.ncep.noaa.gov:29418/NEMSfv3gfs fv3gfs.fd
See above for output from failed command.
Cause: Git ssh variant 'simple' does not support setting port
Solution: Adjust git config ssh setting:
$ git config --global ssh.variant ssh