ACCESS-Community-Hub/APP4

APP4-1

This is the ACCESS Post-Processor (APP), version 4.1

Initially created by Peter Uhe for CMIP5, and further developed for the CMIP6 era by Chloe Mackallah.
CSIRO, O&A Aspendale.

This repository was initiated from the original (and no longer active) NCI GitLab repo https://git.nci.org.au/cm2704/APP4 on 9 March 2023.
For the authoritative version of this code as used for the post-production of ACCESS datasets for CMIP6 (stored in fs38 on NCI's Gadi), see https://doi.org/10.5281/zenodo.7703469.
CSIRO DAP record: http://hdl.handle.net/102.100.100/437645?index=1


The APP4 is a CMORisation tool designed to convert ACCESS model output to ESGF-compliant formats, primarily for publication to CMIP6. The code was originally built for CMIP5, and was further developed for CMIP6-era activities.
It uses CMOR3 and files created with the CMIP6 data request to generate CF-compliant files according to the CMIP6 data standards. The APP4 runs in a Python 2.7 environment.

Supported model versions are CM2 (coupled, amip and chem versions), ESM1.5 (script and payu versions) and OM2[-025]. The APP4 is for use on NCI's Gadi system only, and is designed for ACCESS model output that has been archived using the ACCESS Archiver tool.

Troubleshooting

Many users cannot immediately load the necessary conda environment that APP4 uses ('cmip6-publication'). The NCI project hh5 must be joined (https://my.nci.org.au/mancini/project/hh5), and the following file created in your home directory:

Filename: ~/.condarc
Contents:

auto_activate_base: false
envs_dirs:
  - /g/data/hh5/public/apps/miniconda3/envs
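
If the environment still cannot be found, it can help to confirm that ~/.condarc really lists the hh5 directory. The following is a minimal sketch, not part of the APP4: the function names are hypothetical, and it only understands the simple key/list layout shown above rather than full YAML.

```python
import os

# Path taken from the README; everything else in this sketch is illustrative.
HH5_ENVS = "/g/data/hh5/public/apps/miniconda3/envs"


def condarc_lists_env_dir(text, env_dir=HH5_ENVS):
    """Return True if `env_dir` appears as an item under `envs_dirs:`."""
    in_envs_dirs = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing spaces
        if not line.strip():
            continue
        if not line.startswith((" ", "\t", "-")):
            # A top-level key opens or closes the envs_dirs block.
            in_envs_dirs = line.strip() == "envs_dirs:"
            continue
        item = line.strip()
        if in_envs_dirs and item.startswith("-"):
            if item[1:].strip().strip("'\"") == env_dir:
                return True
    return False


def check_condarc(path=os.path.expanduser("~/.condarc")):
    """Read the user's .condarc (if present) and report whether hh5 is listed."""
    if not os.path.isfile(path):
        return False
    with open(path) as handle:
        return condarc_lists_env_dir(handle.read())
```

Calling check_condarc() on Gadi returns False when the file is missing or the hh5 entry is absent, which points to the ~/.condarc step above rather than to project membership.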

Custom Mode

In custom mode, the APP4 can process non-CMIP6 experiments, allowing the user to CMORise data using custom metadata (experiment ids, MIP names, etc.) rather than requiring the use of the CMIP6 Controlled Vocabulary. However, only CMIP6 variables can be generated (CCMI2022 variables are also included for the CM2-Chem model).

custom_app4.sh
This is the main control script for the APP4 in custom mode. Once all variables have been set, simply run the script with ./custom_app4.sh to create the necessary files (job script, variable maps, etc.) and submit the task to the job queue.
Here you define:

  • Details about the experiment you wish to process, including the location of the archived data (see https://git.nci.org.au/cm2704/ACCESS-Archiver) and version of ACCESS.
  • Metadata intended for the final datasets, including experiment and MIP names, ensemble number and branch times, for both the present experiment, and its parent experiment (if applicable; 'no parent' can also be used by setting parent to false).
  • The variable(s) to process and generate. These can be set either by declaring a MIP Table (e.g., Amon, SIday, etc.) and variable (CMIP6 names) in the wrapper, or by creating a file-based list of variables (VAR_SUBSET_LIST) to read in.
    The CMIP6 data request file to use for variable definitions is also defined here. A default file that contains all APP4-ready CMIP6 variables can also be selected.
  • NCI job information, including the intended write location of CMORised data and job files, and declaring job details (compute project, job queue (hugemem is recommended), and cpu/memory usage).

check_app4.sh
This script can be used to perform a simple sweep on your CMORised data to see which variables succeeded, and if any failed to generate (or only completed for some years). It does not check the data itself, but scans through the job output files to provide a summary of what was generated by the job.
You can set the local experiment name and output data location manually, or have the script read these from custom_app4.sh (by setting READ_FROM_CUSTOM_WRAPPER=true).

Further job logs and information can be found in the job output file ($OUTPUT_LOC/APP_job_files/${EXP_TO_PROCESS}/job_output.OU) and in the variable logs ($OUTPUT_LOC/APP_job_files/${EXP_TO_PROCESS}/variable_logs). Lists of successful and failed variables are stored in ($OUTPUT_LOC/APP_job_files/${EXP_TO_PROCESS}/success_lists/).
The CMOR logs ($OUTPUT_LOC/APP_job_files/cmor_logs) are overwritten by CMOR every time it generates a file, so to obtain CMOR-specific log information you must run each problematic variable one at a time (with the variable and table declared in custom_app4.sh).
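
As a quick reference, the log locations described above can be assembled from the two settings they depend on. This is a small sketch rather than APP4 code: the function name and dictionary keys are hypothetical, while the directory and file names are those given in this section.

```python
import os


def app_log_paths(output_loc, exp_to_process):
    """Build the log locations listed in this README for one experiment."""
    job_files = os.path.join(output_loc, "APP_job_files")
    exp_dir = os.path.join(job_files, exp_to_process)
    return {
        "job_output": os.path.join(exp_dir, "job_output.OU"),
        "variable_logs": os.path.join(exp_dir, "variable_logs"),
        "success_lists": os.path.join(exp_dir, "success_lists"),
        # Not experiment-specific, and overwritten by CMOR for every file.
        "cmor_logs": os.path.join(job_files, "cmor_logs"),
    }
```

Note that only the CMOR logs sit outside the experiment directory, which is why they cannot be used to diagnose several variables from a single run.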

Production Mode

In production mode, the standard CMIP6 Controlled Vocabulary must be used as this mode is intended to generate data for publication to CMIP6 or related activities (e.g. CCMI2022).

Details of each experiment must be included in the Experiments table (input_files/experiments.csv), and an experiment-specific metadata json file must be created for each experiment and ensemble member (see input_files/json; this is done automatically in custom mode). This ensures that data can be reliably recreated and that there is a self-contained record of officially produced datasets.

multiwrap_app4.sh
A simple wrapper that can be used for batches of simulations. It will overwrite the declared EXP_TO_PROCESS in production_app4.sh (the main APP4 control script) and production_qc4.sh (the QC tool), and can also call the simple checker check_app4.sh for job summaries.

production_app4.sh
The main control file for the APP4 in production mode. It can be used to process a single experiment (declare EXP_TO_PROCESS here), or be called by the script multiwrap_app4.sh. The main information declared in this script is:

  • The variable(s) to process and generate. These can be set either by declaring a MIP Table (e.g., Amon, SIday, etc.) and variable (CMIP6 names), or by creating a file-based list of variables (VAR_SUBSET_LIST) to read in. You can also control the processing of subdaily data with the SUBDAILY flag, which can be set to 'true', 'false' or 'only' (the last of which ignores non-subdaily variables).
    By default, TABLE_TO_PROCESS and VARIABLE_TO_PROCESS are set to 'all', VAR_SUBSET is false, and SUBDAILY is true. This tells the APP to process all variables in the data request file (which can be overridden with FORCE_DREQ to include every CMIP6 variable that has been set up for the APP4).
  • NCI job information, including the intended write location of CMORised data and job files, and declaring job details (compute project, job queue (hugemem is recommended), and cpu/memory usage).
  • The mode variable can be used to process data for official CMIP6-related activities; currently only CCMI2022 is set up in the APP.
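
The three SUBDAILY settings described above amount to a simple filter over the requested variables. The sketch below illustrates that logic only; it is not the APP4's implementation, and the function name and the (name, is_subdaily) pair structure are assumptions made for the example.

```python
def select_by_subdaily(variables, subdaily="true"):
    """Filter (name, is_subdaily) pairs according to the SUBDAILY setting.

    'true'  keeps everything,
    'false' drops subdaily variables,
    'only'  keeps subdaily variables and ignores the rest.
    """
    if subdaily == "true":
        return [name for name, _ in variables]
    if subdaily == "false":
        return [name for name, is_sub in variables if not is_sub]
    if subdaily == "only":
        return [name for name, is_sub in variables if is_sub]
    raise ValueError("SUBDAILY must be 'true', 'false' or 'only'")


# Hypothetical request: one monthly and one 3-hourly variable.
requested = [("Amon/tas", False), ("3hr/pr", True)]
subdaily_only = select_by_subdaily(requested, "only")
```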

production_qc4.sh
The controller for the APP's quality checking (QC) tool. It is used to perform ESGF compliance checks on the data using CMOR's PrePARE, and can automatically create default plots to aid in manual data QC (the location of the plots is set by QC_DIR). The plots are also set up in a web-viewable directory in p66, set using ONLINE_PLOT_DIR and viewable at accessdev.nci.org.au/p66.
Note: this tool does not automatically check the actual data in any way.
The compliance step will move the data from the APP's output location (defined in the metadata json files) to a secondary location defined by the variable PUB_DIR; this process can be switched off with PUBLISH=false.
Other controls in this script are as in production_app4.sh.
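
The interaction between the compliance result and the PUBLISH switch can be pictured as follows. This is a hedged sketch, not the code in subroutines/quality_check.py: the function name is hypothetical, and the PrePARE outcome is reduced to a boolean `passed` argument.

```python
import os
import shutil


def publish_if_compliant(src, pub_dir, passed, publish=True):
    """Move `src` into `pub_dir` only if it passed checks and publishing is on.

    Returns the path where the file now lives.
    """
    if not (passed and publish):
        return src  # leave the file in the APP output location
    if not os.path.isdir(pub_dir):
        os.makedirs(pub_dir)
    dest = os.path.join(pub_dir, os.path.basename(src))
    shutil.move(src, dest)
    return dest
```

With publish=False (the equivalent of PUBLISH=false) the file is never moved, regardless of the compliance result.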

Input files

input_files/experiments.csv
Details of each simulation are declared in this file, including the local experiment name, the location of the model output, the metadata json file specific to the simulation, the data request file (generated using dreqPy) to be used, the processing start and end years, the reference year (used in the netCDF attribute time_units), and the version of ACCESS (CM2, CM2-Chem, ESM, OM2, OM2-025).

input_files/json/${EXP_TO_PROCESS}.json
These json files are built from CMIP6-defined templates, and are read by CMOR. They define the global metadata that is written into each generated dataset, including CMIP6 experiment and MIP names, parent experiment details, ensemble member values, branching times, model information, and the location of the output for the final datasets.
Templates for each model are also included here (input_files/json/default_[cm2,esm,om2,om2-025,cm2-chem]).

input_files/master_map.csv
This is the heart of the APP's ability to create CMIP6-compliant variables from ACCESS model output. For each CMIP6 variable, it defines the field as it exists in ACCESS model output files (assuming that the model output has been archived using the ACCESS Archiver), the calculations required (either through simple arithmetic equations, or by calling functions from subroutines/app_functions.py), the units of the input data, and flags such as dimension adjustments.
There are two variations of this file, input_files/master_map_om2.csv (for the OM2 models) and input_files/master_map_ccmi2022.csv (for the CM2-Chem model).

Subroutines

subroutines/setup_env.sh
Declares several inputs and outputs used in the APP4, and is the first subroutine run.
It includes the CMIP6 controlled vocabulary files for CMOR, the experiments table and master variable map, ancillary files for ACCESS (such as land-sea masks), and the format and naming of job logs and temporary files ('APP_job_files', declared with OUT_DIR and related variables).
The Python Conda environment built for the APP4 is also activated in this file, along with the default contact email ([email protected]) and some extra control options (such as overwriting existing output data and restricting the years a variable is processed for according to the data request file).

subroutines/cleanup.sh
Performs a simple check of the 'APP_job_files' directory (containing temporary files and job logs), and requests confirmation of deletion. Also cleans the top directory of the APP4 of temporary files.
This is run directly after subroutines/setup_env.sh and prior to all main tasks. Can also be run manually.

subroutines/custom_json_editor.sh
For use in 'custom mode'. Creates a metadata json file (stored in 'APP_job_files') and edits a version of the CMOR tables (controlled vocabulary) to insert the experiment-level metadata defined in custom_app4.sh.
Without this script, CMOR will reject any non-CMIP6 experiment details.

subroutines/dreq_mapping.py
For each variable to be processed, this Python script prepares a detailed map ('APP_job_files/variable_maps') using information from input_files/master_map.csv and the data request file.
Due to the complexity of CMIP6, some specific variable map variations/alterations are hard coded here in the function 'special_cases'.

subroutines/database_manager.py
Variable information is further sorted by this Python script, which creates an SQL database in which each row corresponds to a single output file (many variables are split into yearly/decadal chunks).
The file input_files/grids[om2,om2-025].csv is called in this script, and defines how to split each variable/dataset.
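
To illustrate the one-row-per-output-file idea, the sketch below splits a year range into fixed-length chunks and stores one row per chunk using Python's sqlite3 module. It is an assumption-laden illustration: the table name, column names and function names are hypothetical and do not reflect the schema used by database_manager.py, and the chunk length is passed in directly rather than read from the grids file.

```python
import sqlite3


def year_chunks(start, end, years_per_file):
    """Split the inclusive range start..end into (first, last) year pairs."""
    chunks = []
    first = start
    while first <= end:
        last = min(first + years_per_file - 1, end)
        chunks.append((first, last))
        first = last + 1
    return chunks


def add_file_rows(conn, mip_table, variable, start, end, years_per_file):
    """Insert one row per output file; returns the number of rows added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS output_files ("
        "mip_table TEXT, variable TEXT, "
        "first_year INTEGER, last_year INTEGER, status TEXT)"
    )
    rows = [
        (mip_table, variable, first, last, "unprocessed")
        for first, last in year_chunks(start, end, years_per_file)
    ]
    conn.executemany("INSERT INTO output_files VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
n_files = add_file_rows(conn, "Amon", "tas", 1850, 1872, 10)
```

Here a 23-year run split into decadal files gives three rows, the last covering only the remaining three years.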

subroutines/app_wrapper.py
The SQL database is read and the rows processed in parallel (using Python's multiprocessing package).
Each row is passed to subroutines/app.py, and the results of the task recorded in 'APP_job_files' (variable_logs/, success_lists/).
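
The pattern of mapping database rows over a worker pool and sorting the outcomes into success and failure lists can be sketched as below. This is not the code in app_wrapper.py: the row structure and function names are hypothetical, and the example uses the thread-backed multiprocessing.dummy.Pool so that it runs anywhere; multiprocessing.Pool offers the same map interface with separate processes.

```python
from multiprocessing.dummy import Pool  # thread-backed, same API as multiprocessing.Pool


def process_row(row):
    """Stand-in for the per-file work; returns (variable, ok, message)."""
    try:
        if row["variable"] is None:
            raise ValueError("no variable given")
        return (row["variable"], True, "")
    except Exception as error:
        # Record the failure instead of letting one row stop the batch.
        return (row.get("variable"), False, str(error))


def run_rows(rows, workers=4):
    """Process all rows in parallel and split results into two lists."""
    pool = Pool(workers)
    try:
        results = pool.map(process_row, rows)  # preserves input order
    finally:
        pool.close()
        pool.join()
    succeeded = [name for name, ok, _ in results if ok]
    failed = [(name, msg) for name, ok, msg in results if not ok]
    return succeeded, failed
```

Catching exceptions inside the worker means a single bad row is reported in the failed list while the remaining rows still complete.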

subroutines/app.py
The main processing Python script of the APP. It extracts data from the model output, performs calculations and adjustments according to the relevant variable map, defines the axes, and CMORises the data.
It is at this stage that CMOR will block any data or metadata that is not contained within the CMIP6 controlled vocabulary (unless in custom mode).

subroutines/app_functions.py
Non-trivial calculations and data transformations are defined in this Python script.
These are referred to in input_files/master_map.csv and called in subroutines/app.py.

subroutines/completion_check.py
A simple job output checking script for easy job summaries; reads logs and temporary files from 'APP_job_files'.
Called by check_app4.sh and production_qc4.sh.

subroutines/quality_check.py
Performs compliance checks (and moves data to the publication directory if passed) and can create automated timeseries and QC plots.
Called by production_qc4.sh.
