From 25204fa121a9a3fcad6d4af565975c50d48f2364 Mon Sep 17 00:00:00 2001
From: tommelt
Date: Tue, 5 Nov 2024 08:28:05 +0000
Subject: [PATCH] docs: update installation instructions for CAM with FTorch (#19)

* docs: update installation instructions for CAM with FTorch

* docs: update FTorch build instructions in README.md

Co-authored-by: Jack Atkinson <109271713+jatkinson1000@users.noreply.github.com>

* docs: fix typo in README.md

Co-authored-by: Jack Atkinson <109271713+jatkinson1000@users.noreply.github.com>

* docs: better explanation of `DOUT_S` xml parameter

Co-authored-by: Jack Atkinson <109271713+jatkinson1000@users.noreply.github.com>

* docs: correct explanation of setting `DOUT_S=FALSE`

Co-authored-by: Jack Atkinson <109271713+jatkinson1000@users.noreply.github.com>

* docs: better description of CAM version specifics

Co-authored-by: Jack Atkinson <109271713+jatkinson1000@users.noreply.github.com>

* docs: remove unnecessary info about libtorch module

* docs: add more description to the NN configuration

---------

Co-authored-by: Jack Atkinson <109271713+jatkinson1000@users.noreply.github.com>
---
 README.md | 164 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 98 insertions(+), 66 deletions(-)

diff --git a/README.md b/README.md
index 5e0d00bac9..adc5a19f18 100644
--- a/README.md
+++ b/README.md
@@ -6,7 +6,6 @@ developed as part of the DataWave project.
 The main working branch for this project is `datawave_ml` which was originally
 branched from the `cam6_3_139` tag.
 
-
 ## Using this model
 
 ### Obtaining CAM
@@ -16,40 +15,39 @@ branch on which this work is based:
 ```
 git clone https://github.com/DataWaveProject/CAM.git
 cd CAM
-git checkout datawave_ml
+git checkout coupling
 ```
 
 This branch is built upon the `cam6_3_139` tag from the
 [main ESMCOMP/CAM repository](https://github.com/ESCOMP/CAM).
 
-### Preparing FTorch and setting the CIME component in CESM
+### Obtaining `FTorch`
 
-We also need to build and link _FTorch_ which will allow us to use our PyTorch-based
-neural nets in CAM. This has two steps.
+To use PyTorch-based neural nets in CAM, we use [`FTorch`](https://github.com/Cambridge-ICCS/FTorch) which needs to be built on the system before we build CAM.
 
-#### Building FTorch
+To install `FTorch` on Derecho follow the instructions in section [`FTorch` on Derecho](#ftorch-on-derecho) below.
 
-You need to build and install _FTorch_ locally on the system following the instructions
-[in the documentation](https://github.com/Cambridge-ICCS/FTorch).
-Note the location of the install as this will be required later when building CAM.
+> [!NOTE]
+> The location of the `FTorch` install will be required later when building CAM.
 
-For specifics on building FTorch on Derecho to be compatible with CAM see the section
-[_FTorch_ on Derecho](#ftorch-on-derecho) below.
+> [!NOTE]
+> If you want to build `FTorch` on another system, please follow the general instructions in the `FTorch`
+> [documentation](https://github.com/Cambridge-ICCS/FTorch).
+### Obtaining `FTorch`-compatible CIME
 
-#### Obtaining _FTorch_-compatible CIME
+This fork of CAM will use an `FTorch`-compatible version of the CIME buildsystem. This is specified under the `[cime]` section
+of the `Externals.cfg` file in the main CAM directory.
 
-This fork of CAM will use an FTorch-compatible version of the CIME buildsystem.
-This is specified under the `[cime]` section of the `Externals.cfg` file in the
-main CAM directory. For more information see
-[_FTorch_-compatible CIME](#ftorch-compatible-cime) below.
+To set up a CIME to use `FTorch` please follow the instructions in section [`FTorch`-compatible CIME](#ftorch-compatible-cime)
+below.
 
 ### Checkout externals
 
-You can now run, from within the CAM root directory,
+To fetch the external components run the following command from within the CAM root directory,
+
 ```
 ./manage_externals/checkout_externals
 ```
-to fetch the external components.
 
 ### Creating and running a case
 
@@ -63,43 +61,70 @@ from `CAM/cime/scripts/`.
 
 You can then navigate to the case directory at ``.
 
-#### Building with _FTorch_
+### Build CAM with `FTorch`
 
-To couple to FTorch modify `Tools/Makefile` line 600 to set the environment variable
-FTORCH_LIB to the location of the FTorch library on your system.
+Before we can run `./case.build` we first need to make some manual changes to the `Makefile` located in
+`/Tools/Makefile`. This will allow the CIME build system to locate `FTorch`.
 
-On Derecho this is `/path/to/ftorch/bin/ftorch_intel` as defined below in
-[_FTorch_ on Derecho](#ftorch-on-derecho).
+From the test case directory, modify `Tools/Makefile` line 602 to set the environment variable `FTORCH_LIB` to the location of
+the `FTorch` library on your system.
 
-#### Setting up case details
+If you followed the instructions in section [`FTorch` on Derecho](#ftorch-on-derecho) this will be `$HOME/FTorch/bin/ftorch_intel`.
 
-We can now run `./case.setup` from within the case directory.
-Once this has been done then edit the generated `user_nl_cam` in the case directory
-as required.
-Add the following lines:
+```make
+FTORCH_LIB := $HOME/FTorch/bin/ftorch_intel
+```
 
-#. `gw_convect_dp_ml='on'`\
-   This is the switch to use our new ML convective-gw scheme instead of the default.\
-   Other options are `'off'` (default - use original), `'bothoff'` (run both schemes
-   but use default for simulation), and `'bothon'` (run both but use ML for simulation).
-#. `gw_convect_dp_ml_net=''`\
-   The path to the your saved PyTorch model.
+### Setting up case details
 
-Also consider adding:
-```
-fincl = 'MYVAR'
+We can now run `./case.setup` from within the case directory. Once this has been done then edit the generated `user_nl_cam` in
+the case directory as required.
+
+> [!NOTE]
+> The following settings are provided as an example. These should be tailored to your particular experiment. For more
+> information, please see the descriptions below.
+
+To run CAM using the NN to predict gravity waves and the physics-based model to _piggyback_, we can set the following settings:
+```fortran
+gw_convect_dp_ml=.true.
+gw_convect_dp_ml_compare=.true.
+gw_convect_dp_ml_net_path='/path/to/neural/net'
+gw_convect_dp_ml_norms='/path/to/norms'
 ```
-to generate output diagnostics of variables as desired.
+
+* `gw_convect_dp_ml` (`logical`)
+
+  Whether or not to use the ML scheme for gravity waves produced by deep convection. Default: `.false.`
+
+* `gw_convect_dp_ml_compare` (`logical`)
+
+  Whether or not to run a piggybacking comparison of the ML deep convection gravity waves to the original scheme. Only one
+  scheme will be used to advance the simulation as dictated by `gw_convect_deep_ml`. Default: `.false.`
+
+* `gw_convect_dp_ml_net_path`
+
+  Absolute filepath to the deep convection gravity wave neural net used when `gw_convect_dp_ml` is set to `.true.` (`.pt`
+  extension).
+
+* `gw_convect_dp_ml_norms`
+
+  Absolute filepath to the deep convection gravity wave normalisation weights (NetCDF) used when `gw_convect_dp_ml` is set to `.true.`.
+
+> [!TIP]
+> Consider adding the following to generate output diagnostics of variables as desired.
+>
+> ```fortran
+> fincl = 'MYVAR'
+> ```
 
 We can then run `./case.build` from within the case directory to build the model.
 
 The case can be run with `./case.submit` from the case directory.
 
-**Note:**\
-By default CESM will place output in `/glade/scratch/user/case/`
-and logs/restart files in `/glade/scratch/user/archive/case/`.
-To place all output with logs in `archive/case` switch 'short term archiving' on by
-editing `env_run.xml` in the case directory to change `DOUT_S` from `FALSE` to `TRUE`.
+> [!NOTE]
+> By default CESM will place outputs in `$SCRATCH/case/` essential parts of which will be moved to `$SCRATCH/archive/case/` after run completion.
+> To leave all output in `$SCRATCH/case/` switch 'short term archiving' off by running `./xmlchange DOUT_S=FALSE` in the
+> case directory to change `DOUT_S` from `TRUE` to `FALSE`.
 
 ## NOTE: This is **unsupported** development code and is subject to the [CESM developer's agreement](http://www.cgd.ucar.edu/cseg/development-code.html).
 
@@ -109,37 +134,43 @@ editing `env_run.xml` in the case directory to change `DOUT_S` from `FALSE` to `
 
 Please see the [wiki](https://github.com/ESCOMP/CAM/wiki) for complete documentation on CAM, getting started with git and how to contribute to CAM's development.
 
-### _FTorch_ on Derecho
+### `FTorch` on Derecho
 
-The following steps can be followed to ensure FTorch is built to be consistent
+The following steps can be followed to ensure `FTorch` is built to be consistent
 with CAM on Derecho.
 
-On Derecho `libtorch` should be loaded using
-```
-module load libtorch/2.1.2
-```
-and used to build _FTorch_.\
+#### load CAM environment
 
-Further, for compatibility with CAM we need to be specific about the environment and
-compilers we load.
-The following sequence of modules are required to build FTorch compatible with the intel
-build of CAM on Derecho:
-```
+For compatibility with the version of CAM we are using (branched from the `cam6_3_139` tag) we need to be specific about the environment and compilers we load. The following sequence of modules
+are required to build `FTorch` compatible with the intel build of CAM on Derecho:
+
+```bash
 module purge
 module load ncarenv/23.06
 module load intel-oneapi/2023.0.0
 module load mkl
 module load cmake
-module load libtorch/2.1.2
 module load cuda/11.7.1
 ```
 
-Note that in future builds or releases, or on different machines, the environment for
-building CAM may change.
-In this case the FTorch environment should be updated accordingly.
-FTorch can then be built and installed from `/path/to/ftorch/src/build/` as described in the
-documentation with:
+> [!NOTE]
+> In future builds or releases, or on different machines, the environment for building CAM may change. In this case the `FTorch`
+> environment should be updated accordingly.
+
+#### obtain Ftorch source
+
+```bash
+cd $HOME
+git clone git@github.com:Cambridge-ICCS/FTorch.git
+cd $HOME/FTorch/src
 ```
+
+#### build Ftorch
+
+`FTorch` can then be built and installed from `$HOME/Ftorch/src` as described in the documentation with:
+
+```bash
+mkdir -p build && cd build
 cmake .. \
     -DCMAKE_BUILD_TYPE=Release \
     -DCMAKE_Fortran_COMPILER=ifort \
@@ -147,19 +178,19 @@ cmake .. \
     -DCMAKE_CXX_COMPILER=icpc \
     -DCMAKE_PREFIX_PATH=/glade/u/apps/opt/libtorch/2.1.2 \
     -DCMAKE_INSTALL_PREFIX=../../bin/ftorch_intel/
-
 cmake --build . --target install
 ```
 
-This will build FTorch and install it to `/path/to/ftorch/bin/ftorch_intel`.
+This will build `FTorch` and install it to `$HOME/Ftorch/bin/ftorch_intel`.
 
-### _FTorch_-compatible CIME
+### `FTorch`-compatible CIME
 
 We need to use a version of the CIME build system that is capable of linking
-our code to FTorch when building CAM.
+our code to `FTorch` when building CAM.
 
 To do this we have modified the `Externals.cfg` file in the main CAM directory
 to replace the CIME entry with:
+
 ```
 [cime]
 branch = ftorch_gw
@@ -168,8 +199,9 @@ repo_url = https://github.com/Cambridge-ICCS/cime_je
 local_path = cime
 required = True
 ```
+
 which points to the [ICCS fork](https://github.com/Cambridge-ICCS/cime_je) of CIME
-that allows components to be built with FTorch.
+that allows components to be built with `FTorch`.
 Specifically it points to a branch based off of the `cime6.0.175` tag that is
 compatible with the latest version of CIME used with this version of CAM (this is the cime tag