docs: update installation instructions for CAM with FTorch #19

Merged
164 changes: 98 additions & 66 deletions README.md
@@ -6,7 +6,6 @@ developed as part of the DataWave project.
The main working branch for this project is `datawave_ml` which was originally branched
from the `cam6_3_139` tag.


## Using this model

### Obtaining CAM
@@ -16,40 +15,39 @@ branch on which this work is based:
```
git clone https://github.com/DataWaveProject/CAM.git
cd CAM
git checkout datawave_ml
git checkout coupling
```
This branch is built upon the `cam6_3_139` tag from the
[main ESCOMP/CAM repository](https://github.com/ESCOMP/CAM).

### Preparing FTorch and setting the CIME component in CESM
### Obtaining `FTorch`

We also need to build and link _FTorch_ which will allow us to use our PyTorch-based
neural nets in CAM. This has two steps.
To use PyTorch-based neural nets in CAM, we first need to build and link `FTorch`.

#### Building FTorch
To install `FTorch` on Derecho follow the instructions in section [`FTorch` on Derecho](#ftorch-on-derecho) below.

You need to build and install _FTorch_ locally on the system following the instructions
[in the documentation](https://github.com/Cambridge-ICCS/FTorch).
Note the location of the install as this will be required later when building CAM.
> [!NOTE]
> The location of the `FTorch` install will be required later when building CAM.

For specifics on building FTorch on Derecho to be compatible with CAM see the section
[_FTorch_ on Derecho](#ftorch-on-derecho) below.
> [!NOTE]
> If you want to build `FTorch` on another system, please follow the general instructions in the `FTorch`
> [documentation](https://github.com/Cambridge-ICCS/FTorch).

### Obtaining `FTorch`-compatible CIME

#### Obtaining _FTorch_-compatible CIME
This fork of CAM uses an `FTorch`-compatible version of the CIME build system. This is specified under the `[cime]` section
of the `Externals.cfg` file in the main CAM directory.

This fork of CAM will use an FTorch-compatible version of the CIME buildsystem.
This is specified under the `[cime]` section of the `Externals.cfg` file in the
main CAM directory. For more information see
[_FTorch_-compatible CIME](#ftorch-compatible-cime) below.
To set up CIME to use `FTorch`, please follow the instructions in section [`FTorch`-compatible CIME](#ftorch-compatible-cime)
below.
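
Before checking out the externals it can be useful to confirm which CIME entry will be fetched. The following is a minimal sketch, not part of CAM or CIME: `print_cfg_section` is a hypothetical helper name, and it assumes `Externals.cfg` uses plain INI-style `[section]` headers with no trailing whitespace.

```bash
# Print the non-blank lines of one section of an INI-style file such as Externals.cfg.
# Usage: print_cfg_section <file> <section>
print_cfg_section() {
    awk -v section="[$2]" '
        /^\[/ { in_section = ($0 == section); next }  # every header line switches the match on or off
        in_section && NF { print }                     # print non-blank lines inside the wanted section
    ' "$1"
}

# Example, run from the CAM root directory:
# print_cfg_section Externals.cfg cime
```

The output should list the branch, repository URL and local path given in the `[cime]` entry.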

### Checkout externals

You can now run, from within the CAM root directory,
To fetch the external components, run the following command from within the CAM root directory:

```
./manage_externals/checkout_externals
```
to fetch the external components.

### Creating and running a case

@@ -63,43 +61,65 @@ from `CAM/cime/scripts/`.

You can then navigate to the case directory at `<path_to_testcase_directory>`.

#### Building with _FTorch_
### Build CAM with `FTorch`

To couple to FTorch modify `Tools/Makefile` line 600 to set the environment variable
FTORCH_LIB to the location of the FTorch library on your system.
Before we can run `./case.build` we first need to make some manual changes to the `Makefile` located at
`<path_to_testcase_directory>/Tools/Makefile`. This will allow the CIME build system to locate `FTorch`.

On Derecho this is `/path/to/ftorch/bin/ftorch_intel` as defined below in
[_FTorch_ on Derecho](#ftorch-on-derecho).
From the test case directory, modify `Tools/Makefile` line 602 to set the environment variable `FTORCH_LIB` to the location of
the `FTorch` library on your system.

#### Setting up case details
If you followed the instructions in section [`FTorch` on Derecho](#ftorch-on-derecho) this will be `$HOME/FTorch/bin/ftorch_intel`.

We can now run `./case.setup` from within the case directory.
Once this has been done then edit the generated `user_nl_cam` in the case directory
as required.
Add the following line:
```make
FTORCH_LIB := $(HOME)/FTorch/bin/ftorch_intel
```
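
If you would rather not edit the file by hand, the change can be scripted. This is a rough sketch under stated assumptions: `set_ftorch_lib` is a hypothetical helper, it assumes `Tools/Makefile` already contains a line beginning with `FTORCH_LIB`, and it assumes the install path contains no `|` or `&` characters (both are special to `sed` here).

```bash
# Point an existing FTORCH_LIB assignment in a Makefile at a given install directory.
# Usage: set_ftorch_lib <makefile> <ftorch_install_dir>
set_ftorch_lib() {
    # Refuse to continue if there is no assignment to rewrite.
    grep -q '^FTORCH_LIB[[:space:]]*[?:]*=' "$1" || {
        echo "no FTORCH_LIB assignment found in $1" >&2
        return 1
    }
    # Write to a temporary file first so the original survives a failed edit.
    sed "s|^FTORCH_LIB[[:space:]]*[?:]*=.*|FTORCH_LIB := $2|" "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}

# Example, run from the case directory:
# set_ftorch_lib Tools/Makefile "$HOME/FTorch/bin/ftorch_intel"
```

Because the shell expands `$HOME` before the line is written, the Makefile ends up with an absolute path.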

#. `gw_convect_dp_ml='on'`\
This is the switch to use our new ML convective-gw scheme instead of the default.\
Other options are `'off'` (default - use original), `'bothoff'` (run both schemes
but use default for simulation), and `'bothon'` (run both but use ML for simulation).
#. `gw_convect_dp_ml_net='<PATH/TO/MODEL.pt>'`\
The path to the your saved PyTorch model.
### Setting up case details

Also consider adding:
```
fincl<n> = 'MYVAR'
We can now run `./case.setup` from within the case directory. Once this has been done, edit the generated `user_nl_cam` in
the case directory as required. Add the following lines:

```fortran
gw_convect_dp_ml=.false.
gw_convect_dp_ml_compare=.true.
gw_convect_dp_ml_net_path='/path/to/neural/net'
gw_convect_dp_ml_norms='/path/to/norms'
```
to generate output diagnostics of variables as desired.

* `gw_convect_dp_ml` (`logical`)

Whether or not to use the ML scheme for gravity waves produced by deep convection. Default: `.false.`

* `gw_convect_dp_ml_compare` (`logical`)

  Whether or not to run a piggybacking comparison of the ML deep convection gravity waves to the original scheme. Only one
  scheme will be used to advance the simulation, as dictated by `gw_convect_dp_ml`. Default: `.false.`

* `gw_convect_dp_ml_net_path`

Absolute filepath to the deep convection gravity wave neural net used when `gw_convect_dp_ml` is set to `.true.` (`.pt`
extension).

* `gw_convect_dp_ml_norms`

Absolute filepath to the deep convection gravity wave normalisation weights (NetCDF) used when `gw_convect_dp_ml` is set to `.true.`.
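
The namelist settings above can also be appended from the shell. The sketch below is illustrative only: `append_gw_ml_settings` is a hypothetical helper, and the switch values simply mirror the example shown earlier in this section. It rejects relative paths because both file options are documented as absolute.

```bash
# Append the ML gravity-wave settings to a CAM namelist file.
# Usage: append_gw_ml_settings <user_nl_cam> <net_path> <norms_path>
append_gw_ml_settings() {
    # Both file options are documented as absolute paths, so reject anything else.
    for path in "$2" "$3"; do
        case $path in
            /*) ;;
            *) echo "expected an absolute path, got: $path" >&2; return 1 ;;
        esac
    done
    cat >> "$1" <<EOF
gw_convect_dp_ml=.false.
gw_convect_dp_ml_compare=.true.
gw_convect_dp_ml_net_path='$2'
gw_convect_dp_ml_norms='$3'
EOF
}

# Example, run from the case directory after ./case.setup:
# append_gw_ml_settings user_nl_cam /path/to/neural/net /path/to/norms
```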

> [!TIP]
> Consider adding the following to generate output diagnostics of variables as desired.
>
> ```fortran
> fincl<n> = 'MYVAR'
> ```

We can then run `./case.build` from within the case directory to build the model.

The case can be run with `./case.submit` from the case directory.

**Note:**\
By default CESM will place output in `/glade/scratch/user/case/`
and logs/restart files in `/glade/scratch/user/archive/case/`.
To place all output with logs in `archive/case` switch 'short term archiving' on by
editing `env_run.xml` in the case directory to change `DOUT_S` from `FALSE` to `TRUE`.
> [!NOTE]
> By default CESM will place output in `$SCRATCH/case/` and logs/restart files in `$SCRATCH/archive/case/`.
> To place all output with logs in `archive/case`, switch 'short term archiving' on by running `./xmlchange DOUT_S=TRUE` in the
> case directory to change `DOUT_S` from `FALSE` to `TRUE`.

## NOTE: This is **unsupported** development code and is subject to the [CESM developer's agreement](http://www.cgd.ucar.edu/cseg/development-code.html).

@@ -109,57 +129,68 @@ editing `env_run.xml` in the case directory to change `DOUT_S` from `FALSE` to `

Please see the [wiki](https://github.com/ESCOMP/CAM/wiki) for complete documentation on CAM, getting started with git and how to contribute to CAM's development.

### _FTorch_ on Derecho
### `FTorch` on Derecho

The following steps can be followed to ensure FTorch is built to be consistent
The following steps can be followed to ensure `FTorch` is built to be consistent
with CAM on Derecho.

On Derecho `libtorch` should be loaded using
```
module load libtorch/2.1.2
```
and used to build _FTorch_.\
#### Load CAM environment

Further, for compatibility with CAM we need to be specific about the environment and
compilers we load.
The following sequence of modules are required to build FTorch compatible with the intel
build of CAM on Derecho:
```
For compatibility with CAM we need to be specific about the environment and compilers we load. The following sequence of modules
is required to build `FTorch` compatible with the Intel build of CAM on Derecho:

```bash
module purge
module load ncarenv/23.06
module load intel-oneapi/2023.0.0
module load mkl
module load cmake
module load libtorch/2.1.2
module load cuda/11.7.1
```
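
After loading the modules it is worth confirming that the compilers and `cmake` used in the build step are actually on `PATH`. A minimal sketch follows; `missing_tools` is a hypothetical helper name, not part of the Derecho environment.

```bash
# Print the name of every listed tool that is not on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Example, after the module loads above:
# missing_tools ifort icc icpc cmake
```

No output and a zero exit status mean every listed tool was found.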
Note that in future builds or releases, or on different machines, the environment for
building CAM may change.
In this case the FTorch environment should be updated accordingly.

FTorch can then be built and installed from `/path/to/ftorch/src/build/` as described in the
documentation with:
> [!NOTE]
> In future builds or releases, or on different machines, the environment for building CAM may change. In this case the `FTorch`
> environment should be updated accordingly.

> [!NOTE]
> When building `FTorch` with `cmake` in the next step, we use the absolute path to `libtorch`. Previously, we loaded the
> `libtorch` module i.e., `module load libtorch/2.1.2` but this conflicts with `ncarenv/23.06`. We do not actually need to load
> the `libtorch` module to use the absolute path, so for now we can ignore this.

#### Obtain FTorch source

```bash
cd $HOME
git clone git@github.com:Cambridge-ICCS/FTorch.git
cd $HOME/FTorch/src
```

#### Build FTorch

`FTorch` can then be built and installed from `$HOME/FTorch/src` as described in the documentation with:

```bash
mkdir -p build && cd build
cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_Fortran_COMPILER=ifort \
-DCMAKE_C_COMPILER=icc \
-DCMAKE_CXX_COMPILER=icpc \
-DCMAKE_PREFIX_PATH=/glade/u/apps/opt/libtorch/2.1.2 \
-DCMAKE_INSTALL_PREFIX=../../bin/ftorch_intel/

cmake --build . --target install
```

This will build FTorch and install it to `/path/to/ftorch/bin/ftorch_intel`.
This will build `FTorch` and install it to `$HOME/FTorch/bin/ftorch_intel`.
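
A quick check that the install step produced a library can save a failed CAM build later. The sketch below is an assumption-laden illustration: `find_ftorch_library` is a hypothetical helper, and it assumes the install places a file named `libftorch.*` under `lib/` or `lib64/` in the install prefix, which may differ between `FTorch` versions.

```bash
# Report the first libftorch.* file found under <prefix>/lib or <prefix>/lib64.
# Usage: find_ftorch_library <install_prefix>
find_ftorch_library() {
    for libdir in "$1/lib" "$1/lib64"; do
        for candidate in "$libdir"/libftorch.*; do
            # An unmatched glob stays literal, so test that the file really exists.
            if [ -e "$candidate" ]; then
                echo "$candidate"
                return 0
            fi
        done
    done
    echo "no libftorch library found under $1" >&2
    return 1
}

# Example:
# find_ftorch_library "$HOME/FTorch/bin/ftorch_intel"
```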

### _FTorch_-compatible CIME
### `FTorch`-compatible CIME

We need to use a version of the CIME build system that is capable of linking
our code to FTorch when building CAM.
our code to `FTorch` when building CAM.

To do this we have modified the `Externals.cfg` file in the main CAM directory to
replace the CIME entry with:

```
[cime]
branch = ftorch_gw
@@ -168,8 +199,9 @@ repo_url = https://github.com/Cambridge-ICCS/cime_je
local_path = cime
required = True
```

which points to the [ICCS fork](https://github.com/Cambridge-ICCS/cime_je) of CIME
that allows components to be built with FTorch.
that allows components to be built with `FTorch`.

Specifically it points to a branch based off of the `cime6.0.175` tag that is compatible
with the latest version of CIME used with this version of CAM (this is the cime tag