Skip to content

Commit

Permalink
created nice readme
Browse files Browse the repository at this point in the history
  • Loading branch information
stineb committed Jun 11, 2024
1 parent c42d98c commit 8eba781
Show file tree
Hide file tree
Showing 7 changed files with 238 additions and 74 deletions.
5 changes: 5 additions & 0 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -22,5 +22,10 @@ Imports:
parallel,
map2tidy,
rgeco
Suggests:
here,
cowplot,
rnaturalearth,
ggplot2
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.1
2 changes: 1 addition & 1 deletion R/grsofun.R
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ grsofun <- function(par, settings){
# Only variables and at aggregation level required for NetCDF output
df <- grsofun_collect(settings, return_data = TRUE)

# Write to NetCDF files
# Write to NetCDF files - XXX not yet implemented
error <- grsofun_save_nc(df, settings)

}
2 changes: 1 addition & 1 deletion R/grsofun_collect.R
Original file line number Diff line number Diff line change
Expand Up @@ -105,7 +105,7 @@ grsofun_collect_byilon <- function(
if (nrow(ddf) > 0){
# aggregate temporally to mean
# to monthly
vars <- names(settings$save_nc[settings$save_nc == "mon"])
vars <- names(settings$save[settings$save == "mon"])
mdf <- ddf |>
tidyr::unnest(data) |>
dplyr::mutate(
Expand Down
10 changes: 5 additions & 5 deletions R/grsofun_run.R
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,8 @@ grsofun_run <- function(par, settings){

if (settings$ncores_max == 1){
# Do not parallelize
# out <- tibble(ilon = 292) |>
out <- tibble(ilon = seq(settings$grid$len_ilon)) |>
# out <- dplyr::tibble(ilon = 292) |>
out <- dplyr::tibble(ilon = seq(settings$grid$len_ilon)) |>
dplyr::mutate(out = purrr::map(
ilon,
~grsofun_run_byilon(
Expand Down Expand Up @@ -44,7 +44,7 @@ grsofun_run <- function(par, settings){

# distribute computation across the cores, calculating for all longitudinal
# indices of this chunk
out <- tibble(ilon = seq(settings$grid$len_ilon)) |>
out <- dplyr::tibble(ilon = seq(settings$grid$len_ilon)) |>
multidplyr::partition(cl) |>
dplyr::mutate(out = purrr::map(
ilon,
Expand Down Expand Up @@ -108,7 +108,7 @@ grsofun_run_bychunk <- function(chunk, nthreads, par, settings){

# distribute computation across the cores, calculating for all longitudinal
# indices of this chunk
out <- tibble(ilon = vec_index) |>
out <- dplyr::tibble(ilon = vec_index) |>
multidplyr::partition(cl) |>
dplyr::mutate(out = purrr::map(
ilon,
Expand Down Expand Up @@ -269,7 +269,7 @@ grsofun_run_byilon <- function(ilon, par, settings){
year_start <- min(lubridate::year(dates))
year_end <- max(lubridate::year(dates))

ddf <- tibble(
ddf <- dplyr::tibble(
date = seq(
from = lubridate::ymd(paste0(year_start, "-01-01")),
to = lubridate::ymd(paste0(year_end, "-12-31")),
Expand Down
200 changes: 173 additions & 27 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,34 +1,180 @@
# Apply CWD algorithm globally
# grsofun: Global rsofun runs

*Author: Benjamin Stocker*
## Approach

## Workflow

### Prepare forcing

`analysis/make_tidy_<DATAPRODUCT>.R`

{rsofun} requires model forcing data as *tidy* data frames. The first step is to make climate and EO forcing data, commonly provided as NetCDF, tidy. That is, data is saved as nested data frames, where each row is one grid cell and time series is nested in the column `data`.


2. `analysis/apply_cwd_global.R`: Script for parallel applying the CWD algorithm (or anything else that operates on full time series) separately for on gridcell. Distributes by longitudinal band. Reading files written in previous step and writing files (in the example):
The {grsofun} package provides functions that wrap a call to `runread_pmodel_f()` from the {rsofun} package. This call is a point-scale simulation of the P-model. Functions of {grsofun} enable () simulations on each gridcell of a spatial grid and a parallelisation by gridcells for high computational speed.

`~/data/cmip6-ng/tidy/cwd/<fileprefix>_<ilon>.rds`
The implementation follows a *tidy* data paradigm. All model input and output are organised as tidy data frames where forcing and output time series are nested inside cells and each row represents one gridcell. Because `runread_pmodel_f()` requires full forcing time series sequences and because reading full time series with global coverage is commonly limited by memory, the forcing data first has to be re-organised as tidy data frames and split into separate files for each longitudinal band. After running the simulations, outputs, written as tidy data frames into separate files for each longitudinal band, have to be collected again before further processing of spatial fields.

`src/apply_cwd_global.sh`: Bash script that calls `apply_cwd_global.R` , to be used as an alternative and containing submission statement for HPC.
Each of these steps is implemented by a separate {grsofun} function. Functions share the `settings` argument which contains a list of all user-specified and simulation-specific information. Model parameters (which may be calibrated using {rsofun}) are provided as another separate argument.

Note: This step creates data at the original temporal resolution. Data is not collected at this stage to avoid memory limitation.
The workflow of running a global simulation of one year with WATCH-WFDEI climate forcing and MODIS FPAR is demonstrated below.

3. `analysis/get_cwd_annmax.R`: Script for parallel applying function for determining annual maximum. Reading files written in previous step and writing files (in the example):

`~/data/cmip6-ng/tidy/cwd/<fileprefix>_<ilon>_ANNMAX.rds`

4. `collect_cwd_annmax.R`: Script for collecting annual time series of each gridcell - is much smaller data and can now be handled by reading all into memory. Writes file containing global data with annual resolution:

`~/data/cmip6-ng/tidy/cwd/<fileprefix>_cum_ANNMAX.rds`

5. `analysis/create_nc_annmax.R`: Script for writing the global annual data into a NetCDF file. This uses the function `write_nc2()` from the package {rgeco}. Install it from [here](https://github.com/geco-bern/rgeco). Writes file containing global data with annual resolution as NetCDF:

`~/data/cmip6-ng/tidy/cwd/evspsbl_cum_ANNMAX.nc`
## Workflow

**Note**: Adjust paths and file name prefixes for your own case in scripts (located in subdirectory `analysis/`)
Load libraries for this vignette.

``` r
library(ggplot2)
library(dplyr)
library(rnaturalearth)
library(cowplot)
```

### Specify settings

``` r
settings <- list(
fileprefix = "test", # simulation name defined by the user
model = "pmodel", # in future could also be "biomee", but not yet implemented
year_start = 2018, # xxx not yet handled
year_end = 2018, # xxx not yet handled
grid = list( # a list specifying the grid which must be common to all forcing data
lon_start = -179.75,
dlon = 0.5,
len_ilon = 720,
lat_start = -89.75,
dlat = 0.5,
len_ilat = 360
),
source_climate = "watch-wfdei", # a string specifying climate forcing dataset-specific variables
dir_climate = "~/data/watch_wfdei", # path to where climate forcing data is located
dir_climate_tidy = "~/data/watch_wfdei/tidy/", # path to where tidy climate forcing data is to be written
source_fapar = "modis", # a string specifying fAPAR forcing dataset-specific variables
file_fapar = "~/data/MODIS-C006_MOD15A2_LAI_FPAR_zmaw/MODIS-C006_MOD15A2__LAI_FPAR__LPDAAC__GLOBAL_0.5degree__UHAM-ICDC__2000_2018__MON__fv0.02.nc", # path to where fAPAR forcing data is located
dir_fapar_tidy = "~/data/MODIS-C006_MOD15A2_LAI_FPAR_zmaw/tidy/", # path to where tidy fAPAR forcing data is to be written
file_whc = "~/data/mct_data/cwdx80_forcing_halfdeg.nc", # path to where root zone storage capacity forcing data is located
dir_whc_tidy = "~/data/mct_data/tidy/", # path to where tidy root zone storage capacity forcing data is to be written
file_landmask = "~/data/archive_legacy/landmasks/WFDEI-elevation.nc", # path to where land mask data is located
dir_landmask_tidy = "~/data/archive_legacy/landmasks/WFDEI-elevation_tidy/", # path to where tidy land mask data is to be written
file_elv = "~/data/archive_legacy/landmasks/WFDEI-elevation.nc", # path to where elevation data is located
dir_elv_tidy = "~/data/archive_legacy/landmasks/WFDEI-elevation_tidy/", # path to where tidy elevation data is to be written
save_drivers = TRUE, # whether rsofun driver object is to be saved. Uses additional disk space but substantially speeds up grsofun_run().
dir_drivers = here::here("input/tidy/"), # path where rsofun drivers are to be written
overwrite = FALSE, # whether files with tidy forcing data and drivers are to be overwritten. If false, reads files if available instead of re-creating them.
spinupyears = 10, # model spin-up length
recycle = 1, # climate forcing recycling during the spinup
dir_out = here::here("output/tidy/"), # path for tidy model output
dir_out_nc = "xxx", # xxx not yet handled
save = list( # a named list where names correspond to variable names in rsofun output and the value is a string specifying the temporal resolution to which global output is to be aggregated.
gpp = "mon"
),
nthreads = 1, # distribute to multiple nodes for high performance computing - xxx not yet implemented
ncores_max = 1 # set to 1 for un-parallel run
)
```

### Specify model parameters

``` r
par <- list(
kphio = 0.04998, # setup ORG in Stocker et al. 2020 GMD
kphio_par_a = 0.0, # set to zero to disable temperature-dependence of kphio
kphio_par_b = 1.0,
soilm_thetastar = 0.6 * 240, # to recover old setup with soil moisture stress
soilm_betao = 0.0,
beta_unitcostratio = 146.0,
rd_to_vcmax = 0.014, # value from Atkin et al. 2015 for C3 herbaceous
tau_acclim = 30.0,
kc_jmax = 0.41
)
```

### Make forcing tidy

Convert forcing data, provided as NetCDF, into a tidy format (time series data frames for each gridcell along rows) and saved for longitudinal bands in a binary format.

```{r eval=FALSE}
settings <- grsofun_tidy(settings)
```

### Run model

Read forcing data and construct the driver object for an rsofun run. Run the model and save outputs in a tidy format in `.rds` files for each longitudinal band.

``` r
error <- grsofun_run(par, settings)
```

### Collect outputs

Read outputs saved in separate files for each longitudinal band. Collect data from all longitudinal bands and apply temporal aggregation steps as specified by the settings.

This can either write the aggregated data again to tidy files by longitudinal band, or return the temporally aggregated data into the R environment.

``` r
df <- grsofun_collect(settings, return_data = TRUE)
```

## Example plot

Plot a map.

``` r
coast <- rnaturalearth::ne_coastline(
scale = 110,
returnclass = "sf"
)

# January
gg1 <- df |>
filter(month == 1) |>
ggplot() +
geom_raster(
aes(x = lon, y = lat, fill = gpp),
show.legend = TRUE
) +
geom_sf(
data = coast,
colour = 'black',
linewidth = 0.3
) +
coord_sf(
ylim = c(-60, 85),
expand = FALSE
) +
scale_fill_viridis_c(
name = expression(paste("gC m"^-2, "s"^-1)),
option = "cividis",
limits = c(0, 15)
) +
theme_void() +
labs(
subtitle = "January monthly mean"
)

# July
gg2 <- df |>
filter(month == 7) |>
ggplot() +
geom_raster(
aes(x = lon, y = lat, fill = gpp),
show.legend = TRUE
) +
geom_sf(
data = coast,
colour = 'black',
linewidth = 0.3
) +
coord_sf(
ylim = c(-60, 85),
expand = FALSE
) +
scale_fill_viridis_c(
name = expression(paste("gC m"^-2, "s"^-1)),
option = "cividis",
limits = c(0, 15)
) +
theme_void() +
labs(
subtitle = "July monthly mean"
)

cowplot::plot_grid(
gg1,
gg2,
ncol = 1
)
```

![Global distribution of monthly mean GPP in January (top) and July (bottom).](fig/gpp_demo.png)
Binary file added man/figures/gpp_demo.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading

0 comments on commit 8eba781

Please sign in to comment.