created nice readme

geco-bern · Jun 11, 2024 · 8eba781 · 8eba781
1 parent c42d98c
commit 8eba781
Show file tree

Hide file tree

Showing 7 changed files with 238 additions and 74 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -22,5 +22,10 @@ Imports:
     parallel,
     map2tidy,
     rgeco
+Suggests:
+    here,
+    cowplot,
+    rnaturalearth,
+    ggplot2
 Roxygen: list(markdown = TRUE)
 RoxygenNote: 7.3.1
diff --git a/R/grsofun.R b/R/grsofun.R
@@ -12,7 +12,7 @@ grsofun <- function(par, settings){
   # Only variables and at aggregation level required for NetCDF output
   df <- grsofun_collect(settings, return_data = TRUE)
 
-  # Write to NetCDF files
+  # Write to NetCDF files - XXX not yet implemented
   error <- grsofun_save_nc(df, settings)
 
 }
diff --git a/R/grsofun_collect.R b/R/grsofun_collect.R
@@ -105,7 +105,7 @@ grsofun_collect_byilon <- function(
   if (nrow(ddf) > 0){
     # aggregate temporally to mean
     # to monthly
-    vars <- names(settings$save_nc[settings$save_nc == "mon"])
+    vars <- names(settings$save[settings$save == "mon"])
     mdf <- ddf |>
       tidyr::unnest(data) |>
       dplyr::mutate(

diff --git a/R/grsofun_run.R b/R/grsofun_run.R
@@ -4,8 +4,8 @@ grsofun_run <- function(par, settings){
 
     if (settings$ncores_max == 1){
       # Do not parallelize
-      # out <- tibble(ilon = 292) |>
-      out <- tibble(ilon = seq(settings$grid$len_ilon)) |>
+      # out <- dplyr::tibble(ilon = 292) |>
+      out <- dplyr::tibble(ilon = seq(settings$grid$len_ilon)) |>
           dplyr::mutate(out = purrr::map(
           ilon,
           ~grsofun_run_byilon(
@@ -44,7 +44,7 @@ grsofun_run <- function(par, settings){
 
       # distribute computation across the cores, calculating for all longitudinal
       # indices of this chunk
-      out <- tibble(ilon = seq(settings$grid$len_ilon)) |>
+      out <- dplyr::tibble(ilon = seq(settings$grid$len_ilon)) |>
         multidplyr::partition(cl) |>
         dplyr::mutate(out = purrr::map(
           ilon,
@@ -108,7 +108,7 @@ grsofun_run_bychunk <- function(chunk, nthreads, par, settings){
 
   # distribute computation across the cores, calculating for all longitudinal
   # indices of this chunk
-  out <- tibble(ilon = vec_index) |>
+  out <- dplyr::tibble(ilon = vec_index) |>
     multidplyr::partition(cl) |>
     dplyr::mutate(out = purrr::map(
       ilon,
@@ -269,7 +269,7 @@ grsofun_run_byilon <- function(ilon, par, settings){
         year_start <- min(lubridate::year(dates))
         year_end <- max(lubridate::year(dates))
 
-        ddf <- tibble(
+          ddf <- dplyr::tibble(
           date = seq(
             from = lubridate::ymd(paste0(year_start, "-01-01")),
             to = lubridate::ymd(paste0(year_end, "-12-31")),

diff --git a/README.md b/README.md
@@ -1,34 +1,180 @@
-# Apply CWD algorithm globally
+# grsofun: Global rsofun runs
 
-*Author: Benjamin Stocker*
+## Approach
 
-## Workflow
-
-### Prepare forcing
-
-`analysis/make_tidy_<DATAPRODUCT>.R`
-
-{rsofun} requires model forcing data as *tidy* data frames. The first step is to make climate and EO forcing data, commonly provided as NetCDF, tidy. That is, data is saved as nested data frames, where each row is one grid cell and time series is nested in the column `data`. 
-
-
-2.  `analysis/apply_cwd_global.R`: Script for parallel applying the CWD algorithm (or anything else that operates on full time series) separately for on gridcell. Distributes by longitudinal band. Reading files written in previous step and writing files (in the example):
+The {grsofun} package provides functions that wrap a call to `runread_pmodel_f()` from the {rsofun} package. This call is a point-scale simulation of the P-model. Functions of {grsofun} enable () simulations on each gridcell of a spatial grid and a parallelisation by gridcells for high computational speed.
 
-    `~/data/cmip6-ng/tidy/cwd/<fileprefix>_<ilon>.rds`
+The implementation follows a *tidy* data paradigm. All model input and output are organised as tidy data frames where forcing and output time series are nested inside cells and each row represents one gridcell. Because `runread_pmodel_f()` requires full forcing time series sequences and because reading full time series with global coverage is commonly limited by memory, the forcing data first has to be re-organised as tidy data frames and split into separate files for each longitudinal band. After running the simulations, outputs, written as tidy data frames into separate files for each longitudinal band, have to be collected again before further processing of spatial fields.
 
-    `src/apply_cwd_global.sh`: Bash script that calls `apply_cwd_global.R` , to be used as an alternative and containing submission statement for HPC.
+Each of these steps is implemented by a separate {grsofun} function. Functions share the `settings` argument which contains a list of all user-specified and simulation-specific information. Model parameters (which may be calibrated using {rsofun}) are provided as another separate argument.
 
-    Note: This step creates data at the original temporal resolution. Data is not collected at this stage to avoid memory limitation.
+The workflow of running a global simulation of one year with WATCH-WFDEI climate forcing and MODIS FPAR is demonstrated below.
 
-3.  `analysis/get_cwd_annmax.R`: Script for parallel applying function for determining annual maximum. Reading files written in previous step and writing files (in the example):
-
-    `~/data/cmip6-ng/tidy/cwd/<fileprefix>_<ilon>_ANNMAX.rds`
-
-4.  `collect_cwd_annmax.R`: Script for collecting annual time series of each gridcell - is much smaller data and can now be handled by reading all into memory. Writes file containing global data with annual resolution:
-
-    `~/data/cmip6-ng/tidy/cwd/<fileprefix>_cum_ANNMAX.rds`
-
-5.  `analysis/create_nc_annmax.R`: Script for writing the global annual data into a NetCDF file. This uses the function `write_nc2()` from the package {rgeco}. Install it from [here](https://github.com/geco-bern/rgeco). Writes file containing global data with annual resolution as NetCDF:
-
-    `~/data/cmip6-ng/tidy/cwd/evspsbl_cum_ANNMAX.nc`
+## Workflow
 
-**Note**: Adjust paths and file name prefixes for your own case in scripts (located in subdirectory `analysis/`)
+Load libraries for this vignette.
+
+``` r
+library(ggplot2)
+library(dplyr)
+library(rnaturalearth)
+library(cowplot)
+```
+
+### Specify settings
+
+``` r
+settings <- list(
+  fileprefix = "test",  # simulation name defined by the user
+  model = "pmodel",     # in future could also be "biomee", but not yet implemented
+  year_start = 2018,    # xxx not yet handled
+  year_end = 2018,      # xxx not yet handled
+  grid = list(          # a list specifying the grid which must be common to all forcing data
+    lon_start = -179.75,
+    dlon = 0.5,
+    len_ilon = 720,
+    lat_start = -89.75,
+    dlat = 0.5,
+    len_ilat = 360
+    ),
+  source_climate = "watch-wfdei",  # a string specifying climate forcing dataset-specific variables
+  dir_climate = "~/data/watch_wfdei", # path to where climate forcing data is located
+  dir_climate_tidy = "~/data/watch_wfdei/tidy/",  # path to where tidy climate forcing data is to be written
+  source_fapar = "modis",   # a string specifying fAPAR forcing dataset-specific variables
+  file_fapar = "~/data/MODIS-C006_MOD15A2_LAI_FPAR_zmaw/MODIS-C006_MOD15A2__LAI_FPAR__LPDAAC__GLOBAL_0.5degree__UHAM-ICDC__2000_2018__MON__fv0.02.nc",  # path to where fAPAR forcing data is located
+  dir_fapar_tidy = "~/data/MODIS-C006_MOD15A2_LAI_FPAR_zmaw/tidy/",  # path to where tidy fAPAR forcing data is to be written
+  file_whc = "~/data/mct_data/cwdx80_forcing_halfdeg.nc",  # path to where root zone storage capacity forcing data is located
+  dir_whc_tidy = "~/data/mct_data/tidy/",  # path to where tidy  root zone storage capacity forcing data is to be written
+  file_landmask = "~/data/archive_legacy/landmasks/WFDEI-elevation.nc",    # path to where land mask data is located
+  dir_landmask_tidy = "~/data/archive_legacy/landmasks/WFDEI-elevation_tidy/",    # path to where tidy land mask data is to be written
+  file_elv = "~/data/archive_legacy/landmasks/WFDEI-elevation.nc",   # path to where elevation data is located
+  dir_elv_tidy = "~/data/archive_legacy/landmasks/WFDEI-elevation_tidy/",   # path to where tidy elevation data is to be written
+  save_drivers = TRUE,   # whether rsofun driver object is to be saved. Uses additional disk space but substantially speeds up grsofun_run().
+  dir_drivers = here::here("input/tidy/"),  # path where rsofun drivers are to be written
+  overwrite = FALSE,    # whether files with tidy forcing data and drivers are to be overwritten. If false, reads files if available instead of re-creating them.
+  spinupyears = 10,     # model spin-up length
+  recycle = 1,          # climate forcing recycling during the spinup
+  dir_out = here::here("output/tidy/"),     # path for tidy model output
+  dir_out_nc = "xxx",  # xxx not yet handled
+  save = list(         # a named list where names correspond to variable names in rsofun output and the value is a string specifying the temporal resolution to which global output is to be aggregated. 
+    gpp = "mon"
+  ),
+  nthreads = 1,   # distribute to multiple nodes for high performance computing - xxx not yet implemented
+  ncores_max = 1  # set to 1 for un-parallel run
+)
+```
+
+### Specify model parameters
+
+``` r
+par <- list(
+  kphio              = 0.04998,    # setup ORG in Stocker et al. 2020 GMD
+  kphio_par_a        = 0.0,        # set to zero to disable temperature-dependence of kphio
+  kphio_par_b        = 1.0,
+  soilm_thetastar    = 0.6 * 240,  # to recover old setup with soil moisture stress
+  soilm_betao        = 0.0,
+  beta_unitcostratio = 146.0,
+  rd_to_vcmax        = 0.014,      # value from Atkin et al. 2015 for C3 herbaceous
+  tau_acclim         = 30.0,
+  kc_jmax            = 0.41
+)
+```
+
+### Make forcing tidy
+
+Convert forcing data, provided as NetCDF, into a tidy format (time series data frames for each gridcell along rows) and saved for longitudinal bands in a binary format.
+
+```{r eval=FALSE}
+settings <- grsofun_tidy(settings)
+```
+
+### Run model
+
+Read forcing data and construct the driver object for an rsofun run. Run the model and save outputs in a tidy format in `.rds` files for each longitudinal band.
+
+``` r
+error <- grsofun_run(par, settings)
+```
+
+### Collect outputs
+
+Read outputs saved in separate files for each longitudinal band. Collect data from all longitudinal bands and apply temporal aggregation steps as specified by the settings.
+
+This can either write the aggregated data again to tidy files by longitudinal band, or return the temporally aggregated data into the R environment.
+
+``` r
+df <- grsofun_collect(settings, return_data = TRUE)
+```
+
+## Example plot
+
+Plot a map.
+
+``` r
+coast <- rnaturalearth::ne_coastline(
+  scale = 110, 
+  returnclass = "sf"
+  )
+
+# January
+gg1 <- df |> 
+  filter(month == 1) |> 
+  ggplot() +
+  geom_raster(
+    aes(x = lon, y = lat, fill = gpp),
+    show.legend = TRUE
+    ) +
+  geom_sf(
+    data = coast,
+    colour = 'black',
+    linewidth = 0.3
+  )  +
+  coord_sf(
+    ylim = c(-60, 85),
+    expand = FALSE
+  ) +
+  scale_fill_viridis_c(
+    name = expression(paste("gC m"^-2, "s"^-1)),
+    option = "cividis",
+    limits = c(0, 15)
+    ) +
+  theme_void() +
+  labs(
+    subtitle = "January monthly mean"
+  )
+
+# July
+gg2 <- df |> 
+  filter(month == 7) |> 
+  ggplot() +
+  geom_raster(
+    aes(x = lon, y = lat, fill = gpp),
+    show.legend = TRUE
+    ) +
+  geom_sf(
+    data = coast,
+    colour = 'black',
+    linewidth = 0.3
+  )  +
+  coord_sf(
+    ylim = c(-60, 85),
+    expand = FALSE
+  ) +
+  scale_fill_viridis_c(
+    name = expression(paste("gC m"^-2, "s"^-1)),
+    option = "cividis",
+    limits = c(0, 15)
+    ) +
+  theme_void() +
+  labs(
+    subtitle = "July monthly mean"
+  )
+
+cowplot::plot_grid(
+  gg1, 
+  gg2,
+  ncol = 1
+)
+```
+
+![Global distribution of monthly mean GPP in January (top) and July (bottom).](fig/gpp_demo.png)
diff --git a/man/figures/gpp_demo.png b/man/figures/gpp_demo.png