Skip to content

Latest commit

 

History

History
executable file
·
510 lines (412 loc) · 17.4 KB

README.md

File metadata and controls

executable file
·
510 lines (412 loc) · 17.4 KB

processNC: R Package for processing and analysing (large) NetCDF files

Overview

processNC is an R package for processing and analysing NetCDF files in R. NetCDF files can easily be loaded into R using the rast() function from the terra package in R or formerly also using the raster() function from the raster package. However, when trying to handle large NetCDF files it might be more convenient to directly use the ncdf4 package or the Climate Data Operators software and this package provides a simplified wrapper for those two options.

The need for this package arised from the task to load large NetCDF files with global daily climate data to calculate monthly or yearly averages. With this package this task can be achieved in a much faster way and without having to read the entire file into memory.

NetCDF functions

For this, the package mainly consists of two functions:

  • subsetNC() subsets one or multiple NetCDF files by space (x,y), time and/or variable
  • summariseNC() summarises one or multiple NetCDF files over time

In addition there is also a function called cellstatsNC(), which calculates the spatial mean of one or multiple NetCDF files.

Raster functions

There is also a function called summariseRaster, which allows a similar implementation to the summariseNC function, but using any type of raster files rather than NetCDF files. And there is also an identical function summariseRast, which depends on the slightly faster terra package. See benchmark example further down below for a comparison of computation time between the three functions.

CDO functions

There are also several functions (filterNC, mergeNC and aggregateNC), which rely on the Climate Data Operators (CDO) software (https://code.mpimet.mpg.de/projects/cdo).

Note: In order to use those functions, you need to have the CDO software installed on your computer.

CDO is much faster than the equivalent R-functions, thus the CDO-based functions are considerably faster than the subsetNC() and summariseNC() functions.

Installation

To use the package, it can be installed directly from GitHub using the remotes package.

# If not yet installed, install the remotes package
if(!"remotes" %in% installed.packages()[,"Package"]) install.packages("remotes")

# Download & Install the package from GitHub if not yet done so
if(!"processNC" %in% installed.packages()[,"Package"]) remotes::install_github("RS-eco/processNC", build_vignettes=T)

If you encounter a bug or if you have any problems, please file an issue on Github.

Usage

Load processNC & terra package

library(processNC)
library(terra)
library(raster)
library(dplyr)

List NetCDF data files

# List daily temperature files for Germany from 1979 till 2016
tas_files <- list.files(paste0(system.file(package="processNC"), "/extdata"), 
                        pattern="tas.*\\.nc", full.names=T)

# Show files
basename(tas_files)
[1] "tas_ewembi_deu_1979_1980.nc" "tas_ewembi_deu_1981_1990.nc"
[3] "tas_ewembi_deu_1991_2000.nc" "tas_ewembi_deu_2001_2010.nc"
[5] "tas_ewembi_deu_2011_2016.nc"
# List daily precipitation files for Germany from 1979 till 2016
pr_files <- list.files(paste0(system.file(package="processNC"), "/extdata"), 
                       pattern="pr.*\\.nc", full.names=T)

Subset NetCDF file

# Subset NetCDF files by time and rough extent of Bavaria
subsetNC(files=tas_files, ext=c(8.5, 14, 47, 51), 
         startdate=1990, enddate=1999, varid="tas")
class       : SpatRaster 
dimensions  : 10, 6, 3288  (nrow, ncol, nlyr)
resolution  : 0.9166667, 0.4  (x, y)
extent      : 8.5, 14, 47, 51  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs 
source(s)   : memory
names       : 1990-01-01, 1990-01-02, 1990-01-03, 1990-01-04, 1990-01-05, 1990-01-06, ... 
min values  :   268.2550,   269.4496,   269.9260,   268.7616,   269.0258,   269.5515, ... 
max values  :   272.5554,   272.0455,   272.0891,   272.1550,   273.2923,   274.2967, ... 
time (days) : 1990-01-01 to 1999-01-01 
# Get SpatialPolygonsDataFrame of Bavaria
data(bavaria)

# Subset NetCDF file by SpatVector
r <- subsetNC(tas_files, ext=terra::vect(bavaria), varid="tas")
plot(r[[1]])
plot(bavaria, add=T)

# Subset NetCDF file just by time
subsetNC(tas_files, startdate=1990, enddate=1999)
class       : SpatRaster 
dimensions  : 17, 14, 3288  (nrow, ncol, nlyr)
resolution  : 0.6071429, 0.4117647  (x, y)
extent      : 6.25, 14.75, 47.75, 54.75  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs 
source(s)   : memory
names       : 1990-01-01, 1990-01-02, 1990-01-03, 1990-01-04, 1990-01-05, 1990-01-06, ... 
min values  :    268.255,    268.876,   269.7884,   268.2937,   267.8489,   268.2939, ... 
max values  :    274.574,    274.632,   274.9313,   273.3459,   276.0111,   276.3013, ... 
time (days) : 1990-01-01 to 1999-01-01 

Summarise NetCDF file

# Summarise daily NetCDF file for 10 years by week
summariseNC(files=tas_files[4], startdate=2001, enddate=2010, 
            group_col=c("week"))
class       : SpatRaster 
dimensions  : 15, 18, 530  (nrow, ncol, nlyr)
resolution  : 0.5, 0.5  (x, y)
extent      : 6, 15, 47.5, 55  (xmin, xmax, ymin, ymax)
coord. ref. :  
source(s)   : memory
names       :   1 2001,   1 2002,   1 2003,   1 2004,   1 2005,   1 2006, ... 
min values  : 272.2954, 264.6082, 268.2481, 266.8755, 273.0641, 268.0756, ... 
max values  : 280.0579, 273.1299, 276.3789, 274.6424, 280.0099, 274.7085, ... 
# Summarise daily NetCDF file for 10 years by month
summariseNC(files=tas_files[4], startdate=2001, enddate=2010, group_col=c("month"))
class       : SpatRaster 
dimensions  : 15, 18, 120  (nrow, ncol, nlyr)
resolution  : 0.5, 0.5  (x, y)
extent      : 6, 15, 47.5, 55  (xmin, xmax, ymin, ymax)
coord. ref. :  
source(s)   : memory
names       : Apr 2001, Apr 2002, Apr 2003, Apr 2004, Apr 2005, Apr 2006, ... 
min values  : 276.3501, 277.1500,   277.65, 278.5500,   277.15, 278.3500, ... 
max values  : 282.3499, 283.8501,   284.35, 284.7501,   284.65, 283.6499, ... 
# Summarise daily NetCDF file for 10 years by year
summariseNC(files=tas_files[4], startdate=2001, enddate=2010, 
            group_col=c("year"))
class       : SpatRaster 
dimensions  : 15, 18, 10  (nrow, ncol, nlyr)
resolution  : 0.5, 0.5  (x, y)
extent      : 6, 15, 47.5, 55  (xmin, xmax, ymin, ymax)
coord. ref. :  
source(s)   : memory
names       :     2001,     2002,     2003,     2004,     2005,     2006, ... 
min values  : 279.1473, 279.6632, 279.0957, 278.6661, 278.1127, 279.0922, ... 
max values  : 284.4210, 284.8579, 284.7202, 284.2071, 284.2196, 284.7766, ... 
# Summarise daily NetCDF file for 10 years first by week and then by year
summariseNC(files=tas_files[4], startdate=2001, enddate=2010, 
            group_col=c("week", "year"))
class       : SpatRaster 
dimensions  : 15, 18, 53  (nrow, ncol, nlyr)
resolution  : 0.5, 0.5  (x, y)
extent      : 6, 15, 47.5, 55  (xmin, xmax, ymin, ymax)
coord. ref. :  
source(s)   : memory
names       :        1,        2,        3,        4,        5,        6, ... 
min values  : 269.4892, 269.5704, 270.5732, 269.6285, 270.4546, 271.3650, ... 
max values  : 275.3027, 275.9724, 277.5657, 275.8852, 276.7980, 277.7568, ... 
# Summarise daily NetCDF file for 10 years first by month and then by year
s <- summariseNC(files=tas_files[4], startdate=2001, enddate=2010,
                 group_col=c("month", "year"))
s
class       : SpatRaster 
dimensions  : 15, 18, 12  (nrow, ncol, nlyr)
resolution  : 0.5, 0.5  (x, y)
extent      : 6, 15, 47.5, 55  (xmin, xmax, ymin, ymax)
coord. ref. :  
source(s)   : memory
names       : January, February,  March,  April,    May,   June, ... 
min values  :  269.81,   271.11, 274.13, 278.36, 283.22, 286.39, ... 
max values  :  276.14,   277.16, 280.17, 284.59, 288.88, 292.26, ... 
plot(s[[1]])

# Summarise daily NetCDF files for all years
yearly_tas <- summariseNC(tas_files, startdate=2000, enddate=2016, group_col="year")
yearly_tas
class       : SpatRaster 
dimensions  : 15, 18, 17  (nrow, ncol, nlyr)
resolution  : 0.5, 0.5  (x, y)
extent      : 6, 15, 47.5, 55  (xmin, xmax, ymin, ymax)
coord. ref. :  
source(s)   : memory
names       :     2000,     2001,     2002,     2003,     2004,     2005, ... 
min values  : 279.9609, 279.1473, 279.6632, 279.0957, 278.6661, 278.1127, ... 
max values  : 285.1134, 284.4210, 284.8579, 284.7202, 284.2071, 284.2196, ... 
yearly_pr <- summariseNC(pr_files, startdate=2000, enddate=2016, group_col="year")
yearly_pr
class       : SpatRaster 
dimensions  : 15, 18, 17  (nrow, ncol, nlyr)
resolution  : 0.5, 0.5  (x, y)
extent      : 6, 15, 47.5, 55  (xmin, xmax, ymin, ymax)
coord. ref. :  
source(s)   : memory
names       :       2000,       2001,       2002,       2003,       2004,       2005, ... 
min values  : 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000000, ... 
max values  : 0.02125388, 0.02209105, 0.02318245, 0.01517012, 0.01880114, 0.02133196, ... 
plot(yearly_tas[[1]])

# Calculate mean annual temperature for Germany
yearmean_tas <- as.data.frame(terra::global(yearly_tas, fun="mean", na.rm=T))
colnames(yearmean_tas) <- "mean"
yearmean_tas <- tibble::rownames_to_column(yearmean_tas, var="year")
yearmean_tas$year <- sub("X", "", yearmean_tas$year)
yearmean_tas$mean <- yearmean_tas$mean - 273.15
head(yearmean_tas)
  year      mean
1 2000 10.213450
2 2001  9.327813
3 2002  9.823712
4 2003  9.659545
5 2004  9.237923
6 2005  9.365806
# Calculate mean total precipitation for Germany
yearmean_pr <- as.data.frame(terra::global(yearly_pr, fun="mean", na.rm=T))
colnames(yearmean_pr) <- "mean"
yearmean_pr <- tibble::rownames_to_column(yearmean_pr, var="year")
yearmean_pr$year <- sub("X", "", yearmean_pr$year)
yearmean_pr$mean <- yearmean_pr$mean #- 273.15
head(yearmean_pr)
  year        mean
1 2000 0.007102034
2 2001 0.007969856
3 2002 0.008777291
4 2003 0.005238077
5 2004 0.006979942
6 2005 0.006719711

Summarise NetCDF file using CDO commands

  • Filter years
temp <- tempfile(fileext=".nc")
filterNC(file=tas_files[2], startdate=1985, enddate=1990, outfile=temp)
Created file /tmp/RtmpNRiCrJ/file3c1938d617a8.nc.
terra::rast(temp)
class       : SpatRaster 
dimensions  : 15, 18, 2191  (nrow, ncol, nlyr)
resolution  : 0.5, 0.5  (x, y)
extent      : 6, 15, 47.5, 55  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 
source      : file3c1938d617a8.nc 
varname     : tas 
names       :  tas_1,  tas_2,  tas_3,  tas_4,  tas_5,  tas_6, ... 
unit        : Kelvin, Kelvin, Kelvin, Kelvin, Kelvin, Kelvin, ... 
time (days) : 1985-01-01 to 1990-12-31 
  • Merge files:
temp <- tempfile(fileext=".nc")
mergeNC(files=tas_files, outfile=temp)
Created file /tmp/RtmpNRiCrJ/file3c19719fa51c.nc.
terra::rast(temp)
class       : SpatRaster 
dimensions  : 15, 18, 13880  (nrow, ncol, nlyr)
resolution  : 0.5, 0.5  (x, y)
extent      : 6, 15, 47.5, 55  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 
source      : file3c19719fa51c.nc 
varname     : tas 
names       :  tas_1,  tas_2,  tas_3,  tas_4,  tas_5,  tas_6, ... 
unit        : Kelvin, Kelvin, Kelvin, Kelvin, Kelvin, Kelvin, ... 
time (days) : 1979-01-02 to 2017-01-01 
  • Aggregate files:
temp2 <- tempfile(fileext=".nc")
aggregateNC(infile=temp, outfile=temp2, group_col="month", var="tas", startdate="2000", enddate="2009")
Created file /tmp/RtmpNRiCrJ/file3c197c5eb3ff.nc.
temp2 <- terra::rast(temp2)
temp2
class       : SpatRaster 
dimensions  : 15, 18, 10  (nrow, ncol, nlyr)
resolution  : 0.5, 0.5  (x, y)
extent      : 6, 15, 47.5, 55  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 
source      : file3c197c5eb3ff.nc:tas 
varname     : tas 
names       :  tas_1,  tas_2,  tas_3,  tas_4,  tas_5,  tas_6, ... 
unit        : Kelvin, Kelvin, Kelvin, Kelvin, Kelvin, Kelvin, ... 
time (days) : 2000-07-01 to 2009-07-02 
names(temp2) <- 2000:2009
time(temp2) <- lubridate::year(time(temp2))
plot(temp2)

Summarise Raster file

This can be achieved using:

summariseRaster(tas_files[4], startdate=2001, enddate=2010, var="tas")
class      : RasterBrick 
dimensions : 15, 18, 270, 12  (nrow, ncol, ncell, nlayers)
resolution : 0.5, 0.5  (x, y)
extent     : 6, 15, 47.5, 55  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs 
source     : memory
names      :  January, February,    March,    April,      May,     June,     July,   August, September,  October, November, December 
min values : 269.8396, 271.0432, 273.9367, 278.2057, 283.1849, 286.2067, 288.3307, 287.7344,  283.7335, 280.3433, 275.1130, 270.7514 
max values : 276.1422, 277.0875, 280.0108, 284.4274, 288.8376, 292.0917, 294.0858, 293.1610,  289.0893, 284.8109, 280.7387, 276.4388 
months     : 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 

or:

summariseRast(tas_files[4], startdate=2001, enddate=2010, var="tas")
class       : SpatRaster 
dimensions  : 15, 18, 12  (nrow, ncol, nlyr)
resolution  : 0.5, 0.5  (x, y)
extent      : 6, 15, 47.5, 55  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 
source(s)   : memory
names       :  January, February,    March,    April,      May,     June, ... 
min values  : 269.8396, 271.0432, 273.9367, 278.2057, 283.1849, 286.2067, ... 
max values  : 276.1422, 277.0875, 280.0108, 284.4274, 288.8376, 292.0917, ... 
time (raw)  : 1 to 12 

Note: This should give the same output as the summariseNC() function!!!

Comparing computation time

library(rbenchmark)
benchmark("summariseNC" = {summariseNC(tas_files[4], startdate=2001, enddate=2010, 
                                       group_col=c("year", "month"))},
          "summariseRast" = {summariseRast(tas_files[4], startdate=2001, enddate=2010, var="tas")},
          "summariseRaster" = {summariseRaster(tas_files[4], startdate=2001, enddate=2010, var="tas")}, 
          replications=1, columns = c("test", "elapsed", "replications", "relative", "user.self", "sys.self"))
             test elapsed replications relative user.self sys.self
1     summariseNC  29.054            1    6.749     0.551    0.593
2   summariseRast  59.204            1   13.752    59.175    0.012
3 summariseRaster   4.305            1    1.000     4.207    0.092

Comparing results

sumNC <- summariseNC(tas_files[4], startdate=2001, enddate=2010, group_col=c("year", "month"))
sumNC <- summary(sumNC[[1]])
sumRast <- summariseRast(tas_files[4], startdate=2001, enddate=2010, var="tas")
sumRast <- summary(sumRast[[1]])
sumRaster <- summariseRaster(tas_files[4], startdate=2001, enddate=2010, var="tas")
sumRaster <- summary(terra::rast(sumRaster[[1]]))

sum_df <- data.frame(sumNC=sumNC,sumRast=sumRast,sumRaster=sumRaster) %>%
  select(sumNC.Freq, sumRast.Freq, sumRaster.Freq)
colnames(sum_df) <- c("sumNC", "sumRast", "sumRaster")
sum_df %>% knitr::kable()
sumNC sumRast sumRaster
Min. :269.8 Min. :269.8 Min. :269.8
1st Qu.:272.7 1st Qu.:272.7 1st Qu.:272.7
Median :273.6 Median :273.5 Median :273.5
Mean :273.6 Mean :273.6 Mean :273.6
3rd Qu.:274.5 3rd Qu.:274.5 3rd Qu.:274.5
Max. :276.1 Max. :276.1 Max. :276.1
NA’s :81 NA’s :81 NA’s :81

CellStats NetCDF file

# Summarise daily NetCDF file for 16 years and show first 6 values
head(cellstatsNC(tas_files, startdate=2000, enddate=2016))
      mean       date
1 273.7878 2000-01-01
2 274.9809 2000-01-02
3 275.6813 2000-01-03
4 276.3606 2000-01-04
5 277.3515 2000-01-05
6 277.0308 2000-01-06
# Summarise daily NetCDF files without time limit
mean_daily_temp <- cellstatsNC(tas_files, stat="mean")
mean_daily_prec <- cellstatsNC(pr_files, stat="mean")

# Summarise yearly mean temperature of Germany 
mean_daily_temp$year <- lubridate::year(mean_daily_temp$date)
mean_daily_temp$mean <- mean_daily_temp$mean - 273.15
mean_annual_temp <- aggregate(mean ~ year, mean_daily_temp, mean)

# Summarise yearly total precipitation of Germany 
mean_daily_prec$year <- lubridate::year(mean_daily_prec$date)
mean_daily_prec$mean <- mean_daily_prec$mean #- 273.15
mean_annual_prec <- aggregate(mean ~ year, mean_daily_prec, sum)

library(ggplot2); library(patchwork)
p1 <- ggplot() + geom_line(data=mean_annual_temp, aes(x=year, y=mean)) +
  theme_bw() + labs(x="Year", y="Annual mean temperature (°C)") + ggtitle("Germany")
p2 <- ggplot() + geom_line(data=mean_annual_prec, aes(x=year, y=mean)) +
  theme_bw() + labs(x="Year", y="Annual total precipitation")
p1 / p2