README.Rmd

---
output:
  github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r setup, include = FALSE}
library(epluspar)
library(knitr)

# the default output hook
hook_output = knitr::knit_hooks$get('output')
knitr::knit_hooks$set(output = function(x, options) {
  if (!is.null(n <- options$out.lines)) {
    x <- unlist(strsplit(x, '\n', fixed = TRUE))
    if (length(x) > n) {
      # truncate the output
      x <- c(head(x, n), '....', '')
    } else {
      x <- c(x, "")
    }
    x <- paste(x, collapse = '\n') # paste first n lines together
  }
  hook_output(x, options)
})

knitr::opts_knit$set(root.dir = tempdir())
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-",
  out.lines = 20
)

# Make sure the date is shown in English format not Chinese.
invisible(Sys.setlocale(category = "LC_TIME", locale = "en_US.UTF-8"))
```

# epluspar
Conduct sensitivity analysis and Bayesian calibration of EnergyPlus models.

[![Travis-CI Build Status](https://api.travis-ci.com/ideas-lab-nus/epluspar.svg?token=1LqeFok1d6q5niBF8Hqr&branch=master)](https://travis-ci.com/ideas-lab-nus/epluspar)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/epluspar)](https://cran.r-project.org/package=epluspar)

## Installation

Currently, epluspar is not on CRAN. You can install the development version from
GitHub.

```{r gh-installation, eval = FALSE}
install.packages("epluspar", repos = "https://hongyuanjia.r-universe.dev")
```

<!-- TOC GFM -->

* [Sensitivity Analysis](#sensitivity-analysis)
    * [Basic workflow](#basic-workflow)
    * [Examples](#examples)
* [Bayesian Calibration](#bayesian-calibration)
    * [Basic workflow](#basic-workflow-1)
    * [Examples](#examples-1)
        * [Get RDD and MDD](#get-rdd-and-mdd)
        * [Setting Input and Output Variables](#setting-input-and-output-variables)
        * [Adding Parameters to Calibrate](#adding-parameters-to-calibrate)
        * [Getting Sample Values and Parametric Models](#getting-sample-values-and-parametric-models)
        * [Run simulations and gather data](#run-simulations-and-gather-data)
        * [Specify Measured Data](#specify-measured-data)
        * [Specify Input Data for Stan](#specify-input-data-for-stan)
        * [Get Stan file](#get-stan-file)
        * [Run Bayesian Calibration Using Stan](#run-bayesian-calibration-using-stan)

<!-- /TOC -->

## Sensitivity Analysis

### Basic workflow

The basic workflow is basically:

1. Adding parameters for sensitivity analysis using `$param()` or
   `$apply_measure()`.
1. Check parameter sampled values and generated parametric models using
   `$samples()` and `$models()` respectively.
1. Run EnergyPlus simulations in parallel using `$run()`.
1. Gather EnergyPlus simulated data using `$report_data()` or `$tabular_data()`.
1. Evaluate parameter sensitivity using `$evaluate()`.

### Examples

Create a `SensitivityJob` object:

```{r}
# use an example file from EnergyPlus v8.8 for demonstration here
path_idf <- file.path(eplusr::eplus_config(8.8)$dir, "ExampleFiles", "5Zone_Transformer.idf")
path_epw <- file.path(eplusr::eplus_config(8.8)$dir, "WeatherData", "USA_CA_San.Francisco.Intl.AP.724940_TMY3.epw")

# create a `SensitivityJob` class which inheris from eplusr::ParametricJob class
sen <- sensi_job(path_idf, path_epw)
```

Set sensitivity parameters using `$param()` or `$apply_measure()`.

* Using `$param()`

```{r, eval = FALSE}
# set parameter using similar syntax to `Idf$set()` in eplusr
sen$param(
    # For adding a single object field as parameter
    # Syntax: Object = list(Field = c(Min, Max, Levels))
    `Supply Fan 1` = list(Fan_Total_Efficiency = c(0.1, 1.0, 5)),

    # For adding a class field as parameter
    Material := list(
        Thickness = c(min = 0.01, max = 0.08, levels = 5),
        Conductivity = c(min = 0.01, max = 0.6, levels = 6)
    ),

    # use `.names` to give names to each parameter
    .names = c("efficiency", "thickness", "conductivity"),

    # See `r` and `grid_jump` in `sensitivity::morris`
    .r = 8, .grid_jump = 1
)
```

* Using `$apply_measure()`

```{r}
# first define a "measure"
my_actions <- function (idf, efficiency, thickness, conducitivy) {
    idf$set(
        `Supply Fan 1` = list(Fan_Total_Efficiency = efficiency),
        Material := list(Thickness = thickness, Conductivity = conducitivy)
    )

    idf
}

# then apply that measure with parameter space definitions as function arguments
sen$apply_measure(my_actions,
    efficiency = c(0.1, 1.0, 5),
    thickness = c(0.01, 0.08, 5),
    conducitivy = c(0.1, 0.6, 6),
    .r = 8, .grid_jump = 1
)
```

Get samples

```{r}
sen$samples()
```

Run simulations and calculate statistic indicators

```{r}
# run simulations in temporary directory
sen$run(dir = tempdir(), echo = FALSE)

# extract output
# here is just am example
eng <- sen$tabular_data(table_name = "site and source energy",
    column_name = "energy per total building area",
    row_name = "total site energy")[, as.numeric(value)]

# calculate sensitivity
(result <- sen$evaluate(eng))

# extract data
attr(result, "data")
```

Plot

```{r get-started, fig.path = "man/figures/"}
# plot
plot(result)
```

## Bayesian Calibration

### Basic workflow

1. Setting input and output variables using `$input()` and `$output()`
   respectively.
   Input variables should be variables listed in RDD while output variables
   should be variables listed in RDD and MDD.
1. Adding parameters to calibrate using `$param()` or `$apply_measure()`.
1. Check parameter sampled values and generated parametric models using
   `$samples()` and `$models()` respectively.
1. Run EnergyPlus simulations in parallel using `$eplus_run()`.
1. Gather simulated data of input and output parameters using `$data_sim()`.
1. Specify field measured data of input and output parameters using
   `$data_field()`.
1.  Specify field measured data of input and output parameters using `$data_field()`.
1. Specify input data for Stan for Bayesian calibration using `$data_bc()`.
1. Run bayesian calibration and get predictions using stan using `$stan_run()`.

### Examples

Create a `BayesCalibJob` object:

```{r}
# use an example file from EnergyPlus v8.8 for demonstration here
path_idf <- file.path(eplusr::eplus_config(8.8)$dir, "ExampleFiles", "RefBldgLargeOfficeNew2004_Chicago.idf")
path_epw <- file.path(eplusr::eplus_config(8.8)$dir, "WeatherData", "USA_CA_San.Francisco.Intl.AP.724940_TMY3.epw")

# create a `SensitivityJob` class which inherits from eplusr::ParametricJob class
bc <- bayes_job(path_idf, path_epw)
```

#### Get RDD and MDD

`$read_rdd()` and `$read_mdd()` can be used to get RDD and MDD for current seed
model.

```{r}
(rdd <- bc$read_rdd())
(mdd <- bc$read_mdd())
```

#### Setting Input and Output Variables

Input variables and output variables can be set by using `$input()` and
`$output()`, respectively. For `$input()`, only variables listed in RDD are
supported. For `$output()`, variables listed in RDD and MDD are both supported.

By default, they are all empty and `$input()`, `$output()` will return `NULL`.

```{r}
bc$input()
bc$output()
bc$models()
```

You can specify input and output parameters using `RddFile`, `MddFile` and
data.frames.

```{r}
# using RDD and MDD
bc$input(rdd[1:3])
bc$output(mdd[1])

# using data.frame
bc$input(eplusr::rdd_to_load(rdd[1:3]))
bc$output(eplusr::mdd_to_load(mdd[1]))
```

You can set `append` to `NULL` to remove all existing input and output
parameters.

```{r}
bc$input(append = NULL)
bc$output(append = NULL)
```

You can also directly specify variable names:

```{r}
bc$input("CoolSys1 Chiller 1", paste("chiller evaporator", c("inlet temperature", "outlet temperature", "mass flow rate")), "hourly")
bc$output("CoolSys1 Chiller 1", "chiller electric power", "hourly")
```

Note that variable cannot be set as both an input and output variable.

```{r, error = TRUE}
bc$output("CoolSys1 Chiller 1", name = "chiller evaporator inlet temperature", reporting_frequency = "hourly")
```

Also, note that input and output variables should have the same reporting
frequency.

```{r, error = TRUE}
bc$output(mdd[1], reporting_frequency = "daily")
```

For `$output()`, both variables in RDD and MDD are supported. However, for
`$input()`, only variables in RDD are allowed.

#### Adding Parameters to Calibrate

Similarly like `SensitivityJob`, parameters can be added using either `$param()`
or `$apply_measure()`.

Here use `$param()` for demonstration. Basically there are 3 format of defining
a parameter:

* `object = list(field1 = c(min, max), field2 = c(min, max), ...)`

  This is the basic format. `field1` and `field2` in `object` will be added as
  two different parameters, with minimum and maximum value specified as `min`
  and `max`.

* `class := list(field1 = c(min, max), field2 = c(min, max), ...)`

  This is useful when you want to treat `field1` and `field2` in all objects in
  `class` as two different parameters. Please note the use of special notion of
  `:=` instead of `=`.

* `.(objects) := list(field1 = c(min, max), field2 = c(min, max), ...)`

  Sometimes you may not want to treat a field in all objects in a class but only
  a subset of objects. You can use a special notation on the left hand side
  `.()`. In the parentheses can be object names or IDs.

```{r}
bc$param(
    `CoolSys1 Chiller 1` = list(reference_cop = c(4, 6), reference_capacity = c(2.5e6, 3.0e6)),
    .names = c("cop1", "cap1"), .num_sim = 5
)
```

#### Getting Sample Values and Parametric Models

Parameter values can be retrieved using `$samples()`.

```{r}
bc$samples()
```

Generated `Idf`s can be retrieved using `$models()`.

```{r}
names(bc$models())
```

#### Run simulations and gather data

`$eplus_run()` runs all parametric models in parallel. Parameter `run_period`
can be given to insert a new `RunPeriod` object. In this case, all existing
`RunPeriod` objects in the seed model will be commented out.

```{r}
bc$eplus_run(dir = tempdir(), run_period = list("example", 7, 1, 7, 3), echo = FALSE)
```

`$data_sim()` returns a `data.table` (when `merge` is `TRUE`) or a list of 2
`data.table` (when `merge` is `FALSE`) which contains the simulated data of
input and output parameters. These data will be stored internally and used
during Bayesian calibration using Stan.

The `resolution` parameter can be used to specify the time resolution of
returned data. Note that input time resolution cannot be smaller than the
reporting frequency, otherwise an error will be issued.

```{r, error = TRUE}
bc$data_sim("1 min")
```

```{r}
bc$data_sim("6 hour")
```

#### Specify Measured Data

`$data_field()` takes a `data.frame` of measured value of output parameters and
returns a list of `data.table`s which contains the measured value of input and
output parameters, and newly measured value of input if applicable.

For input parameters, the values of simulation data for the first case are
directly used as the measured values.

For demonstration, we use the seed model to generate fake measured output data.

```{r}
# clone the seed model
seed <- bc$seed()$clone()
# remove existing RunPeriod objects
seed$RunPeriod <- NULL
# set run period as the same as in `$eplus_run()`
seed$add(RunPeriod = list("test", 7, 1, 7, 3))
seed$SimulationControl$set(
    `Run Simulation for Sizing Periods` = "No",
    `Run Simulation for Weather File Run Periods` = "Yes"
)
# save the model to tempdir
seed$save(tempfile(fileext = ".idf"))
# run
job <- seed$run(bc$weather(), echo = FALSE)
# get output data
fan_power <- epluspar:::report_dt_aggregate(job$report_data(name = bc$output()$variable_name, all = TRUE, day_type = "normalday"), "6 hour")
fan_power <- epluspar:::report_dt_to_wide(fan_power)
# add Gaussian noice
fan_power <- fan_power[, -"Date/Time"][
    , lapply(.SD, function (x) x + rnorm(length(x), sd = 0.05 * sd(x)))][
    , lapply(.SD, function (x) {x[x < 0] <- 0; x})
    ]

# set field data
bc$data_field(fan_power)
```

#### Specify Input Data for Stan

`$data_bc()` takes a list of field data and simulated data, and returns a
list that contains data input for Bayesian calibration using the Stan model

* `n`: Number of measured parameter observations.
* `m`: Number of simulated observations.
* `p`: Number of input parameters.
* `q`: Number of calibration parameters.
* `yf`: Data of measured output after z-score standardization using data of
  simulated output.
* `yc`: Data of simulated output after z-score standardization.
* `xf`: Data of measured input after min-max normalization.
* `xc`: Data of simulated input after min-max normalization.
* `tc`: Data of calibration parameters after min-max normalization.

```{r}
str(bc$data_bc())
```

You can also supply your own field data and simulated data and using
`$data_bc()` to construct the input for the Stan model. Input `data_field` and
`data_sim` should have the same structure as the output from `$data_field()` and
`$data_sim()`. If `data_field` and `data_sim` is not specified, the output from
`$data_field()` and `$data_sim()` will be used.

#### Get Stan file

You can save the internal Stan code using `$stan_file()`. If no path is
specified, a character vector that contains the stan code will be returned.

```{r, out.lines = 20}
bc$stan_file()
```

#### Run Bayesian Calibration Using Stan

You can run Bayesian calibration using Stan using `$stan_run()`.

If `data` argument is not specified, the output of `$data_bc()` is directly
used.

```{r, eval = FALSE}
options(mc.cores = parallel::detectCores())
bc$stan_run(iter = 300, chains = 3)
```

Instead of using builtin Stan model, you can also provide your own Stan code
using `file` argument.

```{r, eval = FALSE}
bc$stan_run(file = bc$stan_file(tempfile(fileext = ".stan")), iter = 300, chains = 3)
```

You can also use custom data set

```{r}
res <- bc$stan_run(data = bc$data_bc(), iter = 300, chains = 3)
```

The stan model is store in `fit`

```{r stan, fig.path = "man/figures/"}
rstan::stan_trace(res$fit)
rstan::stan_hist(res$fit)
```

The predicted values is stored in `y_pred`.

```{r}
str(res$y_pred)
```