Skip to content

Commit

Permalink
Benchmark computation time with parallelization.
Browse files Browse the repository at this point in the history
  • Loading branch information
ManonSimonot committed Jan 6, 2025
1 parent 554d84c commit efab186
Show file tree
Hide file tree
Showing 5 changed files with 151 additions and 3 deletions.
91 changes: 91 additions & 0 deletions R/parallelization-vignette.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,91 @@
## code to prepare `parallelization-vignette` dataset goes here

# Setup -------------------------------------------------------------------

library(flipr)
library(future)
library(parallel)
library(progressr)
library(tictoc)

ngrid_in <- 50L
nperms <- 5000
n1 <- 10
set.seed(1234)
x <- rnorm(n1, mean = 1, sd = 1)
y <- rnorm(n1, mean = 4, sd = 1)

null_spec <- function(y, parameters) {
purrr::map(y, ~ .x - parameters[1])
}

stat_functions <- list(stat_t)

stat_assignments <- list(delta = 1)

# Inference on the mean without parallelization --------------------------------

pf <- PlausibilityFunction$new(
null_spec = null_spec,
stat_functions = stat_functions,
stat_assignments = stat_assignments,
x, y
)

pf$set_point_estimate(mean(y) - mean(x))

pf$set_nperms(nperms)
pf$set_parameter_bounds(
point_estimate = pf$point_estimate,
conf_level = pf$max_conf_level
)

pf$set_grid(
parameters = pf$parameters,
npoints = ngrid_in
)

tic()
pf$evaluate_grid(grid = pf$grid)
time_without_parallelization <- toc()$callback_msg

# Inference on the mean with parallelization -----------------------------------

ncores <- 4
plan(multisession, workers = ncores)
cl <- makeCluster(ncores)
setDefaultCluster(cl)
progressr::handlers(global = TRUE)

pf <- PlausibilityFunction$new(
null_spec = null_spec,
stat_functions = stat_functions,
stat_assignments = stat_assignments,
x, y
)

pf$set_point_estimate(mean(y) - mean(x))

pf$set_nperms(nperms)
pf$set_parameter_bounds(
point_estimate = pf$point_estimate,
conf_level = pf$max_conf_level
)

pf$set_grid(
parameters = pf$parameters,
npoints = ngrid_in
)

tic()
pf$evaluate_grid(grid = pf$grid)
time_with_parallelization <- toc()$callback_msg

df_parallelization <- list(
delta = pf$grid$delta,
time_par = time_with_parallelization,
time_without_par = time_without_parallelization
)

saveRDS(df_parallelization, "data-raw/df_parallelization.rds")

Binary file modified R/sysdata.rda
Binary file not shown.
2 changes: 2 additions & 0 deletions data-raw/build-sysdata.R
Original file line number Diff line number Diff line change
Expand Up @@ -6,11 +6,13 @@ df_mean <- readRDS("data-raw/df_mean.rds")
df_sd <- readRDS("data-raw/df_sd.rds")
df_fisher <- readRDS("data-raw/df_fisher.rds")
df_tippett <- readRDS("data-raw/df_tippett.rds")
df_parallelization <- readRDS("data-raw/df_parallelization.rds")
usethis::use_data(
alpha_estimates,
pfa, pfb, pfc,
df_mean, df_sd,
df_fisher, df_tippett,
df_parallelization,
overwrite = TRUE,
internal = TRUE
)
Binary file added data-raw/df_parallelization.rds
Binary file not shown.
61 changes: 58 additions & 3 deletions vignettes/parallelization.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -12,18 +12,64 @@ knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
load("../R/sysdata.rda")
time_without_parallelization <- df_parallelization$time_without_par
time_with_parallelization <- df_parallelization$time_par
```

The [**flipr**](https://lmjl-alea.github.io/flipr/) package uses functions contained in the [**furrr**](https://future.futureverse.org/index.html) package for parallel processing. The setting of parallelization has to be done on the user side. We illustrate here how to achieve asynchronous evaluation. We use the [**future**](https://future.futureverse.org/index.html) package to set the plan, the **parallel** package to define a default cluster, and the [**progressr**](https://progressr.futureverse.org/index.html) package to report progress updates.

By setting the desired number of cores, we define the number of background R sessions that will be used to evaluate expressions in parallel. This number is used to set the multisession plan with the function `future::plan()` and to define a default cluster with `parallel::setDefaultCluster()`. Then, to enable the visualization of evaluation progress, we can put the code in the `progressr::with_progress()` function, or more simply set it for all the following code with the `progressr::handlers()` function. After these settings, [**flipr**](https://lmjl-alea.github.io/flipr/) functions can be used, as shown in this example.

```{r setup}
library(flipr)
```

## Computation without parallel processing

To show the benefit of parallel processing, we compare here the processing times necessary to evaluate a grid with a plausibility function. First, here is the computation without parallelization.

```{r, eval=FALSE}
set.seed(1234)
x <- rnorm(10, 1, 1)
y <- rnorm(10, 4, 1)
null_spec <- function(y, parameters) {
purrr::map(y, ~ .x - parameters[1])
}
stat_functions <- list(stat_t)
stat_assignments <- list(delta = 1)
pf <- PlausibilityFunction$new(
null_spec = null_spec,
stat_functions = stat_functions,
stat_assignments = stat_assignments,
x, y
)
pf$set_point_estimate(mean(y) - mean(x), overwrite = TRUE)
pf$set_parameter_bounds(
point_estimate = pf$point_estimate,
conf_level = pf$max_conf_level
)
pf$set_grid(
parameters = pf$parameters,
npoints = 50L
)
tictoc::tic()
pf$evaluate_grid(grid = pf$grid)
time_without_parallelization <- tictoc::toc()
```

```{r}
time_without_parallelization
```

## Computation with parallel processing

By setting the desired number of cores, we define the number of background R sessions that will be used to evaluate expressions in parallel. This number is used to set the multisession plan with the function `future::plan()` and to define a default cluster with `parallel::setDefaultCluster()`. Then, to enable the visualization of evaluation progress, we can put the code in the `progressr::with_progress()` function, or more simply set it for all the following code with the `progressr::handlers()` function. After these settings, [**flipr**](https://lmjl-alea.github.io/flipr/) functions can be used, as shown in this example.

```{r, eval=FALSE}
ncores <- 3
ncores <- 4
future::plan(multisession, workers = ncores)
cl <- parallel::makeCluster(ncores)
parallel::setDefaultCluster(cl)
Expand Down Expand Up @@ -55,13 +101,22 @@ pf$set_grid(
parameters = pf$parameters,
npoints = 50L
)
tictoc::tic()
pf$evaluate_grid(grid = pf$grid)
time_with_parallelization <- tictoc::toc()
parallel::stopCluster(cl)
```

It is good practice to shut down the workers with the `parallel::stopCluster()` function at the end of the code.

```{r}
time_with_parallelization
```

This experiment proves that we can save a lot of computation time when using parallel processing, as we gained approximately 33 seconds in this example to evaluate the plausibility function.

Finally, to return to a sequential plan with no progress updates, the following code can be used.

```{r, eval=FALSE}
Expand Down

0 comments on commit efab186

Please sign in to comment.