Kyle Baron 2019-08-29 22:18:41
This vignette looks at options for parallelizing simulations with
mrgsolve in a platform-independent way. We utilize the future.apply
package (available on CRAN) to do this.
Your mileage may vary in terms of speedup factor; it is highly dependent on the problem you have. Also, with any method there is some overhead that needs to be taken into consideration when planning the simulations. It is very possible that your parallelized setup takes longer than the non-parallel setup.
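To get a feel for that overhead before committing to a design, it can help to benchmark a trivial workload both ways. This sketch uses only future.apply and a toy function (no mrgsolve); the worker count and task size are arbitrary choices for illustration, and on a workload this cheap the parallel version will typically lose.

```r
library(future.apply)

plan("multisession", workers = 2)

# A deliberately cheap task: per-task communication and setup
# costs will dominate any compute savings here
fast <- function(i) sqrt(i)

t_seq <- system.time(lapply(1:1e4, fast))["elapsed"]
t_par <- system.time(future_lapply(1:1e4, fast))["elapsed"]

plan("sequential")

# The gap between these two numbers is the overhead you pay
c(sequential = t_seq, parallel = t_par)
```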
library(dplyr)
library(mrgsolve)
mod <- mread("pk1", modlib())
library(future.apply)
Sys.setenv(R_FUTURE_FORK_ENABLE=TRUE)
future_lapply() works pretty much like lapply()
plan("multiprocess")
Note: with plan(multiprocess), you have to load the model shared object into each worker process. See ?loadso.
e <- ev(amt = 100)
end <- 24
out <- future_lapply(1:10, function(i) {
  loadso(mod) ## NOTE
  mod %>%
    ev(e) %>%
    mrgsim(end = end) %>%
    mutate(i = i)
}) %>% bind_rows()
head(out)
. # A tibble: 6 x 6
. ID time EV CENT CP i
. <dbl> <dbl> <dbl> <dbl> <dbl> <int>
. 1 1 0 0 0 0 1
. 2 1 0 100 0 0 1
. 3 1 1 36.8 61.4 3.07 1
. 4 1 2 13.5 81.0 4.05 1
. 5 1 3 4.98 85.4 4.27 1
. 6 1 4 1.83 84.3 4.21 1
On macOS or Unix systems, you can use:
plan("multicore", workers=8)
out <- future_lapply(1:10, function(i) {
  mod %>%
    ev(amt = 100) %>%
    mrgsim() %>%
    mutate(i = i)
}) %>% bind_rows()
head(out)
. # A tibble: 6 x 6
. ID time EV CENT CP i
. <dbl> <dbl> <dbl> <dbl> <dbl> <int>
. 1 1 0 0 0 0 1
. 2 1 0 100 0 0 1
. 3 1 1 36.8 61.4 3.07 1
. 4 1 2 13.5 81.0 4.05 1
. 5 1 3 4.98 85.4 4.27 1
. 6 1 4 1.83 84.3 4.21 1
plan("multicore", workers=8)
system.time({
  out <- future_lapply(1:2000, function(i) {
    mod %>%
      ev(amt = 100, ii = 24, addl = 27) %>%
      mrgsim(end = 28*24, nid = 20) %>%
      mutate(i = i)
  }) %>% bind_rows()
})
. user system elapsed
. 38.839 6.245 9.901
system.time({
  out <- lapply(1:2000, function(i) {
    mod %>%
      ev(amt = 100, ii = 24, addl = 27) %>%
      mrgsim(end = 28*24, nid = 20) %>%
      mutate(i = i)
  }) %>% bind_rows()
})
. user system elapsed
. 20.978 1.223 22.471
options(mc.cores=8)
system.time({
  out <- parallel::mclapply(1:2000, function(i) {
    mod %>%
      ev(amt = 100, ii = 24, addl = 27) %>%
      mrgsim(end = 28*24, nid = 20) %>%
      mutate(i = i)
  }) %>% bind_rows()
})
. user system elapsed
. 33.321 7.221 9.473
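Note that both plan("multicore") and parallel::mclapply() rely on forking, which is not available on Windows (there, mclapply() with mc.cores > 1 effectively runs sequentially). A cross-platform alternative is a multisession plan, which launches background R sessions; as with multiprocess above, each session needs the model shared object loaded. A minimal sketch (the worker count is arbitrary):

```r
plan("multisession", workers = 8)

out <- future_lapply(1:10, function(i) {
  loadso(mod)  # each background session needs the model shared object
  mod %>%
    ev(amt = 100) %>%
    mrgsim() %>%
    mutate(i = i)
}) %>% bind_rows()

plan("sequential")  # reset the plan when done
```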
In this example, let’s simulate 2,000 subjects at each of 8 doses. We’ll split the data set on the dose, simulate each dose separately, and then bind the results back together into a single data set. This is probably the quickest way to get it done, but we have to work harder to see a speedup from parallelizing.
data <- expand.ev(
  ID = seq(2000),
  amt = c(1, 3, 10, 30, 100, 300, 1000, 3000),
  ii = 24, addl = 27
)
count(data, amt)
. # A tibble: 8 x 2
. amt n
. <dbl> <int>
. 1 1 2000
. 2 3 2000
. 3 10 2000
. 4 30 2000
. 5 100 2000
. 6 300 2000
. 7 1000 2000
. 8 3000 2000
data_split <- split(data, data$amt)
system.time({
  out <- future_lapply(data_split, function(chunk) {
    mod %>% mrgsim_d(chunk, end = 24*27) %>% as_tibble()
  }) %>% bind_rows()
})
. user system elapsed
. 6.260 2.312 2.241
dim(out)
. [1] 10400000 5
system.time({
  out <- lapply(data_split, function(chunk) {
    mod %>% mrgsim_d(chunk, end = 24*27) %>% as_tibble()
  }) %>% bind_rows()
})
. user system elapsed
. 3.542 0.506 4.068
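With only 8 chunks for 8 workers, the overall time is set by the slowest chunk. One way to work for more speedup is to split into more, smaller chunks so the scheduler can balance the load; this sketch splits on a synthetic chunking variable rather than the dose (the chunk count of 32 is an arbitrary choice, not a recommendation).

```r
# Split into more chunks than workers for better load balancing;
# each row of `data` is one subject, so any row grouping is valid
n_chunks <- 32
chunk_size <- ceiling(nrow(data) / n_chunks)
data$chunk <- ceiling(seq_len(nrow(data)) / chunk_size)
data_split <- split(data, data$chunk)

system.time({
  out <- future_lapply(data_split, function(chunk) {
    mod %>% mrgsim_d(chunk, end = 24*27) %>% as_tibble()
  }) %>% bind_rows()
})
```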