BUG: Correlated RNG with nested future.apply calls #108

Open
HenrikBengtsson opened this issue Mar 8, 2023 · 3 comments

Comments

@HenrikBengtsson
Collaborator

Issue

library(future.apply)
y <- do.call(rbind, future_lapply(1:3, future.seed = TRUE, FUN = function(i) {
  do.call(rbind, future_lapply(1:3, future.seed = TRUE, FUN = function(j) {
    data.frame(i = i, j = j, random = runif(n = 1L))
  }))
}))
print(y)

gives

  i j    random
1 1 1 0.8146860
2 1 2 0.4950540
3 1 3 0.9308272
4 2 1 0.4950540
5 2 2 0.9308272
6 2 3 0.2019456
7 3 1 0.9308272
8 3 2 0.2019456
9 3 3 0.6057787

Note how some of the random numbers are duplicated, e.g.

> y$random
[1] 0.8146860 0.4950540 0.9308272 0.4950540 0.9308272 0.2019456 0.9308272
[8] 0.2019456 0.6057787
> unique(y$random)
[1] 0.8146860 0.4950540 0.9308272 0.2019456 0.6057787
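To see which (i, j) cells collide, the reported values can be grouped by value. The sketch below simply re-enters the nine numbers printed above; nothing is re-run through future.apply, and the variable name `dups` is only illustrative.

```r
# Re-enter the values reported in the issue (copied from the output above).
y <- data.frame(
  i = rep(1:3, each = 3),
  j = rep(1:3, times = 3),
  random = c(0.8146860, 0.4950540, 0.9308272,
             0.4950540, 0.9308272, 0.2019456,
             0.9308272, 0.2019456, 0.6057787)
)

# Group the (i,j) labels by their random value and keep only shared values.
dups <- split(sprintf("(%d,%d)", y$i, y$j), y$random)
dups <- dups[lengths(dups) > 1]
print(dups)

# For every distinct value, count how many different i + j sums it occurs at.
sums_per_value <- tapply(y$i + y$j, y$random, function(s) length(unique(s)))
print(sums_per_value)
```

Every shared value sits on a single anti-diagonal of the 3 x 3 grid, that is, at a constant i + j.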

Troubleshooting

This seems familiar; we may have been here before. I don't have time to investigate in full right now, but the problem is not that the parent's RNG state isn't forwarded:

> seed0 <- .Random.seed
> y <- future_lapply(1:3, FUN = function(i) runif(n = 1L), future.seed = TRUE)
> seed <- .Random.seed
> identical(seed, seed0)
[1] FALSE

but it could be that it's only forwarded a single step, whereas it needs to be forwarded length(X) steps.

@shikokuchuo

This is a symptom of the same .Random.seed being passed from the parent to the child.

To demonstrate, the following works:

library(future.apply)
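y <- do.call(rbind, future_lapply(1:3, future.seed = TRUE, FUN = function(i) {
  # Draw i numbers first, so the RNG state differs per outer iteration
  # before the inner future_lapply() derives its seeds.
  runif(i)
  do.call(rbind, future_lapply(1:3, future.seed = TRUE, FUN = function(j) {
    data.frame(i = i, j = j, random = runif(n = 1L))
  }))
}))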
print(y)

e.g.

  i j      random
1 1 1 0.488524507
2 1 2 0.301493818
3 1 3 0.711252253
4 2 1 0.785953781
5 2 2 0.342492709
6 2 3 0.001319211
7 3 1 0.037554119
8 3 2 0.792390977
9 3 3 0.179989634

Troubleshooting the original reprex:

y <- do.call(rbind, future_lapply(1:3, future.seed = TRUE, FUN = function(i) {
  do.call(rbind, future_lapply(1:3, future.seed = TRUE, FUN = function(j) {
    data.frame(i = i, j = j, random = nanonext::sha256(.GlobalEnv[[".Random.seed"]]))
  }))
}))
print(y)
  i j                                                           random
1 1 1 e6bdf29b80b60f447d7fa1cb6a8c53956b03984ee7236f25dc85098afd2110bb
2 1 2 2785037b99ba71a8365e898fd3958ecf3c776cf3fee62a7d09703735300cc479
3 1 3 4f18d9c62ae9478d8292d19f8085c4438651b89b519f23a279bf9b75560f8e2b
4 2 1 2785037b99ba71a8365e898fd3958ecf3c776cf3fee62a7d09703735300cc479
5 2 2 4f18d9c62ae9478d8292d19f8085c4438651b89b519f23a279bf9b75560f8e2b
6 2 3 3e60d6d86fe7e7e87eb3249dd5d239238d9ae107da1df7a5e00486794f2d4a83
7 3 1 4f18d9c62ae9478d8292d19f8085c4438651b89b519f23a279bf9b75560f8e2b
8 3 2 3e60d6d86fe7e7e87eb3249dd5d239238d9ae107da1df7a5e00486794f2d4a83
9 3 3 bf7a684500c64c073c8f721d0e46a5ff54e97ec19fa6c775e037bb1244f70f0a

Note the repetition. It is a sign that the RNG state is not being advanced when the child seeds are generated, so the same seeds are handed out again.

@shikokuchuo

Adding to the above:

As i advances, the sequence over j shifts by one each time,
i.e. (1,3), (2,2) and (3,1) are all 4f18d9c62ae9478d8292d19f8085c4438651b89b519f23a279bf9b75560f8e2b.

This seems to imply that each child has received the same L'Ecuyer-CMRG stream, advanced by one step each time (e.g. by calling a stats function such as runif(1L)), instead of independent streams generated recursively by parallel::nextRNGStream().
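The same anti-diagonal collision can be reproduced in base R whenever each nesting level derives its child seeds by stepping forward from whatever seed it was handed, because the step counts of the two levels then simply add up. The sketch below is an illustration under that assumption, not future.apply's actual code; `derive_seeds()` is a hypothetical helper, and the seed 42 is arbitrary.

```r
library(parallel)

RNGkind("L'Ecuyer-CMRG")
set.seed(42)
s0 <- globalenv()[[".Random.seed"]]

# Hypothetical helper (not part of future.apply): derive n child seeds by
# repeatedly stepping to the next L'Ecuyer-CMRG stream, starting from `seed`.
derive_seeds <- function(seed, n) {
  seeds <- vector("list", n)
  for (k in seq_len(n)) {
    seed <- nextRNGStream(seed)
    seeds[[k]] <- seed
  }
  seeds
}

# Outer level: child i receives the stream i steps away from s0.
outer_seeds <- derive_seeds(s0, 3)

# Inner level: each child repeats the same procedure from its own seed,
# so cell (i, j) ends up i + j stream steps away from s0.
grid <- expand.grid(j = 1:3, i = 1:3)
grid$seed <- mapply(function(i, j) {
  inner_seeds <- derive_seeds(outer_seeds[[i]], 3)
  paste(inner_seeds[[j]], collapse = " ")
}, grid$i, grid$j)

n_unique <- length(unique(grid$seed))
per_sum <- tapply(grid$seed, grid$i + grid$j, function(s) length(unique(s)))
print(n_unique)
print(per_sum)
```

Under this scheme the nine cells share only five distinct seeds, one per value of i + j, which matches the pattern of hashes reported above.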

shikokuchuo added a commit to shikokuchuo/future.apply that referenced this issue Sep 6, 2023
@shikokuchuo

I believe I've found the culprit: because the child processes are already using L'Ecuyer-CMRG as the RNGkind, new streams are not created when they spawn their own child processes, as your as_lecyer_cmrg_seed() function just returns the existing seed.
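For comparison, here is a base-R sketch of one seeding scheme that keeps two nesting levels apart: streams from parallel::nextRNGStream() for the outer level and substreams from parallel::nextRNGSubStream() for the inner level. It is only an illustration of the general idea and is not necessarily what the PR does; the variable names and the seed 42 are assumptions made for the example.

```r
library(parallel)

RNGkind("L'Ecuyer-CMRG")
set.seed(42)
s0 <- globalenv()[[".Random.seed"]]

nested_seeds <- character(0)
outer_seed <- s0
for (i in 1:3) {
  # Outer level: jump to the next stream for each i.
  outer_seed <- nextRNGStream(outer_seed)
  inner_seed <- outer_seed
  for (j in 1:3) {
    # Inner level: take successive substreams within stream i.
    inner_seed <- nextRNGSubStream(inner_seed)
    nested_seeds[sprintf("(%d,%d)", i, j)] <- paste(inner_seed, collapse = " ")
  }
}

# 0 means no seed is repeated across the 3 x 3 grid.
print(anyDuplicated(nested_seeds))
```

This particular stream/substream split only covers two levels of nesting; deeper nesting would need a different way of deriving fresh seeds at each level.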

I'll open a PR that fixes this issue. I've also commented out the relevant test as it is no longer applicable; all the other tests pass.

Whether this is the correct thing to do, though, I have to leave to you, as I am completely new to your code base.
