Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible bug in loading packages #42

Open
arunsrinivasan opened this issue Apr 18, 2019 · 2 comments
Open

Possible bug in loading packages #42

arunsrinivasan opened this issue Apr 18, 2019 · 2 comments

Comments

@arunsrinivasan
Copy link

arunsrinivasan commented Apr 18, 2019

This was a bit hard to track down... Here goes.

data.table and bit both have setattr functions. But the annoying thing about bit:setattr is that it returns NULL. data.table::setattr returns the input object with the attribute modified invisibly. So, if you were to write a code like:

y <- setattr(18004L, "class", "Date") # today's date
# [1] "2019-04-18"

(I'm not saying this is how one should go about it, but there are other cases where we need to set an attribute and assign the result to an object.)

In this case, depending on what setattr we've, it'll return the right expected result or NULL.

With this, consider this code:

require(parallel)
require(doSNOW)
require(foreach)
require(future)
require(future.apply)
require(data.table)
foo <- function(dt) {
  cat(sprintf("[%s] %s", Sys.getpid(), capture.output(environment(setattr))), sep="\n")
  setattr(dt, "key", "val")
}
nodes <- 5L
cl <- future::makeClusterPSOCK(nodes)
plan(cluster, workers=cl, persistent=TRUE)
dt <- data.table(x=1, y=2)

ans <- values(
  lapply(seq_len(nodes), function(node) {
    future({foo(dt)}, packages=c("bit64", "data.table"))
  })
)
# [15468] <environment: namespace:data.table>
# [9808] <environment: namespace:data.table>
# [15016] <environment: namespace:data.table>
# [23496] <environment: namespace:data.table>
# [22312] <environment: namespace:data.table>

I've created a data.table in the local environment (just 1 for simplicity) and am calling a function that sets the attribute in parallel. Of course this function is terribly simplified as well.

Now, the way this works (as expected), is to first load bit64 first and data.table next and then look for functions in foo and possibly load more packages and then run foo() (AFAICT).

And this works fine as you can see from the output. If you were to check ans, you'd get the data.table with their attributes set.


Restart session (IMPORTANT). Now, with everything else remaining intact, if instead of running values(...), I use future_lapply:

ans <- future_lapply(cl, function(node) foo(dt), future.packages=c("bit64", "data.table"))
> ans <- future_lapply(cl, function(node) foo(dt), future.packages=c("bit64", "data.table"))
# [11976] <environment: namespace:bit>
# [23212] <environment: namespace:bit>
# [11732] <environment: namespace:bit>
# [22964] <environment: namespace:bit>
# [5748] <environment: namespace:bit>
> ans
# [[1]]
# NULL
# 
# [[2]]
# NULL
# 
# [[3]]
# NULL
# 
# [[4]]
# NULL
# 
# [[5]]
# NULL

Note how setattr refers to bit package.. I think this is because the packages get loaded after assessing setattr is used in foo and data.table from local env has a function called setattr and therefore gets loaded first followed by all packages in future.packages?

@HenrikBengtsson
Copy link
Collaborator

Thxs and sorry for the slow reply.

Yes, I can see how this can happen for PSOCK clusters, e.g. plan(cluster, ...) and plan(multisession). The problem is, as you suggest, that the search() path is different in the worker compared to the main R session (e.g. with plan(sequential)). What complicates it, is that something the search() path gets updated automatically by some other parallel/future call, i.e. it's not always obvious how, why, and when.

To really fix this for this parallel backend, one would need to come up with a way for each future to re-arrange the search() path on the worker(s) to be the same as that of the main R process. That's an issue to be solved in the future package.

What I think you're also reporting here is an inconsistency between future.apply and rolling your own version via plain futures. If so, I'm not sure why there is a difference, but it could be that packages to be attached are also automatically identified and has higher precedence than those you specify manually. Maybe this inconsistency can/should be fixed.

@HenrikBengtsson
Copy link
Collaborator

PS. You wanna use library() - not require(). And I don't think your example need to attach none of parallel, doSNOW, and foreach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants