1.13.0 slow down in a repeated loop on list column #4658
Comments
Probably a duplicate of #4646; see also #4655. Notes to @sandoronodi.
@sandoronodi thanks for the report; I can reproduce it. It is closely related to #4646. I tried the #4655 workaround and it seems to restore performance. Could you check whether it also solves your actual use case?

```r
remotes::install_github("https://github.com/Rdatatable/data.table", ref = "extract_performance")
```

Also, though I am not sure it is necessarily more readable, here is a faster way to get the same result. You could probably tweak it further if this is a bottleneck, but it should let you move forward with 1.13.0:

```r
library(data.table)

dt <- data.table(id = 1:20000,
                 list_col = sample(c('', '', 'a', 'a:b', 'a:b:c'), 20000, TRUE))
feature <- 'list_col'

dt[, {
  x = get(feature)              # the column named by `feature`
  l = strsplit(x, ":")          # "a:b" -> c("a", "b"); "" -> character(0)
  lens = lengths(l)
  # Rows without matches still get `list_col_` (per the OP), so keep one row for each of them.
  lens[lens == 0L] = 1L
  partial_text = paste0(feature, "_")
  list(id = rep(id, lens),
       feature_names = unlist(Map(function(y) if (length(y)) paste0(partial_text, y) else partial_text, l),
                              use.names = FALSE))
}]

## A tibble: 2 x 13
##   expression               min  median `itr/sec` mem_alloc
##   <bch:expr>             <bch> <bch:t>     <dbl> <bch:byt>
## 1 potential_solution      50ms  56.5ms     18.0     1.33MB
## 2 OP_extract_perf_branch 156ms 158.3ms      6.24       2MB
```
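To make the expansion step concrete, here is a minimal base-R sketch of the same logic on three toy rows. The helper name `expand_feature` is hypothetical (not part of data.table); it uses `lapply` where the snippet above uses `Map`, which is equivalent for a single list argument, and `fixed = TRUE` because `":"` needs no regex.

```r
# Hypothetical helper mirroring the j-expression above, using base R only.
expand_feature <- function(id, x, feature = "list_col") {
  pieces <- strsplit(x, ":", fixed = TRUE)   # "a:b" -> c("a", "b"); "" -> character(0)
  lens <- lengths(pieces)
  lens[lens == 0L] <- 1L                      # keep one output row for empty strings
  prefix <- paste0(feature, "_")
  names_out <- unlist(lapply(pieces, function(p)
    if (length(p)) paste0(prefix, p) else prefix), use.names = FALSE)
  data.frame(id = rep(id, lens), feature_names = names_out,
             stringsAsFactors = FALSE)
}

res <- expand_feature(1:3, c("", "a", "a:b"))
# id: 1 2 3 3
# feature_names: "list_col_" "list_col_a" "list_col_a" "list_col_b"
```

Each input row is repeated once per `:`-separated piece, and empty strings still contribute a single `list_col_` row.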
I have noticed a huge performance drop in data.table loop operations, possibly caused by upgrading to the new version.
data.table 1.12.8, default settings, using 6 threads:
data.table 1.13.0, default settings, using 6 threads:
I have also tried several combinations of thread count and throttle settings, but saw no improvement at all.
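One way to run such a thread/throttle comparison systematically is sketched below. It assumes data.table 1.13.0 or later, where `setDTthreads()` accepts a `throttle` argument. The helper `time_thread_settings`, the loop workload, and resetting the throttle to 1024 (assumed to be the default) are illustrative choices, not the OP's original code.

```r
library(data.table)

# Hypothetical helper: time `workload()` under every threads x throttle combination,
# then restore the previous thread count. The throttle is reset to 1024, which is
# assumed here to be data.table's default.
time_thread_settings <- function(workload, threads = c(1L, 2L), throttle = c(1024L, 4096L)) {
  old_threads <- getDTthreads()
  on.exit(setDTthreads(old_threads, throttle = 1024L), add = TRUE)
  grid <- expand.grid(threads = threads, throttle = throttle)
  grid$elapsed <- vapply(seq_len(nrow(grid)), function(i) {
    setDTthreads(threads = grid$threads[i], throttle = grid$throttle[i])
    system.time(workload())[["elapsed"]]   # wall-clock seconds for this setting
  }, numeric(1))
  grid
}

# Illustrative workload (hypothetical): repeatedly extract a column inside a loop.
dt <- data.table(id = 1:20000,
                 list_col = sample(c("", "", "a", "a:b", "a:b:c"), 20000, TRUE))
res <- time_thread_settings(function() for (i in 1:100) dt[, list_col])
res
```

If the elapsed times barely change across rows, the thread settings are probably not what limits this kind of loop.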