There's a bug in data.table v1.13.0 that causes unnesting to be slow if you use the [[1]] method. The issue is still open here, along with the current workaround (which I used below).
Want me to open a PR using the workaround?
library(data.table)

n <- 500L
n_nested <- 40L

test_df <- data.table(
  id = seq_len(n),
  value = replicate(n, data.table(val1 = sample(n_nested)), simplify = FALSE))

updated_unnest <- function(dt_, col){
  if (isFALSE(is.data.table(dt_)))
    dt_ <- as.data.table(dt_)

  # col to unnest
  col <- substitute(col)
  if (length(col) > 1)
    stop("dt_unnest() currently can only unnest a single column at a time", call. = FALSE)

  # Get the others variables in there
  names <- colnames(dt_)
  if (!paste(col) %in% names)
    stop("Could not find `cols` in data.table", call. = FALSE)
  others <- names[-match(paste(col), names)]
  others_class = sapply(others, function(x) class(dt_[[x]])[1L])
  others = others[!others_class %in% c("list", "data.table", "data.frame", "tbl_df")]

  # Join them all together
  dt_[seq_len(.N), eval(col)[[1L]], by = others][dt_, on = others]
}

bench::mark(current = tidyfast::dt_unnest(test_df, value),
            updated = updated_unnest(test_df, value),
            check = FALSE, iterations = 30)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 current     137.7ms  157.9ms      5.87   59.64MB     7.82
#> 2 updated      3.15ms   3.59ms    273.      1.54MB     0
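For anyone skimming, here is a minimal sketch of the two idioms side by side. It assumes a tiny toy table (`toy`, `plain`, and `workaround` are made-up names for illustration, not part of tidyfast), and it only checks that both forms return the same rows. It does not try to reproduce the timing difference.

```r
library(data.table)

# Toy nested table (hypothetical data, not the benchmark data above)
toy <- data.table(
  id = 1:3,
  value = list(
    data.table(val1 = 1:2),
    data.table(val1 = 3:4),
    data.table(val1 = 5:6)
  )
)

# Plain [[1]] idiom: the form reported as slow in data.table v1.13.0
plain <- toy[, value[[1L]], by = id]

# Workaround idiom: same call, but with an explicit row index in `i`
workaround <- toy[seq_len(.N), value[[1L]], by = id]

# Both should hold the same unnested rows (id repeated, val1 expanded)
all.equal(plain, workaround)
```

The only difference between the two calls is `seq_len(.N)` in `i`, which is the same change made in the last line of `updated_unnest()`.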
Thank you! I thought I had seen something about that. It wasn't the speed up I was hoping for so I'm glad there is a temporary fix on that. As always, you are a fantastic collaborator.
Hey @markfairbanks I just added this one since I was already doing some code cleaning so don't worry about a PR here. But if you have any thoughts on #25 let me know!