You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Adding or changing a column using [[<- seems to bring a data.table into a broken state. set() fails afterwards, and printing the data.table changes its truelength().
The following is with GitHub HEAD. Some versions seem to segfault here, so don't run this if you want to keep your R session:
library("data.table")
#> data.table 1.16.99 IN DEVELOPMENT built 2025-01-18 17:55:52 UTC; user using 4 threads (see ?getDTthreads). Latest news: r-datatable.comdt<- data.table(a=1, b=2)
dt[[2]] <-2
set(dt, , "c", 1)
#> Error in set(dt, , "c", 1) : #> It appears that at some earlier point, attributes of this data.table have been reassigned. Please use setattr(DT, name, value) rather than attr(DT, name) <- value. If that doesn't apply to you, please report your case to the data.table issue tracker.
truelength(dt)
#> [1] 1026dt#> a b#> <num> <num>#> 1: 1 2
truelength(dt) # this changed after printing for some reason#> [1] 2> set(dt, , "c", 1)
#> Error in set(dt, , "c", 1) : #> Internal error in assign: input dt has not been allocated enough column slots. l=2, tl=2, adding 1. Please report to the data.table issues tracker.
Using using a column name instead of an index in [[ also fails:
dt<- data.table(a=1, b=2)
dt[["b"]] <-2
set(dt, , "c", 1)
#> Error in set(dt, , "c", 1) : #> It appears that at some earlier point, attributes of this data.table have been reassigned. Please use setattr(DT, name, value) rather than attr(DT, name) <- value. If that doesn't apply to you, please report your case to the data.table issue tracker.
Adding a new column, instead of changing an existing one, gives a different error message:
dt<- data.table(a=1, b=2)
dt[[3]] <-2
set(dt, , "c", 1)
#> Error in set(dt, , "c", 1) : #> This data.table has either been loaded from disk (e.g. using readRDS()/load()) or constructed manually (e.g. using structure()). Please run setDT() or setalloccol() on it first (to pre-allocate space for new columns) before assigning by reference to it.dt#> a b V3#> <num> <num> <num>#> 1: 1 2 2
set(dt, , "c", 1)
#> Error in set(dt, , "c", 1) : #> Internal error in assign: input dt has not been allocated enough column slots. l=3, tl=3, adding 1. Please report to the data.table issues tracker.
When using CRAN data.table, instead of the current GitHub version, the first two examples cause segfaults:
library("data.table")
#> data.table 1.16.4 using 4 threads (see ?getDTthreads). Latest news: r-datatable.comdt<- data.table(a=1, b=2)
dt[[2]] <-2
set(dt, , "c", 1)
#> #> *** caught segfault ***#> address 0x140000014, cause 'memory not mapped'#> #> Traceback:#> 1: set(dt, , "c", 1)#> #> Possible actions:#> 1: abort (with core dump, if enabled)#> 2: normal R exit#> 3: exit R without saving workspace#> 4: exit R saving workspace
Maybe the operation corrupts memory in some way and that the new GitHub version just happens to not segfault because memory is laid out differently for some unrelated reason? Maybe there are two underlying bugs, one of which is already fixed on GitHub.
Using $ does not seem to cause this issues:
dt<- data.table(a=1, b=2)
dt$b<-2
set(dt, , "c", 1)
dt#> a b c#> <num> <num> <num>#> 1: 1 2 1
sessionInfo():
R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 20.04.6 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Europe/Berlin
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.4.0
(or GitHub data.table version in the first example)
The text was updated successfully, but these errors were encountered:
We've fixed #6410 in September, but I'm not seeing the fix on the CRAN branch.
truelength() changing after printing makes sense: data.table:::print.data.table probably causes selfrefok() to be called on the table at some point, which notices the shallow duplicate and adjusts the field.
$ working also makes sense because we've got a method for that, data.table:::`$<-.data.table`.
What the error messages aren't telling you is how to get out of the situation. Since the current dt is broken, you need setDT(dt) to create it anew. dt <- as.data.table(dt) also seems to work. It may be worth changing the messages in this direction, cf. #6494.
Edit: is there a way to mark #6616 so that fixing it will fix this issue automatically? Since the next version, 1.17.0, will be released from the master branch, it will include the fix for #6410.
is there a way to mark #6616 so that fixing it will fix this issue automatically?
There might be a way to do that under the new revamp of Issues GH is working on... that said, our practice is to just close an issue once it's closed in dev.
Leaving this open as it looks like maybe it's still useful as a documentation issue.
Adding or changing a column using
[[<-
seems to bring adata.table
into a broken state.set()
fails afterwards, and printing thedata.table
changes itstruelength()
.The following is with GitHub HEAD. Some versions seem to segfault here, so don't run this if you want to keep your R session:
Using using a column name instead of an index in
[[
also fails:Adding a new column, instead of changing an existing one, gives a different error message:
When using CRAN
data.table
, instead of the current GitHub version, the first two examples cause segfaults:Maybe the operation corrupts memory in some way and that the new GitHub version just happens to not segfault because memory is laid out differently for some unrelated reason? Maybe there are two underlying bugs, one of which is already fixed on GitHub.
Using
$
does not seem to cause this issues:sessionInfo()
:(or GitHub
data.table
version in the first example)The text was updated successfully, but these errors were encountered: