Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken behaviour after [[<-; segfaults on CRAN version #6734

Open
mb706 opened this issue Jan 18, 2025 · 2 comments
Open

Broken behaviour after [[<-; segfaults on CRAN version #6734

mb706 opened this issue Jan 18, 2025 · 2 comments

Comments

@mb706
Copy link

mb706 commented Jan 18, 2025

Adding or changing a column using [[<- seems to bring a data.table into a broken state. set() fails afterwards, and printing the data.table changes its truelength().

The following is with GitHub HEAD. Some versions seem to segfault here, so don't run this if you want to keep your R session:

library("data.table")
#> data.table 1.16.99 IN DEVELOPMENT built 2025-01-18 17:55:52 UTC; user using 4 threads (see ?getDTthreads).  Latest news: r-datatable.com
dt <- data.table(a = 1, b = 2)
dt[[2]] <- 2
set(dt, , "c", 1)
#> Error in set(dt, , "c", 1) : 
#>   It appears that at some earlier point, attributes of this data.table have been reassigned. Please use setattr(DT, name, value) rather than attr(DT, name) <- value. If that doesn't apply to you, please report your case to the data.table issue tracker.
truelength(dt)
#> [1] 1026
dt
#>        a     b
#>    <num> <num>
#> 1:     1     2
truelength(dt)  # this changed after printing for some reason
#> [1] 2
> set(dt, , "c", 1)
#> Error in set(dt, , "c", 1) : 
#>   Internal error in assign: input dt has not been allocated enough column slots. l=2, tl=2, adding 1. Please report to the data.table issues tracker.

Using using a column name instead of an index in [[ also fails:

dt <- data.table(a = 1, b = 2)
dt[["b"]] <- 2
set(dt, , "c", 1)
#> Error in set(dt, , "c", 1) : 
#>   It appears that at some earlier point, attributes of this data.table have been reassigned. Please use setattr(DT, name, value) rather than attr(DT, name) <- value. If that doesn't apply to you, please report your case to the data.table issue tracker.

Adding a new column, instead of changing an existing one, gives a different error message:

dt <- data.table(a = 1, b = 2)
dt[[3]] <- 2
set(dt, , "c", 1)
#> Error in set(dt, , "c", 1) : 
#>   This data.table has either been loaded from disk (e.g. using readRDS()/load()) or constructed manually (e.g. using structure()). Please run setDT() or setalloccol() on it first (to pre-allocate space for new columns) before assigning by reference to it.
dt
#>        a     b    V3
#>    <num> <num> <num>
#> 1:     1     2     2
set(dt, , "c", 1)
#> Error in set(dt, , "c", 1) : 
#>   Internal error in assign: input dt has not been allocated enough column slots. l=3, tl=3, adding 1. Please report to the data.table issues tracker.

When using CRAN data.table, instead of the current GitHub version, the first two examples cause segfaults:

library("data.table")
#> data.table 1.16.4 using 4 threads (see ?getDTthreads).  Latest news: r-datatable.com
dt <- data.table(a = 1, b = 2)
dt[[2]] <- 2
set(dt, , "c", 1)
#> 
#>  *** caught segfault ***
#> address 0x140000014, cause 'memory not mapped'
#> 
#> Traceback:
#>  1: set(dt, , "c", 1)
#> 
#> Possible actions:
#> 1: abort (with core dump, if enabled)
#> 2: normal R exit
#> 3: exit R without saving workspace
#> 4: exit R saving workspace

Maybe the operation corrupts memory in some way and that the new GitHub version just happens to not segfault because memory is laid out differently for some unrelated reason? Maybe there are two underlying bugs, one of which is already fixed on GitHub.

Using $ does not seem to cause this issues:

dt <- data.table(a = 1, b = 2)
dt$b <- 2
set(dt, , "c", 1)
dt
#>        a     b     c
#>    <num> <num> <num>
#> 1:     1     2     1

sessionInfo():

R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Berlin
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.4.0

(or GitHub data.table version in the first example)

@aitap
Copy link
Contributor

aitap commented Jan 19, 2025

We've fixed #6410 in September, but I'm not seeing the fix on the CRAN branch.

truelength() changing after printing makes sense: data.table:::print.data.table probably causes selfrefok() to be called on the table at some point, which notices the shallow duplicate and adjusts the field.

$ working also makes sense because we've got a method for that, data.table:::`$<-.data.table`.

What the error messages aren't telling you is how to get out of the situation. Since the current dt is broken, you need setDT(dt) to create it anew. dt <- as.data.table(dt) also seems to work. It may be worth changing the messages in this direction, cf. #6494.

Edit: is there a way to mark #6616 so that fixing it will fix this issue automatically? Since the next version, 1.17.0, will be released from the master branch, it will include the fix for #6410.

@MichaelChirico
Copy link
Member

is there a way to mark #6616 so that fixing it will fix this issue automatically?

There might be a way to do that under the new revamp of Issues GH is working on... that said, our practice is to just close an issue once it's closed in dev.

Leaving this open as it looks like maybe it's still useful as a documentation issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants