Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set automatically allocates new column slots if needed #5269

Open
wants to merge 30 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 10 commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
53d5160
add re-overallocation to set
ben-schwen Nov 21, 2021
a035fbe
refine test
ben-schwen Nov 21, 2021
315eb3c
avoid double substitute by calling Calloccolwrapper directly
ben-schwen Nov 21, 2021
b1db2ab
add length check to avoid chmatch
ben-schwen Nov 21, 2021
7146873
add news
ben-schwen Nov 21, 2021
67a0218
alter old test
ben-schwen Nov 21, 2021
6b35f8e
merge master
ben-schwen Nov 21, 2021
cda7455
add coverage
ben-schwen Nov 21, 2021
b61e7ff
remove old error which is not reachable anymore
ben-schwen Nov 21, 2021
517c62a
grammar
MichaelChirico Dec 19, 2021
7469494
add issue number to tests
ben-schwen Dec 19, 2021
e9d0e9c
remove 2nd eval
ben-schwen Dec 20, 2021
ca86485
merge master
ben-schwen Dec 20, 2021
592853d
move set to c level
ben-schwen Dec 26, 2021
86dfc42
proper use of alloccol
ben-schwen Dec 26, 2021
3eebecf
update test
ben-schwen Dec 26, 2021
1aa5534
rewrite R call
ben-schwen Dec 27, 2021
249a471
fix tests
ben-schwen Dec 27, 2021
f65a8cc
exchange alloccol with shallow
ben-schwen Dec 27, 2021
f95c1c4
add test for TRUELENGTH=0 set
ben-schwen Dec 27, 2021
e4521a1
rewrite set
ben-schwen Sep 11, 2024
03833c9
Merge branch 'master' into set_re-overallocate
ben-schwen Sep 11, 2024
ad8c97d
fix typo
ben-schwen Sep 11, 2024
6fd0907
add tests
ben-schwen Sep 11, 2024
4a96f80
change test numbering
ben-schwen Sep 11, 2024
074b805
add test
ben-schwen Sep 11, 2024
26cdf8f
fix inherits
ben-schwen Sep 11, 2024
c4bdbea
Merge branch 'master' into set_re-overallocate
MichaelChirico Jan 17, 2025
c4174e7
style
MichaelChirico Jan 17, 2025
23b1c9d
bullet number
MichaelChirico Jan 17, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -455,6 +455,8 @@

49. `fintersect(..., all=TRUE)` and `fsetdiff(..., all=TRUE)` could return incorrect results when the inputs had columns named `x` and `y`, [#5255](https://github.com/Rdatatable/data.table/issues/5255). Thanks @Fpadt for the report, and @ben-schwen for the fix.

50. `set()` now automatically pre-allocates new column slots if needed, similar to what `:=` already does, [#496](https://github.com/Rdatatable/data.table/issues/496) [#4100](https://github.com/Rdatatable/data.table/issues/4100). Thanks to Huashan Chen and Benjamin Tyner for the report and Benjamin Schwendinger for the fix.

## NOTES

1. New feature 29 in v1.12.4 (Oct 2019) introduced zero-copy coercion. Our thinking is that requiring you to get the type right in the case of `0` (type double) vs `0L` (type integer) is too inconvenient for you the user. So such coercions happen in `data.table` automatically without warning. Thanks to zero-copy coercion there is no speed penalty, even when calling `set()` many times in a loop, so there's no speed penalty to warn you about either. However, we believe that assigning a character value such as `"2"` into an integer column is more likely to be a user mistake that you would like to be warned about. The type difference (character vs integer) may be the only clue that you have selected the wrong column, or typed the wrong variable to be assigned to that column. For this reason we view character to numeric-like coercion differently and will warn about it. If it is correct, then the warning is intended to nudge you to wrap the RHS with `as.<type>()` so that it is clear to readers of your code that a coercion from character to that type is intended. For example :
Expand Down
16 changes: 16 additions & 0 deletions R/data.table.R
Original file line number Diff line number Diff line change
Expand Up @@ -2707,6 +2707,22 @@ setcolorder = function(x, neworder=key(x), before=NULL, after=NULL) # before/af

set = function(x,i=NULL,j,value) # low overhead, loopable
{
if (is.character(j) && truelength(x) < ncol(x)+length(j)) {
newnames = NULL
names_x = names(x)
m = chmatch(j, names_x)
if (anyNA(m)) {
newnames = j[is.na(m)]
}
if (truelength(x) < ncol(x)+length(newnames)) {
n = length(newnames) + eval(getOption("datatable.alloccol"))
ben-schwen marked this conversation as resolved.
Show resolved Hide resolved
name = substitute(x)
x = .Call(Calloccolwrapper, x, eval(n), getOption("datatable.verbose"))
if (is.name(name)) {
assign(as.character(name),x,parent.frame(),inherits=TRUE)
}
}
}
.Call(Cassign,x,i,j,NULL,value)
invisible(x)
}
Expand Down
12 changes: 10 additions & 2 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -1265,7 +1265,8 @@ test(416, DT[J(2)], error="the columns to join by must be specified using")

# Test shallow copy verbose message from := adding a column, and (TO DO) only when X is NAMED.
DT = data.table(a=1:3,b=4:6)
test(417, alloc.col(DT,3,verbose=TRUE), DT, output="Attempt to reduce allocation from.*to 5 ignored. Can only increase allocation via shallow copy")
test(417.1, alloc.col(DT,3,verbose=TRUE), DT, output="Attempt to reduce allocation from.*to 5 ignored. Can only increase allocation via shallow copy")
test(417.2, alloc.col(DT,3,verbose=1L), error="verbose must be TRUE or FALSE")
options(datatable.alloccol=1L)
DT = data.table(a=1:3,b=4:6)
options(datatable.alloccol=1024L)
Expand Down Expand Up @@ -14784,7 +14785,7 @@ test(2016.1, name, "DT")
test(2016.2, DT, data.table(a=1:3))
test(2016.3, DT[2,a:=4L], data.table(a=INT(1,4,3))) # no error for := when existing column
test(2016.4, set(DT,3L,1L,5L), data.table(a=INT(1,4,5))) # no error for set() when existing column
test(2016.5, set(DT,2L,"newCol",5L), error="either been loaded from disk.*or constructed manually.*Please run setDT.*setalloccol.*on it first") # just set()
test(2016.5, set(DT,2L,"newCol",5L), data.table(a=INT(1,4,5), newCol=INT(NA,5L,NA))) # works since set overallocates
ben-schwen marked this conversation as resolved.
Show resolved Hide resolved
test(2016.6, DT[2,newCol:=6L], data.table(a=INT(1,4,5), newCol=INT(NA,6L,NA))) # := ok (it changes DT in caller)
unlink(tt)

Expand Down Expand Up @@ -18382,3 +18383,10 @@ DT = data.table(x=c(1,2,2,2), y=LETTERS[c(1,2,2,3)])
test(2227.1, fintersect(DT, DT, all=TRUE), DT)
test(2227.2, fsetdiff(DT, DT, all=TRUE), DT[0])

# re-overallocate in set if quota is reached #496 #4100
opt = options(datatable.alloccol=1L)
n = getOption("datatable.alloccol")
DT = data.table()
for (i in seq(2*n)) set(DT, j = paste0("V",i), value = i)
test(2228.1, ncol(DT), 2L*n)
options(opt)
3 changes: 2 additions & 1 deletion src/assign.c
Original file line number Diff line number Diff line change
Expand Up @@ -459,7 +459,8 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values)
oldtncol = TRUELENGTH(dt); // TO DO: oldtncol can be just called tl now, as we won't realloc here any more.

if (oldtncol<oldncol) {
if (oldtncol==0) error(_("This data.table has either been loaded from disk (e.g. using readRDS()/load()) or constructed manually (e.g. using structure()). Please run setDT() or setalloccol() on it first (to pre-allocate space for new columns) before assigning by reference to it.")); // #2996
// oldtncol=0 and oldtncol < oldncol can't occur longer because all functions which use assign (set and :=) pre-allocate column slots if needed
//if (oldtncol==0) error(_("This data.table has either been loaded from disk (e.g. using readRDS()/load()) or constructed manually (e.g. using structure()). Please run setDT() or setalloccol() on it first (to pre-allocate space for new columns) before assigning by reference to it.")); // #2996
error(_("Internal error: oldtncol(%d) < oldncol(%d). Please report to data.table issue tracker, including result of sessionInfo()."), oldtncol, oldncol); // # nocov
}
if (oldtncol>oldncol+10000L) warning(_("truelength (%d) is greater than 10,000 items over-allocated (length = %d). See ?truelength. If you didn't set the datatable.alloccol option very large, please report to data.table issue tracker including the result of sessionInfo()."),oldtncol, oldncol);
Expand Down