Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rbindlist segfault for fill=TRUE and usenames=FALSE #5468

Merged
merged 4 commits into from
Dec 8, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -207,7 +207,7 @@
# v1.14.4 0.4826 0.5586 0.6586 0.6329 0.7348 1.318 100
```

31. `rbind()` and `rbindlist()` now support `fill=TRUE` with `use.names=FALSE` instead of issuing the warning `use.names= cannot be FALSE when fill is TRUE. Setting use.names=TRUE.`
31. `rbind()` and `rbindlist()` now support `fill=TRUE` with `use.names=FALSE` instead of issuing the warning `use.names= cannot be FALSE when fill is TRUE. Setting use.names=TRUE.`, [#5444](https://github.com/Rdatatable/data.table/issues/5444). Thanks to @sindribaldur for testing dev and filing a bug report which was fixed before release.

```R
DT1
Expand Down Expand Up @@ -251,7 +251,7 @@
# 3: 3 NA
# 4: 4 NA
```

32. `fread()` already made a good guess as to whether column names are present by comparing the type of the fields in row 1 to the type of the fields in the sample. This guess is now improved when a column contains a string in row 1 (i.e. a potential column name) but all blank in the sample rows, [#2526](https://github.com/Rdatatable/data.table/issues/2526). Thanks @st-pasha for reporting, and @ben-schwen for the PR.

33. `fread()` can now read `.zip` and `.tar` directly, [#3834](https://github.com/Rdatatable/data.table/issues/3834). Moreover, if a compressed file name is missing its extension, `fread()` now attempts to infer the correct filetype from its magic bytes. Thanks to Michael Chirico for the idea, and Benjamin Schwendinger for the PR.
Expand All @@ -267,7 +267,7 @@
# 1: 1 3 a
# 2: 2 4 b
```

35. `weighted.mean()` is now optimised by group, [#3977](https://github.com/Rdatatable/data.table/issues/3977). Thanks to @renkun-ken for requesting, and Benjamin Schwendinger for the PR.

36. `as.xts.data.table()` now supports non-numeric xts coredata matrixes, [5268](https://github.com/Rdatatable/data.table/issues/5268). Existing numeric only functionality is supported by a new `numeric.only` parameter, which defaults to `TRUE` for backward compatability and the most common use case. To convert non-numeric columns, set this parameter to `FALSE`. Conversions of `data.table` columns to a `matrix` now uses `data.table::as.matrix`, with all its performance benefits. Thanks to @ethanbsmith for the report and fix.
Expand All @@ -284,7 +284,7 @@
# <int> <int>
# 1: 3 5
# 2: 4 6

DT[, sum(.SD), by=.I]
# I V1
# <int> <int>
Expand Down
5 changes: 4 additions & 1 deletion inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -14694,7 +14694,10 @@ test(2003.3, rbindlist(list(data.table(a=1:2), data.table(b=3:4)), fill=TRUE, us
test(2003.4, rbindlist(list(data.table(a=1:2,c=5:6), data.table(b=3:4)), fill=TRUE, use.names=FALSE),
data.table(a=c(1:4), c=INT(5,6,NA,NA)))
test(2003.5, rbindlist(list(data.table(a=1:2), data.table(b=3:4, c=5:6)), fill=TRUE, use.names=FALSE),
data.table(a=c(1:4), V1=INT(NA,NA,5,6)))
data.table(a=c(1:4), c=INT(NA,NA,5,6)))
# rbindlist segfault with fill=TRUE and usenames=FALSE #5444
test(2003.6, rbindlist(list(list(1), list(2,3)), fill=TRUE, use.names=FALSE), data.table(c(1,2), c(NA, 3)))
test(2003.7, rbindlist(list(list(1), list(2,factor(3))), fill=TRUE, use.names=FALSE), data.table(c(1,2), factor(c(NA, 3))))

# chmatch coverage for two different non-ascii encodings matching; issues mentioned in comments in chmatch.c #69 #2538 #111
x1 = "fa\xE7ile"
Expand Down
16 changes: 9 additions & 7 deletions src/rbindlist.c
Original file line number Diff line number Diff line change
Expand Up @@ -282,7 +282,7 @@ SEXP rbindlist(SEXP l, SEXP usenamesArg, SEXP fillArg, SEXP idcolArg)
for (int i=0; i<LENGTH(l); ++i) {
SEXP li = VECTOR_ELT(l, i);
if (!length(li)) continue;
int w = usenames ? colMap[i*ncol + j] : j; // colMap tells us which item to fetch for each of the final result columns, so we can stack column-by-column
int w = usenames ? colMap[i*ncol + j] : (j<length(li) ? j : -1); // colMap tells us which item to fetch for each of the final result columns, so we can stack column-by-column // check if j exceeds length for fill=TRUE and usenames=FALSE #5444
ben-schwen marked this conversation as resolved.
Show resolved Hide resolved
if (w==-1) continue; // column j of final result has no input from this item (fill must be true)
if (!foundName) {
SEXP cn=PROTECT(getAttrib(li, R_NamesSymbol));
Expand Down Expand Up @@ -333,9 +333,10 @@ SEXP rbindlist(SEXP l, SEXP usenamesArg, SEXP fillArg, SEXP idcolArg)
// before the savetl_init() because we have no hook to clean up tl if coerceVector fails.
if (coercedForFactor==NULL) { coercedForFactor=PROTECT(allocVector(VECSXP, LENGTH(l))); nprotect++; }
for (int i=0; i<LENGTH(l); ++i) {
int w = usenames ? colMap[i*ncol + j] : j;
SEXP li = VECTOR_ELT(l, i);
int w = usenames ? colMap[i*ncol + j] : (j<length(li) ? j : -1); // check if j exceeds length for fill=TRUE and usenames=FALSE #5444
if (w==-1) continue;
SEXP thisCol = VECTOR_ELT(VECTOR_ELT(l, i), w);
SEXP thisCol = VECTOR_ELT(li, w);
if (!isFactor(thisCol) && !isString(thisCol)) {
SET_VECTOR_ELT(coercedForFactor, i, coerceVector(thisCol, STRSXP));
}
Expand Down Expand Up @@ -366,9 +367,10 @@ SEXP rbindlist(SEXP l, SEXP usenamesArg, SEXP fillArg, SEXP idcolArg)
SET_TRUELENGTH(s,-k-1);
}
for (int i=0; i<LENGTH(l); ++i) {
int w = usenames ? colMap[i*ncol + j] : j;
SEXP li = VECTOR_ELT(l, i);
int w = usenames ? colMap[i*ncol + j] : (j<length(li) ? j : -1); // check if j exceeds length for fill=TRUE and usenames=FALSE #5444
if (w==-1) continue;
SEXP thisCol = VECTOR_ELT(VECTOR_ELT(l, i), w);
SEXP thisCol = VECTOR_ELT(li, w);
if (isOrdered(thisCol)) {
SEXP levels = getAttrib(thisCol, R_LevelsSymbol);
const SEXP *levelsD = STRING_PTR(levels);
Expand Down Expand Up @@ -403,7 +405,7 @@ SEXP rbindlist(SEXP l, SEXP usenamesArg, SEXP fillArg, SEXP idcolArg)
const int thisnrow = eachMax[i];
SEXP li = VECTOR_ELT(l, i);
if (!length(li)) continue; // NULL items in the list() of DT/DF; not if thisnrow==0 because we need to retain (unused) factor levels (#3508)
int w = usenames ? colMap[i*ncol + j] : j;
int w = usenames ? colMap[i*ncol + j] : (j<length(li) ? j : -1); // check if j exceeds length for fill=TRUE and usenames=FALSE #5444
if (w==-1) {
writeNA(target, ansloc, thisnrow, false);
} else {
Expand Down Expand Up @@ -508,7 +510,7 @@ SEXP rbindlist(SEXP l, SEXP usenamesArg, SEXP fillArg, SEXP idcolArg)
const int thisnrow = eachMax[i];
if (thisnrow==0) continue;
SEXP li = VECTOR_ELT(l, i);
int w = usenames ? colMap[i*ncol + j] : j;
int w = usenames ? colMap[i*ncol + j] : (j<length(li) ? j : -1); // check if j exceeds length for fill=TRUE and usenames=FALSE #5444
SEXP thisCol;
if (w==-1 || !length(thisCol=VECTOR_ELT(li, w))) { // !length for zeroCol warning above; #1871
writeNA(target, ansloc, thisnrow, false); // writeNA is integer64 aware and writes INT64_MIN
Expand Down