diff --git a/NEWS.md b/NEWS.md index 176999e02..35f51f66e 100644 --- a/NEWS.md +++ b/NEWS.md @@ -96,7 +96,7 @@ 16. `as.data.table` now unpacks columns in a `data.frame` which are themselves a `data.frame`. This need arises when parsing JSON, a corollary in [#3369](https://github.com/Rdatatable/data.table/issues/3369#issuecomment-462662752). `data.table` does not allow columns to be objects which themselves have columns (such as `matrix` and `data.frame`), unlike `data.frame` which does. Bug fix 19 in v1.12.2 (see below) added a helpful error (rather than segfault) to detect such invalid `data.table`, and promised that `as.data.table()` would unpack these columns in the next release (i.e. this release) so that the invalid `data.table` is not created in the first place. -17. `CJ` has been ported to C and parallelized, thanks to a PR by Michael Chirico, [#3596](https://github.com/Rdatatable/data.table/pull/3596). All types benefit (including newly supported complex, part of [#3690](https://github.com/Rdatatable/data.table/issues/3690)), and as in many `data.table` operations, factors benefit more than character. +17. `CJ` has been ported to C and parallelized, thanks to a PR by Michael Chirico, [#3596](https://github.com/Rdatatable/data.table/pull/3596). All types benefit, but, as in many `data.table` operations, factors benefit more than character. ```R # default 4 threads on a laptop with 16GB RAM and 8 logical CPU @@ -114,7 +114,7 @@ # 0.357 0.763 0.292 # now ``` -18. New function `coalesce(...)` has been written in C, and is multithreaded for numeric, complex, and factor types. It replaces missing values according to a prioritized list of candidates (as per SQL COALESCE, `dplyr::coalesce`, and `hutils::coalesce`), [#3424](https://github.com/Rdatatable/data.table/issues/3424). It accepts any number of vectors in several forms. 
For example, given three vectors `x`, `y`, and `z`, where each `NA` in `x` is to be replaced by the corresponding value in `y` if that is non-NA, else the corresponding value in `z`, the following equivalent forms are all accepted: `coalesce(x,y,z)`, `coalesce(x,list(y,z))`, and `coalesce(list(x,y,z))`. +18. New function `coalesce(...)` has been written in C, and is multithreaded for `numeric` and `factor`. It replaces missing values according to a prioritized list of candidates (as per SQL COALESCE, `dplyr::coalesce`, and `hutils::coalesce`), [#3424](https://github.com/Rdatatable/data.table/issues/3424). It accepts any number of vectors in several forms. For example, given three vectors `x`, `y`, and `z`, where each `NA` in `x` is to be replaced by the corresponding value in `y` if that is non-NA, else the corresponding value in `z`, the following equivalent forms are all accepted: `coalesce(x,y,z)`, `coalesce(x,list(y,z))`, and `coalesce(list(x,y,z))`. ```R # default 4 threads on a laptop with 16GB RAM and 8 logical CPU @@ -131,9 +131,7 @@ # TRUE ``` -19. `shift` now supports type `complex`, part of [#3690](https://github.com/Rdatatable/data.table/issues/3690). - -20. `setkey` now supports type `complex` as value columns (not as key columns), [#1444](https://github.com/Rdatatable/data.table/issues/1444). Thanks Gareth Ward for the report. +19. Type `complex` is now supported by `setkey`, `setorder`, `:=`, `by=`, `keyby=`, `shift`, `dcast`, `frank`, `rowid`, `rleid`, `CJ`, `coalesce`, `unique`, and `uniqueN`, [#3690](https://github.com/Rdatatable/data.table/issues/3690). Thanks to Gareth Ward and Elio Campitelli for their reports and input. Sorting `complex` is achieved the same way as base R; i.e., first by the real part then by the imaginary part (as if the `complex` column were two separate columns of `double`). There is no plan to support joining/merging on `complex` columns until a user demonstrates a need for that. #### BUG FIXES @@ -198,8 +196,6 @@ 24. 
`column not found` could incorrectly occur in rare non-equi-join cases, [#3635](https://github.com/Rdatatable/data.table/issues/3635). Thanks to @UweBlock for the report. -25. Complex columns used in `j` during grouping would get mangled, [#3639](https://github.com/Rdatatable/data.table/issues/3639). A related bug prevented assigning complex values using `:=` except for full-column plonks. We still do not support grouping `by` a complex column. Thanks to @eliocamp for filing the bug report. - #### NOTES 1. `rbindlist`'s `use.names="check"` now emits its message for automatic column names (`"V[0-9]+"`) too, [#3484](https://github.com/Rdatatable/data.table/pull/3484). See news item 5 of v1.12.2 below. diff --git a/R/bmerge.R b/R/bmerge.R index a655f99d3..321a27074 100644 --- a/R/bmerge.R +++ b/R/bmerge.R @@ -14,7 +14,7 @@ bmerge = function(i, x, icols, xcols, roll, rollends, nomatch, mult, ops, verbos # careful to only plonk syntax (full column) on i/x from now on otherwise user's i and x would change; # this is why shallow() is very importantly internal only, currently. - supported = c("logical", "integer", "double", "character", "factor", "integer64") + supported = c(ORDERING_TYPES, "factor", "integer64") getClass = function(x) { ans = typeof(x) diff --git a/R/data.table.R b/R/data.table.R index fea514aee..a7e085a5f 100644 --- a/R/data.table.R +++ b/R/data.table.R @@ -829,7 +829,7 @@ replace_order = function(isub, verbose, env) { if (!is.list(byval)) stop("'by' or 'keyby' must evaluate to a vector or a list of vectors (where 'list' includes data.table and data.frame which are lists, too)") if (length(byval)==1L && is.null(byval[[1L]])) bynull=TRUE #3530 when by=(function()NULL)() if (!bynull) for (jj in seq_len(length(byval))) { - if (!typeof(byval[[jj]]) %chin% c("integer","logical","character","double")) stop("column or expression ",jj," of 'by' or 'keyby' is type ",typeof(byval[[jj]]),". Do not quote column names. 
Usage: DT[,sum(colC),by=list(colA,month(colB))]") + if (!typeof(byval[[jj]]) %chin% ORDERING_TYPES) stop("column or expression ",jj," of 'by' or 'keyby' is type ",typeof(byval[[jj]]),". Do not quote column names. Usage: DT[,sum(colC),by=list(colA,month(colB))]") } tt = vapply_1i(byval,length) if (any(tt!=xnrow)) stop("The items in the 'by' or 'keyby' list are length (",paste(tt,collapse=","),"). Each must be length ", xnrow, "; the same length as there are rows in x (after subsetting if i is provided).") diff --git a/R/setkey.R b/R/setkey.R index 838c19eee..9764e7fc6 100644 --- a/R/setkey.R +++ b/R/setkey.R @@ -51,14 +51,9 @@ setkeyv = function(x, cols, verbose=getOption("datatable.verbose"), physical=TRU } if (identical(cols,"")) stop("cols is the empty string. Use NULL to remove the key.") if (!all(nzchar(cols))) stop("cols contains some blanks.") - if (!length(cols)) { - cols = colnames(x) # All columns in the data.table, usually a few when used in this form - } else { - # remove backticks from cols - cols = gsub("`", "", cols, fixed = TRUE) - miss = !(cols %chin% colnames(x)) - if (any(miss)) stop("some columns are not in the data.table: ", paste(cols[miss], collapse=",")) - } + cols = gsub("`", "", cols, fixed = TRUE) + miss = !(cols %chin% colnames(x)) + if (any(miss)) stop("some columns are not in the data.table: ", paste(cols[miss], collapse=",")) ## determine, whether key is already present: if (identical(key(x),cols)) { @@ -83,7 +78,7 @@ setkeyv = function(x, cols, verbose=getOption("datatable.verbose"), physical=TRU if (".xi" %chin% names(x)) stop("x contains a column called '.xi'. 
Conflicts with internal use by data.table.") for (i in cols) { .xi = x[[i]] # [[ is copy on write, otherwise checking type would be copying each column - if (!typeof(.xi) %chin% c("integer","logical","character","double")) stop("Column '",i,"' is type '",typeof(.xi),"' which is not supported as a key column type, currently.") + if (!typeof(.xi) %chin% ORDERING_TYPES) stop("Column '",i,"' is type '",typeof(.xi),"' which is not supported as a key column type, currently.") } if (!is.character(cols) || length(cols)<1L) stop("Internal error. 'cols' should be character at this point in setkey; please report.") # nocov @@ -178,6 +173,7 @@ is.sorted = function(x, by=seq_along(x)) { # Important to call forder.c::fsorted here, for consistent character ordering and numeric/integer64 twiddling. } +ORDERING_TYPES = c('logical', 'integer', 'double', 'complex', 'character') forderv = function(x, by=seq_along(x), retGrp=FALSE, sort=TRUE, order=1L, na.last=FALSE) { if (!(sort || retGrp)) stop("At least one of retGrp or sort must be TRUE") @@ -205,7 +201,7 @@ forderv = function(x, by=seq_along(x), retGrp=FALSE, sort=TRUE, order=1L, na.las stop("'by' is type 'double' and one or more items in it are not whole integers") } by = as.integer(by) - if ( (length(order) != 1L && length(order) != length(by)) || any(!order %in% c(1L, -1L)) ) + if ( (length(order) != 1L && length(order) != length(by)) || !all(order %in% c(1L, -1L)) ) stop("x is a list, length(order) must be either =1 or =length(by) and each value should be 1 or -1 for each column in 'by', corresponding to ascending or descending order, respectively. If length(order) == 1, it will be recycled to length(by).") if (length(order) == 1L) order = rep(order, length(by)) } @@ -327,7 +323,7 @@ setorderv = function(x, cols = colnames(x), order=1L, na.last=FALSE) if (".xi" %chin% colnames(x)) stop("x contains a column called '.xi'. 
Conflicts with internal use by data.table.") for (i in cols) { .xi = x[[i]] # [[ is copy on write, otherwise checking type would be copying each column - if (!typeof(.xi) %chin% c("integer","logical","character","double")) stop("Column '",i,"' is type '",typeof(.xi),"' which is not supported for ordering currently.") + if (!typeof(.xi) %chin% ORDERING_TYPES) stop("Column '",i,"' is type '",typeof(.xi),"' which is not supported for ordering currently.") } if (!is.character(cols) || length(cols)<1L) stop("Internal error. 'cols' should be character at this point in setkey; please report.") # nocov diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index a6781a6b6..5f149ab84 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -6460,7 +6460,7 @@ test(1464.03, rleidv(DT, "b"), c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L, 5L, 5L)) test(1464.04, rleid(DT$b), c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L, 5L, 5L)) test(1464.05, rleidv(DT, "c"), c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L, 5L, 5L)) test(1464.06, rleid(DT$c), c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L, 5L, 5L)) -test(1464.07, rleid(as.complex(c(1,0+5i,0+5i,1))), error="Type 'complex' not supported") +test(1464.07, rleid(as.raw(c(3L, 1L, 2L))), error="Type 'raw' not supported") test(1464.08, rleidv(DT, 0), error="outside range") test(1464.09, rleidv(DT, 5), error="outside range") test(1464.10, rleidv(DT, 1:4), 1:nrow(DT)) @@ -11713,11 +11713,11 @@ test(1844.2, forder(DT,V1,V2,na.last=NA), INT(2,1,3,0,4)) # prior to v1.12.0 th # now with two NAs in that 2-group covers forder.c:forder line 1269 starting: else if (nalast == 0 && tmp==-2) { DT = data.table(c("a","a","a","b","b"),c(2,1,3,NA,NA)) test(1844.3, forder(DT,V1,V2,na.last=NA), INT(2,1,3,0,0)) -DT = data.table((0+0i)^(-3:3), 7:1) -test(1844.4, forder(DT,V1,V2), error="Column 1 of by= (1) is type 'complex', not yet supported") -test(1844.5, forder(DT,V2,V1), error="Column 2 of by= (2) is type 'complex', not yet supported") -DT = data.table((0+0i)^(-3:3), c(5L,5L,1L,2L,2L,2L,2L)) 
-test(1844.6, forder(DT,V2,V1), error="Column 2 of by= (2) is type 'complex', not yet supported") +DT = data.table(as.raw(0:6), 7:1) +test(1844.4, forder(DT,V1,V2), error="Column 1 of by= (1) is type 'raw', not yet supported") +test(1844.5, forder(DT,V2,V1), error="Column 2 of by= (2) is type 'raw', not yet supported") +DT = data.table(as.raw(0:6), c(5L,5L,1L,2L,2L,2L,2L)) +test(1844.6, forder(DT,V2,V1), error="Column 2 of by= (2) is type 'raw', not yet supported") # fix for non-equi joins issue #1991. Thanks to Henrik for the nice minimal example. d1 <- data.table(x = c(rep(c("b", "a", "c"), each = 3), c("a", "b")), y = c(rep(c(1, 3, 6), 3), 6, 6), id = 1:11) @@ -13170,9 +13170,9 @@ setnames(DT, '.xi') setkey(DT, NULL) test(1962.037, setkey(DT, .xi), error = "x contains a column called '.xi'") -DT = data.table(a = 1+3i) +DT = data.table(a = as.raw(0)) test(1962.038, setkey(DT, a), - error = "Column 'a' is type 'complex'") + error = "Column 'a' is type 'raw'") test(1962.039, is.sorted(3:1, by = 'x'), error = 'x is vector but') @@ -13228,8 +13228,8 @@ test(1962.064, setorderv(copy(DT)), test(1962.065, setorderv(DT, 'c'), error = 'some columns are not in the data.table') setnames(DT, 1L, '.xi') test(1962.066, setorderv(DT, 'b'), error = "x contains a column called '.xi'") -test(1962.067, setorderv(data.table(a = 1+3i), 'a'), - error = "Column 'a' is type 'complex'") +test(1962.067, setorderv(data.table(a = as.raw(0)), 'a'), + error = "Column 'a' is type 'raw'") DT = data.table( color = c("yellow", "red", "green", "red", "green", "red", @@ -13754,7 +13754,7 @@ test(1984.05, DT[ , sum(b), keyby = c, verbose = TRUE], ### hitting byval = eval(bysub, setattr(as.list(seq_along(xss)), ...) 
test(1984.06, DT[1:3, sum(a), by=b:c], data.table(b=10:8, c=1:3, V1=1:3)) test(1984.07, DT[, sum(a), by=call('sin',pi)], error='must evaluate to a vector or a list of vectors') -test(1984.08, DT[, sum(a), by=1+3i], error='column or expression.*type complex') +test(1984.08, DT[, sum(a), by=as.raw(0)], error='column or expression.*type raw') test(1984.09, DT[, sum(a), by=.(1,1:2)], error='The items.*list are length [(]1,2[)].*Each must be length 10; .*rows in x.*after subsetting') options('datatable.optimize' = Inf) test(1984.10, DT[ , 1, by = .(a %% 2), verbose = TRUE], @@ -14766,14 +14766,14 @@ dt1 <- data.table(int = 1L:10L, bool = c(rep(FALSE, 9), TRUE), char = letters[1L:10L], fact = factor(letters[1L:10L]), - complex = as.complex(1:5)) + raw = as.raw(1:5)) dt2 <- data.table(int = 1L:5L, doubleInt = as.numeric(1:5), realDouble = seq(0.5, 2.5, by = 0.5), bool = TRUE, char = letters[1L:5L], fact = factor(letters[1L:5L]), - complex = as.complex(1:5)) + raw = as.raw(1:5)) if (test_bit64) { dt1[, int64 := as.integer64(c(1:9, 3e10))] dt2[, int64 := as.integer64(c(1:4, 3e9))] @@ -14790,8 +14790,8 @@ test(2044.08, nrow(dt1[dt2, on="fact==fact", verbose=TRUE]), nrow(dt if (test_bit64) { test(2044.09, nrow(dt1[dt2, on = "int64==int64", verbose=TRUE]), nrow(dt2), output="No coercion needed") } -test(2044.10, dt1[dt2, on = "int==complex"], error = "i.complex is type complex which is not supported by data.table join") -test(2044.11, dt1[dt2, on = "complex==int"], error = "x.complex is type complex which is not supported by data.table join") +test(2044.10, dt1[dt2, on = "int==raw"], error = "i.raw is type raw which is not supported by data.table join") +test(2044.11, dt1[dt2, on = "raw==int"], error = "x.raw is type raw which is not supported by data.table join") # incompatible types test(2044.20, dt1[dt2, on="bool==int"], error="Incompatible join types: x.bool (logical) and i.int (integer)") test(2044.21, dt1[dt2, on="bool==doubleInt"], error="Incompatible join types: x.bool 
(logical) and i.doubleInt (double)") @@ -15331,6 +15331,72 @@ test(2068.3, setkey(DT, ID), error="Item 2 of list is type 'raw'") # setreordervec triggers !isNewList branch for coverage test(2068.4, setreordervec(DT$r, order(DT$ID)), error="reorder accepts vectors but this non-VECSXP") +# forderv (and downstream functions) handles complex vector input, part of #3690 +DT = data.table( + a = c(1L, 1L, 8L, 2L, 1L, 9L, 3L, 2L, 6L, 6L), + b = c(3+9i, 10+5i, 8+2i, 10+4i, 3+3i, 1+2i, 5+1i, 8+1i, 8+2i, 10+6i), + c = 6 +) +test(2069.01, DT[order(a, b)], DT[base::order(a, b)]) +test(2069.02, DT[order(a, -b)], DT[base::order(a, -b)]) +test(2069.03, forderv(DT$b, order = 1L), base::order(DT$b)) +test(2069.04, forderv(DT$b, order = -1L), base::order(-DT$b)) +test(2069.05, forderv(DT, by = 2:1), forderv(DT[ , 2:1])) +test(2069.06, forderv(DT, by = 2:1, order = c(1L, -1L)), DT[order(b, -a), which = TRUE]) + +# downstreams of forder +DT = data.table( + z = c(0, 0, 1, 1, 2, 3) + c(1, 1, 2, 2, 3, 4)*1i, + grp = rep(1:2, 3L), + v = c(3, 1, 4, 1, 5, 9) +) +unq_z = 0:3 + (1:4)*1i +test(2069.07, DT[ , .N, by=z], data.table(z=unq_z, N=c(2L, 2L, 1L, 1L))) +test(2069.08, DT[ , .N, keyby = z], data.table(z=unq_z, N=c(2L, 2L, 1L, 1L), key='z')) +test(2069.09, dcast(DT, z ~ grp, value.var='v', fill=0), + data.table(z=unq_z, `1`=c(3, 4, 5, 0), `2`=c(1, 1, 0, 9), key='z')) +test(2069.10, frank(DT$z), c(1.5, 1.5, 3.5, 3.5, 5, 6)) +test(2069.11, frank(DT$z, ties.method='max'), c(2L, 2L, 4L, 4L, 5L, 6L)) +test(2069.12, frank(-DT$z, ties.method='min'), c(5L, 5L, 3L, 3L, 2L, 1L)) +test(2069.13, DT[ , rowid(z, grp)], rep(1L, 6L)) +test(2069.14, DT[ , rowid(z)], c(1:2, 1:2, 1L, 1L)) +test(2069.15, rleid(c(1i, 1i, 1i, 0, 0, 1-1i, 2+3i, 2+3i)), rep(1:4, c(3:1, 2L))) +# handling doubles properly +test(2069.16, rleid(c(1i, 1.1i)), 1:2) +test(2069.17, rleidv(DT, "z"), c(1L, 1L, 2L, 2L, 3L, 4L)) +test(2069.18, unique(DT, by = 'z'), data.table(z = unq_z, grp = c(1L, 1L, 1L, 2L), v = c(3, 4, 5, 9))) 
+test(2069.19, unique(DT, by = 'z', fromLast = TRUE), data.table(z = unq_z, grp = c(2L, 2L, 1L, 2L), v = c(1, 1, 5, 9))) +test(2069.20, uniqueN(DT$z), 4L) + +# setkey, setorder work +DT = data.table(a = 2:1, z = 0 + (1:0)*1i) +test(2069.21, setkey(copy(DT), z), data.table(a=1:2, z=0+ (0:1)*1i, key='z')) +test(2069.22, setorder(DT, z), data.table(a=1:2, z=0+ (0:1)*1i)) + +## assorted coverage tests from along the way +if (test_bit64) { + test(2069.23, is.sorted(as.integer64(10:1)), FALSE) + test(2069.24, is.sorted(as.integer64(1:10))) +} +# sort by vector outside of table +ord = 3:1 +test(2069.25, forder(data.table(a=3:1), ord), 3:1) +# dogroups.c coverage +test(2069.26, data.table(c='1')[ , expression(1), by=c], error="j evaluates to type 'expression'") +test(2069.27, data.table(c='1', d=2)[ , d := .(NULL), by=c], error='RHS is NULL when grouping :=') +test(2069.28, data.table(c='1', d=2)[ , c(a='b'), by=c, verbose=TRUE], output='j appears to be a named vector') +test(2069.29, data.table(c = '1', d = 2)[ , .(a = c(nm='b')), by = c, verbose = TRUE], output = 'Column 1 of j is a named vector') +DT <- data.table(a = rep(1:3, each = 4), b = LETTERS[1:4], z = 0:3 + (4:1)*1i) +test(2069.30, DT[, .SD[3,], by=b], DT[9:12, .(b, a, z)]) +DT = data.table(x=1:4,y=1:2,lgl=TRUE,key="x,y") +test(2069.31, DT[CJ(1:4,1:4), any(lgl), by=.EACHI]$V1, + c(TRUE, NA, NA, NA, NA, TRUE, NA, NA, TRUE, NA, NA, NA, NA, TRUE, NA, NA)) +set.seed(45L) +DT1 = data.table(a = sample(3L, 15L, TRUE) + .1, b=sample(c(TRUE, FALSE, NA), 15L, TRUE)) +DT2 = data.table(a = sample(3L, 6L, TRUE) + .1, b=sample(c(TRUE, FALSE, NA), 6L, TRUE)) +test(2069.32, DT1[DT2, .(y = sum(b, na.rm=TRUE)), by=.EACHI, on=c(a = 'a', b="b")]$y, rep(0L, 6L)) +DT = data.table(z = 1i) +test(2069.33, DT[DT, on = 'z'], error = "Type 'complex' not supported for joining/merging") ################################### # Add new tests above this line # diff --git a/src/bmerge.c b/src/bmerge.c index eb55569b0..ec003ff54 100644 --- 
a/src/bmerge.c +++ b/src/bmerge.c @@ -299,7 +299,7 @@ void bmerge_r(int xlowIn, int xuppIn, int ilowIn, int iuppIn, int col, int thisg // ilow and iupp now surround the group in ic, too } break; - case STRSXP : + case STRSXP : { if (op[col] != EQ) error("Only '==' operator is supported for columns of type %s.", type2char(TYPEOF(xc))); ival.s = ENC2UTF8(STRING_ELT(ic,ir)); while(xlow < xupp-1) { @@ -338,7 +338,7 @@ void bmerge_r(int xlowIn, int xuppIn, int ilowIn, int iuppIn, int col, int thisg xval.s = ENC2UTF8(STRING_ELT(ic, o ? o[mid]-1 : mid)); if (xval.s == ival.s) tmpupp=mid; else ilow=mid; // see above re == } - break; + } break; case REALSXP : { double *dic = REAL(ic); double *dxc = REAL(xc); @@ -406,7 +406,7 @@ void bmerge_r(int xlowIn, int xuppIn, int ilowIn, int iuppIn, int col, int thisg } break; default: - error("Type '%s' not supported as key column", type2char(TYPEOF(xc))); + error("Type '%s' not supported for joining/merging", type2char(TYPEOF(xc))); } if (xlow0 diff --git a/src/forder.c b/src/forder.c index 613526098..3dd2031a4 100644 --- a/src/forder.c +++ b/src/forder.c @@ -440,13 +440,15 @@ SEXP forder(SEXP DT, SEXP by, SEXP retGrpArg, SEXP sortGroupsArg, SEXP ascArg, S if (!isInteger(by) || !LENGTH(by)) error("DT has %d columns but 'by' is either not integer or is length 0", length(DT)); // seq_along(x) at R level if (!isInteger(ascArg) || LENGTH(ascArg)!=LENGTH(by)) error("Either 'ascArg' is not integer or its length (%d) is different to 'by's length (%d)", LENGTH(ascArg), LENGTH(by)); nrow = length(VECTOR_ELT(DT,0)); + int n_cplx = 0; for (int i=0; i<LENGTH(by); i++) { - if (INTEGER(by)[i] < 1 || INTEGER(by)[i] > length(DT)) - error("'by' value %d out of range [1,%d]", INTEGER(by)[i], length(DT)); - if ( nrow != length(VECTOR_ELT(DT, INTEGER(by)[i]-1)) ) + int by_i = INTEGER(by)[i]; + if (by_i < 1 || by_i > length(DT)) + error("'by' value %d out of range [1,%d]", by_i, length(DT)); + if ( nrow != length(VECTOR_ELT(DT, by_i-1)) ) error("Column %d is length %d which differs from length of column 1 (%d)\n", 
INTEGER(by)[i], length(VECTOR_ELT(DT, INTEGER(by)[i]-1)), nrow); + if (TYPEOF(VECTOR_ELT(DT, by_i-1)) == CPLXSXP) n_cplx++; } - if (!isLogical(retGrpArg) || LENGTH(retGrpArg)!=1 || INTEGER(retGrpArg)[0]==NA_LOGICAL) error("retGrp must be TRUE or FALSE"); retgrp = LOGICAL(retGrpArg)[0]==TRUE; if (!isLogical(sortGroupsArg) || LENGTH(sortGroupsArg)!=1 || INTEGER(sortGroupsArg)[0]==NA_LOGICAL ) error("sortGroups must be TRUE or FALSE"); @@ -476,11 +478,14 @@ SEXP forder(SEXP DT, SEXP by, SEXP retGrpArg, SEXP sortGroupsArg, SEXP ascArg, S savetl_init(); // from now on use Error not error int ncol=length(by); - key = calloc(ncol*8+1, sizeof(uint8_t *)); // needs to be before loop because part II relies on part I, column-by-column. +1 because we check NULL after last one + key = calloc((ncol+n_cplx)*8+1, sizeof(uint8_t *)); // needs to be before loop because part II relies on part I, column-by-column. +1 because we check NULL after last one // TODO: if key==NULL Error nradix=0; // the current byte we're writing this column to; might be squashing into it (spare>0) int spare=0; // the amount of bits remaining on the right of the current nradix byte bool isReal=false; + bool complexRerun = false; // see comments below in CPLXSXP case + SEXP CplxPart = R_NilValue; + if (n_cplx) { CplxPart=PROTECT(allocVector(REALSXP, nrow)); n_protect++; } // one alloc is reused for each part TEND(2); for (int col=0; col