diff --git a/NEWS.md b/NEWS.md
index 34b965b58..5e0091481 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -14,7 +14,9 @@
 # 2:
 ```

-2. `cedta()` now returns `FALSE` if `.datatable.aware = FALSE` is set in the calling environment, [#5654](https://github.com/Rdatatable/data.table/issues/5654).
+2. `cedta()` now returns `FALSE` if `.datatable.aware = FALSE` is set in the calling environment, [#5654](https://github.com/Rdatatable/data.table/issues/5654). Thanks @dvg-p4 for the request and PR.
+
+3. `split.data.table` also accepts a formula for `f`, [#5392](https://github.com/Rdatatable/data.table/issues/5392), mirroring the formula support in `base::split.data.frame` since R 4.1.0 (May 2021). Thanks to @XiangyunHuang for the request, and @ben-schwen for the PR.

 3. Namespace-qualifying `data.table::shift()`, `data.table::first()`, or `data.table::last()` will not deactivate GForce, [#5942](https://github.com/Rdatatable/data.table/issues/5942). Thanks @MichaelChirico for the proposal and fix. Namespace-qualifying other calls like `stats::sum()`, `base::prod()`, etc., continue to work as an escape valve to avoid GForce, e.g. to ensure S3 method dispatch.

diff --git a/R/data.table.R b/R/data.table.R
index 41edee985..c80e89f88 100644
--- a/R/data.table.R
+++ b/R/data.table.R
@@ -2401,6 +2401,9 @@ split.data.table = function(x, f, drop = FALSE, by, sorted = FALSE, keep.by = TR
     if (!missing(by))
       stopf("passing 'f' argument together with 'by' is not allowed, use 'by' when split by column in data.table and 'f' when split by external factor")
     # same as split.data.frame - handling all exceptions, factor orders etc, in a single stream of processing was a nightmare in factor and drop consistency
+    # evaluate formula mirroring split.data.frame #5392. Mimics base::.formula2varlist.
+    if (inherits(f, "formula"))
+      f <- eval(attr(terms(f), "variables"), x, environment(f))
     # be sure to use x[ind, , drop = FALSE], not x[ind], in case downstream methods don't follow the same subsetting semantics (#5365)
     return(lapply(split(x = seq_len(nrow(x)), f = f, drop = drop, ...), function(ind) x[ind, , drop = FALSE]))
   }
diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw
index 7461afebf..2c2771c6d 100644
--- a/inst/tests/tests.Rraw
+++ b/inst/tests/tests.Rraw
@@ -18294,3 +18294,13 @@ test(2246.1, DT[, data.table::shift(b), by=a], DT[, shift(b), by=a], output="GFo
 test(2246.2, DT[, data.table::first(b), by=a], DT[, first(b), by=a], output="GForce TRUE")
 test(2246.3, DT[, data.table::last(b), by=a], DT[, last(b), by=a], output="GForce TRUE")
 options(old)
+
+# 5392 split(x,f) works with formula f
+dt = data.table(x=1:4, y=factor(letters[1:2]))
+test(2247.1, split(dt, ~y), split(dt, dt$y))
+dt = data.table(x=1:4, y=1:2)
+test(2247.2, split(dt, ~y), list(`1`=data.table(x=c(1L,3L), y=1L), `2`=data.table(x=c(2L, 4L), y=2L)))
+# Closely match the original MRE from the issue
+test(2247.3, do.call(rbind, split(dt, ~y)), setDT(do.call(rbind, split(as.data.frame(dt), ~y))))
+dt = data.table(x=1:4, y=factor(letters[1:2]), z=factor(c(1,1,2,2), labels=c("c", "d")))
+test(2247.4, split(dt, ~y+z), list("a.c"=dt[1], "b.c"=dt[2], "a.d"=dt[3], "b.d"=dt[4]))
diff --git a/man/split.Rd b/man/split.Rd
index f83e5cdfd..687771f0c 100644
--- a/man/split.Rd
+++ b/man/split.Rd
@@ -12,7 +12,7 @@
 }
 \arguments{
   \item{x}{data.table }
-  \item{f}{factor or list of factors. Same as \code{\link[base:split]{split.data.frame}}. Use \code{by} argument instead, this is just for consistency with data.frame method.}
+  \item{f}{Same as \code{\link[base:split]{split.data.frame}}. Use the \code{by} argument instead; this is just for consistency with the data.frame method.}
   \item{drop}{logical. Default \code{FALSE} will not drop empty list elements caused by factor levels not referred by that factors. Works also with new arguments of split data.table method.}
   \item{by}{character vector. Column names on which split should be made. For \code{length(by) > 1L} and \code{flatten} FALSE it will result nested lists with data.tables on leafs.}
   \item{sorted}{When default \code{FALSE} it will retain the order of groups we are splitting on. When \code{TRUE} then sorted list(s) are returned. Does not have effect for \code{f} argument.}
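The two lines added to `split.data.table` can be illustrated with base R alone. The sketch below works on a plain `data.frame` and does not call data.table. The column names `x`, `y`, `z` and the external vector `g` are arbitrary illustrations, not part of the PR. The formula's right-hand-side variables are evaluated with the data as the environment and `environment(f)` as the fallback. The resulting list of grouping vectors is then passed to `split()` on row indices, which is what the existing `f` code path already does.

```r
# Base-R sketch of the formula branch added to split.data.table (#5392).
# Illustration only: it works on a data.frame and does not call data.table.
df <- data.frame(x = 1:4,
                 y = factor(c("a", "b", "a", "b")),
                 z = factor(c("c", "c", "d", "d")))
f <- ~ y + z

# attr(terms(f), "variables") is the unevaluated call list(y, z).
vars <- attr(terms(f), "variables")

# Evaluate it with the columns of df visible first and environment(f)
# as the enclosure, yielding a list of grouping vectors.
groups <- eval(vars, df, environment(f))

# Split row indices by the grouping list, then subset whole rows,
# mirroring the x[ind, , drop = FALSE] step in split.data.table.
parts <- lapply(split(seq_len(nrow(df)), groups),
                function(ind) df[ind, , drop = FALSE])
print(names(parts))  # "a.c" "b.c" "a.d" "b.d"

# A name that is not a column is looked up in the formula's environment.
g <- c(1, 1, 2, 2)
fg <- ~ g
by_g <- eval(attr(terms(fg), "variables"), df, environment(fg))
print(lapply(split(seq_len(nrow(df)), by_g), function(ind) df$x[ind]))
```

When `split()` receives a list, it combines the vectors with `interaction()`, so the first variable varies fastest. This is why test 2247.4 expects the order `a.c`, `b.c`, `a.d`, `b.d`.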