keep.rownames argument for transpose #3715

Merged
merged 5 commits into from
Aug 8, 2019
24 changes: 23 additions & 1 deletion NEWS.md
@@ -135,7 +135,29 @@

20. `setkey`, `[key]by=` and `on=` in verbose mode (`options(datatable.verbose=TRUE)`) now detect any columns inheriting from `Date` which are stored as 8 byte double, test if any fractions are present, and if not suggest using a 4 byte integer instead (such as `data.table::IDate`) to save space and time, [#1738](https://github.com/Rdatatable/data.table/issues/1738). In future this could be upgraded to `message` or `warning` depending on feedback.

21. `transpose` gains `keep.rownames` argument which stores the input's names at the beginning of the output (the first list element or the first column), [#1886](https://github.com/Rdatatable/data.table/issues/1886). Thanks to @ghost for the request.
21. New function `fifelse(test, yes, no)` has been implemented in C by Morgan Jacob, [#3657](https://github.com/Rdatatable/data.table/issues/3657). It is comparable to `base::ifelse`, `dplyr::if_else`, `hutils::if_else`, and (forthcoming) [`vctrs::if_else()`](https://vctrs.r-lib.org/articles/stability.html#ifelse). It returns a vector of the same length as `test` but unlike `base::ifelse` the output type is consistent with those of `yes` and `no`. Please see `?data.table::fifelse` for more details.

```R
# default 4 threads on a laptop with 16GB RAM and 8 logical CPU
x = sample(c(TRUE,FALSE), 3e8, replace=TRUE) # 1GB
# Review comment (Member): Why did you go from 5e8 to 3e8 for the ifelse benchmark?
# Reply (Member): Just to make it 1GB. I don't remember changing it from 5e8 to 3e8, though.
#   Didn't I change something else at the same time too? The main change was getting away
#   from 100 runs on a small size. It can be 5e8 too.
microbenchmark::microbenchmark(
base::ifelse(x, 7L, 11L),
dplyr::if_else(x, 7L, 11L),
hutils::if_else(x, 7L, 11L),
data.table::fifelse(x, 7L, 11L),
times = 5L, unit="s"
)
# Unit: seconds
# expr min med max neval
# base::ifelse(x, 7L, 11L) 8.5 8.6 8.8 5
# dplyr::if_else(x, 7L, 11L) 9.4 9.5 9.7 5
# hutils::if_else(x, 7L, 11L) 2.6 2.6 2.7 5
# data.table::fifelse(x, 7L, 11L) 1.5 1.5 1.6 5 # setDTthreads(1)
# data.table::fifelse(x, 7L, 11L) 0.8 0.8 0.9 5 # setDTthreads(2)
# data.table::fifelse(x, 7L, 11L) 0.4 0.4 0.5 5 # setDTthreads(4)
```

22. `transpose` gains `keep.rownames` argument which stores the input's names at the beginning of the output (the first list element or the first column), [#1886](https://github.com/Rdatatable/data.table/issues/1886). Thanks to @ghost for the request.
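
To make the described behaviour concrete, here is a minimal base R sketch of what "stores the input's names at the beginning of the output" means for a two-column input. It is an illustration only and does not call `transpose` itself; the helper `transpose_keep_names` and the `V1`..`V6` output names are assumptions made for this example, not data.table's implementation.

```R
# Hypothetical helper (not part of data.table): transposes a data.frame and
# puts the original column names in the first column of the result.
transpose_keep_names <- function(df) {
  m <- t(as.matrix(df))                      # one row per input column
  out <- as.data.frame(m, stringsAsFactors = FALSE)
  rownames(out) <- NULL                      # drop the names carried over by t()
  names(out) <- paste0("V", seq_len(ncol(out)) + 1L)
  # the input's names become the first column
  data.frame(V1 = names(df), out, stringsAsFactors = FALSE)
}

df <- data.frame(x = 1:5, y = 6:10)
res <- transpose_keep_names(df)
print(res)
```

The result has two rows (one per input column) and six columns: the names `x` and `y` first, followed by the five transposed values.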

#### BUG FIXES

102 changes: 99 additions & 3 deletions inst/tests/tests.Rraw
@@ -15413,10 +15413,106 @@ DT = data.table(d=sample(seq(as.Date("2015-01-01"), as.Date("2015-01-05"), by="d
test(2070.01, typeof(DT$d), "double")
test(2070.02, DT[, .N, keyby=d, verbose=TRUE], output="Column 1.*date.*8 byte double.*no fractions are present.*4 byte integer.*to save space and time")
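
The condition that this verbose message reports can be approximated in base R. The sketch below is an illustration only, not data.table's internal check: the variable names are made up for the example, and it builds an integer-backed `Date` with `structure()` instead of `data.table::IDate` so that it runs without the package.

```R
# A Date vector created the usual way is stored as 8 byte doubles
d <- as.Date("2015-01-01") + 0:4

# Assumed approximation of the check: Date class, double storage, no fractional days
is_double_date <- inherits(d, "Date") && typeof(d) == "double"
has_fraction <- any(unclass(d) %% 1 != 0, na.rm = TRUE)

if (is_double_date && !has_fraction) {
  # Same dates, stored as 4 byte integers
  d_int <- structure(as.integer(unclass(d)), class = "Date")
  print(typeof(d_int))
  print(identical(format(d_int), format(d)))
}
```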

library(data.table)
# coverage along with switch+default pairing
test(2071.01, dcast(data.table(id=1, grp=1, e=expression(1)), id ~ grp, value.var='e'), error="Unsupported column type in fcast val: 'expression'")
test(2071.02, is_na(data.table(expression(1))), error="Unsupported column type 'expression'")
test(2071.03, is_na(data.table(1L), 2L), error="Item 1 of 'cols' is 2 which is outside")
test(2071.04, is_na(list(1L, 1:2)), error="Column 2 of input list x is length 2, inconsistent")
test(2071.05, any_na(data.table(1L), 2L), error="Item 1 of 'cols' is 2 which is outside")
test(2071.06, any_na(list(1L, 1:2)), error="Column 2 of input list x is length 2, inconsistent")
test(2071.07, any_na(data.table(as.raw(0L))), FALSE)
test(2071.08, any_na(data.table(c(1+1i, NA))))
test(2071.09, any_na(data.table(expression(1))), error="Unsupported column type 'expression'")
test(2071.10, dcast(data.table(a=1, b=1, l=list(list(1))), a ~ b, value.var='l'),
data.table(a=1, `1`=list(list(1)), key='a'))
test(2071.11, dcast(data.table(a = 1, b = 2, c = 3), a ~ b, value.var = 'c', fill = '2'),
data.table(a=1, `2`=3, key='a'))

# fifelse, #3657
test_vec = -5L:5L < 0L
test_vec_na = c(test_vec, NA)
out_vec = rep(1:0, 5:6)
out_vec_na = c(out_vec, NA_integer_)
test(2072.001, fifelse(test_vec, 1L, 0L), out_vec)
test(2072.002, fifelse(test_vec, 1, 0), as.numeric(out_vec))
test(2072.003, fifelse(test_vec, TRUE, FALSE), as.logical(out_vec))
test(2072.004, fifelse(test_vec, "1", "0"), as.character(out_vec))
test(2072.005, fifelse(test_vec_na, TRUE, NA), c(rep(TRUE,5L), rep(NA,7L)))
test(2072.006, fifelse(test_vec, rep(1L,11L), rep(0L,11L)), out_vec)
test(2072.007, fifelse(test_vec, rep(1L,11L), 0L), out_vec)
test(2072.008, fifelse(test_vec, 1L, rep(0L,11L)), out_vec)
test(2072.009, fifelse(test_vec, rep(1L,11L), rep(0L,10L)), error="Length of 'no' is 10 but must be 1 or length of 'test' (11).")
test(2072.010, fifelse(test_vec, rep(1,10L), rep(0,11L)), error="Length of 'yes' is 10 but must be 1 or length of 'test' (11).")
test(2072.011, fifelse(test_vec, rep(TRUE,10L), rep(FALSE,10L)), error="Length of 'yes' is 10 but must be 1 or length of 'test' (11).")
test(2072.012, fifelse(0:1, rep(TRUE,2L), rep(FALSE,2L)), error="Argument 'test' must be logical.")
test(2072.013, fifelse(test_vec, TRUE, "FALSE"), error="'yes' is of type logical but 'no' is of type character. Please")
test(2072.014, fifelse(test_vec, list(1),list(2,4)), error="Length of 'no' is 2 but must be 1 or length of 'test' (11).")
test(2072.015, fifelse(test_vec, list(1,3),list(2,4)), error="Length of 'yes' is 2 but must be 1 or length of 'test' (11).")
test(2072.016, fifelse(test_vec, list(1), 0), as.list(as.numeric(out_vec)))
test(2072.017, fifelse(test_vec, 1, list(0)), as.list(as.numeric(out_vec)))
## Jan 1 - 5, 2011
date_vec = as.Date(14975:14979, origin = '1970-01-01')
test(2072.018, fifelse(date_vec == "2011-01-01", date_vec - 1L, date_vec),
c(date_vec[1L] - 1L, date_vec[2:5]))
test(2072.019, fifelse(c(TRUE,FALSE,TRUE,TRUE,FALSE), factor(letters[1:5]), factor("a", levels=letters[1:5])),
factor(c("a","a","c","d","a"), levels=letters[1:5]))
test(2072.020, fifelse(test_vec_na, 1L, 0L), out_vec_na)
test(2072.021, fifelse(test_vec_na, rep(1L,12L), 0L), out_vec_na)
test(2072.022, fifelse(test_vec_na, rep(1L,12L), rep(0L,12L)), out_vec_na)
test(2072.023, fifelse(test_vec_na, 1L, rep(0L,12L)), out_vec_na)
test(2072.024, fifelse(test_vec_na, 1, 0), as.numeric(out_vec_na))
test(2072.025, fifelse(test_vec_na, rep(1,12L), 0), as.numeric(out_vec_na))
test(2072.026, fifelse(test_vec_na, rep(1,12L), rep(0,12L)), as.numeric(out_vec_na))
test(2072.027, fifelse(test_vec_na, 1, rep(0,12L)), as.numeric(out_vec_na))
test(2072.028, fifelse(test_vec_na, TRUE, rep(FALSE,12L)), as.logical(out_vec_na))
test(2072.029, fifelse(test_vec_na, rep(TRUE,12L), FALSE), as.logical(out_vec_na))
test(2072.030, fifelse(test_vec_na, rep(TRUE,12L), rep(FALSE,12L)), as.logical(out_vec_na))
test(2072.031, fifelse(test_vec_na, "1", rep("0",12L)), as.character(out_vec_na))
test(2072.032, fifelse(test_vec_na, rep("1",12L), "0"), as.character(out_vec_na))
test(2072.033, fifelse(test_vec_na, rep("1",12L), rep("0",12L)), as.character(out_vec_na))
test(2072.034, fifelse(test_vec_na, "1", "0"), as.character(out_vec_na))
test(2072.035, fifelse(test_vec, as.Date("2011-01-01"), FALSE), error="'yes' is of type double but 'no' is of type logical. Please")
test(2072.036, fifelse(test_vec_na, 1+0i, 0+0i), as.complex(out_vec_na))
test(2072.037, fifelse(test_vec_na, rep(1+0i,12L), 0+0i), as.complex(out_vec_na))
test(2072.038, fifelse(test_vec_na, rep(1+0i,12L), rep(0+0i,12L)), as.complex(out_vec_na))
test(2072.039, fifelse(test_vec_na, 1+0i, rep(0+0i,12L)), as.complex(out_vec_na))
test(2072.040, fifelse(test_vec, as.raw(0), as.raw(1)), error="Type raw is not supported.")
test(2072.041, fifelse(TRUE,1,as.Date("2019-07-07")), error="'yes' has different class than 'no'. Please")
test(2072.042, fifelse(TRUE,1L,factor(letters[1])), error="'yes' has different class than 'no'. Please")
test(2072.043, fifelse(TRUE, list(1:5), list(5:1)), list(1:5))
test(2072.044, fifelse(as.logical(NA), list(1:5), list(5:1)), list(NULL))
test(2072.045, fifelse(FALSE, list(1:5), list(5:1)), list(5:1))
test(2072.046, fifelse(TRUE, data.table(1:5), data.table(5:1)), data.table(1:5))
test(2072.047, fifelse(FALSE, data.table(1:5), data.table(5:1)), data.table(5:1))
test(2072.048, fifelse(TRUE, data.frame(1:5), data.frame(5:1)), data.frame(1:5))
test(2072.049, fifelse(FALSE, data.frame(1:5), data.frame(5:1)), data.frame(5:1))
test(2072.050, fifelse(c(TRUE,FALSE), list(1:5,6:10), list(10:6,5:1)), list(1:5,5:1))
test(2072.051, fifelse(c(NA,TRUE), list(1:5,6:10), list(10:6,5:1)), list(NULL,6:10))
test(2072.052, fifelse(c(FALSE,TRUE), list(1:5,6:10), list(10:6,5:1)), list(10:6,6:10))
test(2072.053, fifelse(c(NA,TRUE), list(1:5), list(10:6,5:1)), list(NULL,1:5))
test(2072.054, fifelse(c(NA,TRUE), list(1:5,6:10), list(5:1)), list(NULL,6:10))
test(2072.055, fifelse(c(FALSE,TRUE), list(TRUE), list(10:6,5:1)), list(10:6,TRUE))
test(2072.056, fifelse(c(FALSE,TRUE), list(as.Date("2019-07-07")), list(10:6,5:1)), list(10:6,as.Date("2019-07-07")))
test(2072.057, fifelse(c(FALSE,TRUE), list(factor(letters[1:5])), list(10:6,5:1)), list(10:6,factor(letters[1:5])))
test(2072.058, fifelse(c(NA,FALSE), list(1:5), list(10:6,5:1)), list(NULL,5:1))
test(2072.059, fifelse(c(NA,FALSE), list(1:5,6:10), list(5:1)), list(NULL,5:1))
test(2072.060, fifelse(c(NA,FALSE), list(1:5), list(5:1)), list(NULL,5:1))
test(2072.061, fifelse(c(TRUE,FALSE), list(1L), 0L), list(1L,0L))
test(2072.062, fifelse(c(TRUE,FALSE), 1L, list(0L)), list(1L,0L))
test(2072.063, fifelse(c(TRUE,FALSE), factor(c("a","b")), factor(c("a","c"))), error="'yes' and 'no' are both type factor but their levels are different")
test(2072.064, fifelse(c(TRUE, TRUE, TRUE, FALSE, FALSE), factor(NA, levels=letters[1:5]), factor(letters[1:5])),
factor(c(NA,NA,NA,"d","e"),levels=letters[1:5]))
test(2072.065, fifelse(c(TRUE, TRUE, TRUE, FALSE, NA, FALSE), factor(NA, levels=letters[1:6]), factor(letters[1:6])),
factor(c(NA,NA,NA,"d",NA,"f"),levels=letters[1:6]))
test(2072.066, fifelse(c(TRUE, TRUE, TRUE, FALSE, NA, FALSE), factor(letters[1:6]), factor(NA, levels=letters[1:6])),
factor(c("a","b","c",NA,NA,NA), levels=letters[1:6]))
test(2072.067, fifelse(c(TRUE, NA, TRUE, FALSE, FALSE, FALSE), factor(NA), factor(NA)),
factor(c(NA,NA,NA,NA,NA,NA)))
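
Outside the test harness, a few of the behaviours covered by the tests above can be tried interactively. This is a small sketch, assuming an installed data.table version that exports `fifelse`; the variable names are chosen for illustration, and the `requireNamespace` guard is only there so the base R part still runs without the package.

```R
d <- as.Date("2011-01-01") + 0:4
test <- d == as.Date("2011-01-01")

# base::ifelse drops attributes, so the Date class is lost
base_res <- ifelse(test, d - 1L, d)
print(class(base_res))

if (requireNamespace("data.table", quietly = TRUE)) {
  # fifelse keeps the class of 'yes' and 'no' (compare test 2072.018)
  fast_res <- data.table::fifelse(test, d - 1L, d)
  print(class(fast_res))

  # NA in 'test' gives NA in the result (compare test 2072.020)
  na_res <- data.table::fifelse(c(TRUE, FALSE, NA), 1L, 0L)
  print(na_res)

  # 'yes' must have length 1 or the length of 'test' (compare test 2072.010)
  len_err <- tryCatch(
    data.table::fifelse(c(TRUE, FALSE, TRUE), 1:2, 0L),
    error = function(e) e
  )
  print(conditionMessage(len_err))
}
```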

DT = data.table(x=1:5, y=6:10)
test(2071.1, transpose(DT, keep.rownames = TRUE),
data.table(c('x','y'), c(1L, 6L), c(2L, 7L), c(3L, 8L), c(4L, 9L), c(5L, 10L)))
test(2073.1, transpose(DT, keep.rownames = TRUE),
data.table(c('x','y'), c(1L, 6L), c(2L, 7L), c(3L, 8L), c(4L, 9L), c(5L, 10L)))


###################################
# Add new tests above this line #