Convert early_zero_weight() to data.table #144

jdblischak · 2023-11-21T17:04:07Z

Please merge PR #143 first. I will likely need to resolve some merge conflicts locally before we can merge this one.

I also added some more tests.

The speed increase for the unstratified case was limited (~1200 -> ~800 microseconds). However, the speed increase for the stratified case was impressive (~123 -> ~5 milliseconds).

Some lessons learned:

- For a single column, `length(unique(x$col))` is noticeably faster than `uniqueDT(x[, .(col)])`, at least for small tables (`nrow = 125` in this case). I assume that `uniqueDT()` pays off when the number of rows increases, and also when multiple columns need to be selected, but I didn't investigate thoroughly.
- `merge.data.table(x, y, by = "col", all.x = TRUE)` matches the sort order of `dplyr::left_join(x, y)`, whereas `x[y, on = "col"]` returns a different row order.
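A minimal sketch of the single-column comparison, using a hypothetical toy table `x`. The exact call benchmarked in the PR is not shown, so `uniqueDT(x[, .(col)])` is approximated here by `nrow(unique(x[, .(col)]))` and by data.table's `uniqueN()`; treat that mapping as an assumption. All three expressions count distinct values and should agree.

```r
library(data.table)

# Hypothetical toy table: 6 rows, 3 distinct values in `col`
x <- data.table(col = c(1, 2, 2, 3, 3, 3))

# Base R: extract the column as a vector, then deduplicate and count
n_base <- length(unique(x$col))

# data.table: deduplicate a one-column data.table, then count its rows
n_dt <- nrow(unique(x[, .(col)]))

# data.table's dedicated counter of distinct values/rows
n_uniqueN <- uniqueN(x, by = "col")

c(n_base, n_dt, n_uniqueN)
# [1] 3 3 3
```

To check the timing claim on real data, each expression can be wrapped in a benchmarking tool such as `bench::mark()` or `microbenchmark::microbenchmark()`; the relative cost may shift with row count and number of columns, as the bullet notes.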
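A small sketch of the row-order difference, assuming two hypothetical toy tables (`x`, `y`, `vx`, `vy` are made up for illustration). `merge(..., all.x = TRUE)` dispatches to `merge.data.table()` and keeps every row of `x`, whereas `x[y, on = "col"]` looks up each row of `y` in `x`, so its rows follow `y`.

```r
library(data.table)

# Hypothetical tables: x is sorted by `col`, y lists keys in a different order
x <- data.table(col = c("a", "b", "c"), vx = 1:3)
y <- data.table(col = c("c", "a"), vy = c(30, 10))

# Left join: keep every row of x. With sort = TRUE (the default),
# the result is keyed and ordered by `col`.
left <- merge(x, y, by = "col", all.x = TRUE)
left$col  # "a" "b" "c"
left$vy   # 10 NA 30

# Bracket join: one output row per row of y, in y's order,
# so it is not a left join of x onto y.
bracket <- x[y, on = "col"]
bracket$col  # "c" "a"
bracket$vx   # 3 1
```

Because `merge()` sorts by the join key, its row order coincides with `dplyr::left_join(x, y)` (which keeps the rows of `x` in their original order) when `x` is already sorted by `col`, as in this toy example. If the original order of `x` matters, the `sort` argument of `merge.data.table()` is the one worth checking.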

tests/testthat/test-data.table.R

nanxstats

Thanks for rewriting this, looking good to me.

Also, thanks for removing the magic numbers in tests; I think this will pay off in the long run.

jdblischak requested review from nanxstats and LittleBeannie November 21, 2023 17:04

jdblischak self-assigned this Nov 21, 2023

jdblischak force-pushed the dt-early_zero_weight branch from 7f3ae19 to 5e681f5 November 21, 2023 17:05

nanxstats reviewed Nov 21, 2023


Review comment on tests/testthat/test-data.table.R (outdated, resolved)

jdblischak force-pushed the dt-early_zero_weight branch from 5e681f5 to 0b90ba2 November 21, 2023 17:14

Commit 76f3090: Convert early_zero_weight() to data.table

jdblischak force-pushed the dt-early_zero_weight branch from 0b90ba2 to 76f3090 November 21, 2023 17:24

jdblischak requested a review from nanxstats November 21, 2023 17:25

nanxstats approved these changes Nov 22, 2023


nanxstats merged commit c60c927 into Merck:main Nov 22, 2023

jdblischak deleted the dt-early_zero_weight branch November 22, 2023 15:17
