Convert early_zero_weight() to data.table #144
Merged
Please merge PR #143 first. I will likely need to resolve some merge conflicts locally before we can merge this one
I also added some more tests
The speed increase for the unstratified case was limited (~1200 -> 800 microseconds). However, the speed increase for the stratified case was impressive (~123 -> ~5 milliseconds)
Some lessons learned:
- `length(unique(x$col))` is noticeably faster than `uniqueDT(x[, .(col)])`, at least for small tables (nrow = 125 in this case). I assume that `uniqueDT()` pays off when the number of rows increases, and also when multiple columns need to be selected, but I didn't investigate thoroughly
- `merge.data.table(x, y, by = "col", all.x = TRUE)` matches the sort order of `dplyr::left_join(x, y)`, whereas `x[y, on = "col"]` returns a different row order
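
For anyone who wants to poke at the two lessons above, here is a minimal sketch on made-up toy tables (the tables `x` and `y` and their values are hypothetical, not from the package). It uses `nrow(unique(x[, .(col)]))` as a stand-in for the data.table-based distinct count, and `merge()`, which dispatches to `merge.data.table()` for data.table inputs. It only illustrates the row-order difference, not the timings.

```r
library(data.table)

# Hypothetical toy tables; names and values are illustrative only.
x <- data.table(col = c("b", "a", "c", "a"), val = 1:4)
y <- data.table(col = c("c", "a", "b"), w = c(0.3, 0.1, 0.2))

# Lesson 1: two ways to count the distinct values of a single column.
n_base <- length(unique(x$col))      # base R on the extracted vector
n_dt   <- nrow(unique(x[, .(col)]))  # data.table unique() on a one-column subset

# Lesson 2: two ways to combine x and y on "col".
# merge() keeps every row of x and, with the default sort = TRUE,
# returns the rows ordered by the join column.
m <- merge(x, y, by = "col", all.x = TRUE)

# x[y, on = "col"] looks up each row of y in x, so the result
# follows the row order of y instead.
j <- x[y, on = "col"]

print(m$col)
print(j$col)
```

If row order matters downstream (for example when comparing against the old dplyr output in tests), it is probably safest to sort explicitly with `setorder()` or to compare with `all.equal(..., ignore.row.order = TRUE)` rather than rely on either join form.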