Convert early_zero_weight() to data.table #144

Merged 1 commit into Merck:main on Nov 22, 2023

jdblischak
Collaborator

Please merge PR #143 first. I will likely need to resolve some merge conflicts locally before we can merge this one.

I also added some more tests.

The speed increase for the unstratified case was limited (~1200 -> ~800 microseconds). However, the speed increase for the stratified case was impressive (~123 -> ~5 milliseconds).

Some lessons learned:

  • For a single column, length(unique(x$col)) is noticeably faster than uniqueDT(x[, .(col)]), at least for small tables (nrow = 125 in this case). I assume that uniqueDT() pays off as the number of rows increases, and also when multiple columns need to be selected, but I didn't investigate thoroughly
  • merge.data.table(x, y, by = "col", all.x = TRUE) matches the sort order of dplyr::left_join(x, y), whereas x[y, on = "col"] returns a different row order
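
A minimal sketch of both lessons, on hypothetical toy tables rather than the data from this PR. The comment above names uniqueDT(); the sketch assumes data.table's uniqueN() as the closest equivalent call. The tables x and y, and their columns val and w, are made up for illustration.

```r
library(data.table)

# Hypothetical toy data (not from the PR); x is deliberately unsorted by "col"
x <- data.table(col = c("b", "a", "c", "a"), val = 1:4)
y <- data.table(col = c("c", "a"), w = c(10, 20))

# Lesson 1: two ways to count the distinct values of a single column
n_base <- length(unique(x$col))  # base R on the plain vector
n_dt <- uniqueN(x[, .(col)])     # assumed stand-in for uniqueDT(): counts unique rows

# Lesson 2: row order of the join result
# merge() defaults to sort = TRUE, so the result is ordered by "col"
m <- merge(x, y, by = "col", all.x = TRUE)

# x[y, on = "col"] looks up each row of y in x, so rows follow y's order
# and the unmatched "b" row of x is dropped
j <- x[y, on = "col"]

# y[x, on = "col"] keeps every row of x, in x's original order
k <- y[x, on = "col"]

print(m$col)
print(j$col)
print(k$col)
```

Note that merge() sorts by the join column by default, so its agreement with dplyr::left_join() presumably relies on x already being ordered by that column. If x's original order is what needs to be preserved, y[x, on = "col"] is one option, at the cost of a different column order.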

Collaborator

nanxstats left a comment

Thanks for rewriting this, looking good to me.

Also, thanks for removing the magic numbers in tests. I think this will pay off in the long run.

nanxstats merged commit c60c927 into Merck:main Nov 22, 2023
jdblischak deleted the dt-early_zero_weight branch November 22, 2023 15:17