Benchmark the threshold at which models start benefiting from sparse data over dense data. #33

Closed
EmilHvitfeldt opened this issue Nov 15, 2024 · 1 comment

Comments

@EmilHvitfeldt
Member

Basically, at what sparsity level does having sparse data improve things?

E.g., should a dataset with 10% sparsity be converted to a dgCMatrix or stay a dense matrix?

There are roughly two scenarios: you have sparse tibbles, or you have a dense tibble with a lot of zeroes.
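As a rough way to frame the question, here is a minimal back-of-the-envelope sketch of the memory side only. It assumes a compressed sparse column layout like the one dgCMatrix uses (an 8-byte double plus a 4-byte row index per non-zero value, and one 4-byte column pointer per column plus one), and it ignores object overhead. The function names are hypothetical, and `density` here means the fraction of non-zero cells. It says nothing about model fitting time, which is what the benchmark would actually need to measure.

```python
def dense_bytes(n_rows, n_cols):
    """Approximate bytes for a dense matrix of doubles (8 bytes per cell)."""
    return 8 * n_rows * n_cols


def csc_bytes(n_rows, n_cols, density):
    """Approximate bytes for compressed sparse column storage.

    Assumes 8 bytes per stored value, 4 bytes per row index,
    and 4 bytes for each of the n_cols + 1 column pointers.
    """
    nnz = round(n_rows * n_cols * density)
    return 12 * nnz + 4 * (n_cols + 1)


def memory_break_even_density(n_rows, n_cols):
    """Density below which the sparse layout is smaller than the dense one.

    Solves 12 * d * cells + 4 * (n_cols + 1) < 8 * cells for d.
    """
    cells = n_rows * n_cols
    return (8 * cells - 4 * (n_cols + 1)) / (12 * cells)


if __name__ == "__main__":
    n_rows, n_cols = 1000, 100
    print("dense bytes:", dense_bytes(n_rows, n_cols))
    for density in (0.9, 0.5, 0.1, 0.01):
        print(f"density {density}: sparse bytes =", csc_bytes(n_rows, n_cols, density))
    print("break-even density:", memory_break_even_density(n_rows, n_cols))
```

Under these assumptions the memory break-even sits at roughly two thirds non-zero cells, so memory alone is unlikely to be the deciding factor at high sparsity; the interesting threshold is where fitting time starts to improve.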

@EmilHvitfeldt
Member Author

This was done in https://github.com/tidymodels/benchmark-sparsity-threshold.

We will use the results of the analysis in tidymodels/workflows#271
