Model returned by inform stage only uses fraction of training data in final model #28

Open
sschmidt23 opened this issue May 22, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@sschmidt23
Contributor

Irene asked about the data split in Inform_FZBoost, which sets aside a separate validation sample for choosing the best bump_thresh and sharpen parameters. Right now the stage trains on a fraction trainfrac of the data and uses the remaining (1 - trainfrac) fraction to compute CDE loss values over a grid of bump_thresh and sharpen values. The model it returns is the one trained on only the trainfrac fraction of the training data. Re-computing the model on the full dataset should give a better model, but would almost double the runtime of the inform stage. A possible compromise is a config option, named something like rerun_full, that lets the user choose to recompute the model on the full dataset.
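
A rough sketch of how a rerun_full switch could work, for discussion only. This is not the current Inform_FZBoost code: `fit_model` and `cde_loss` are hypothetical callables standing in for the actual training and loss routines, and the function name and defaults are made up for illustration. It also assumes the bump_thresh and sharpen values tuned on the trainfrac model can be reused unchanged with the refit model.

```python
import itertools

import numpy as np


def inform_with_optional_refit(X, y, fit_model, cde_loss,
                               bump_grid, sharpen_grid,
                               trainfrac=0.75, rerun_full=False, seed=0):
    """Tune bump_thresh/sharpen on a held-out split, optionally refit on all data.

    fit_model(X, y) -> model and
    cde_loss(model, X, y, bump_thresh=..., sharpen=...) -> float
    are hypothetical placeholders, not real library calls.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    ntrain = int(round(trainfrac * len(X)))
    train_idx, val_idx = perm[:ntrain], perm[ntrain:]

    # Train on the trainfrac subset only.
    model = fit_model(X[train_idx], y[train_idx])

    # Grid search: pick the (bump_thresh, sharpen) pair with the lowest
    # CDE loss on the held-out (1 - trainfrac) validation sample.
    best_bump, best_sharpen = min(
        itertools.product(bump_grid, sharpen_grid),
        key=lambda p: cde_loss(model, X[val_idx], y[val_idx],
                               bump_thresh=p[0], sharpen=p[1]),
    )

    # Optional second fit on the full dataset. This is the step that
    # would roughly double the runtime, so it stays off by default.
    if rerun_full:
        model = fit_model(X, y)

    return model, best_bump, best_sharpen
```

With rerun_full=False this returns the same kind of model as today (trained on the trainfrac subset), so existing behavior and runtime would not change unless the user opts in.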

@sschmidt23 sschmidt23 self-assigned this May 22, 2023
@sschmidt23 sschmidt23 added the enhancement New feature or request label May 22, 2023