Add Poisson splitting rule #495
@mnwright It would be really great if you could have a look. Feedback is very warmly welcome.
tests/testthat/test_quantreg.R
@@ -2,7 +2,7 @@ library(ranger)
 context("ranger_quantreg")

 rf.quant <- ranger(mpg ~ ., mtcars[1:26, ], quantreg = TRUE,
-                   keep.inbag = TRUE, num.trees = 50)
+                   keep.inbag = TRUE, num.trees = 100, seed = 0)
Any reason for the change? Or just a mistake commit?
Before this change, I got
Error in ranger(mpg ~ ., mtcars[1:26, ], quantreg = TRUE, keep.inbag = TRUE, :
Error: Too few trees for out-of-bag quantile regression.
It also depends on the seed. This change solved that on my machine. I should have done a separate commit.
Looks great, thanks! What I have to do before merge (notes to myself):
Thank you for the fast review!
I refactored the Poisson splitting rule a bit to follow … I assume a similar speedup could be possible for the beta splitting rule?
Yes, that should be possible.
@mnwright I merged master and still think this would be nice to have.
Sorry for the long silence. I still think this is useful. Let's try to merge it for the next release.
Looks good, I think we are ready to merge?
Yes, why not just merge?
Merged 🎉
What does this PR do?
This PR implements a splitrule = "poisson" with the additional option poisson.tau to deal with pure nodes that have y = 0.
References
Solves #433.
Further info
The Poisson split rule is based on the Poisson deviance, rearranged algebraically so that the split criterion is faster to compute.
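One way to see where such a speedup can come from: if each node predicts its own mean, the residual terms of the Poisson deviance sum to zero within the node, so minimizing the summed deviance of two child nodes is equivalent to maximizing sum(y) * log(mean(y)) summed over both children. The base R sketch below checks this on simulated data. The function names are hypothetical, and this is an illustration of the algebra, not the C++ code from this PR.

```r
# Poisson deviance of a node that predicts its own mean.
# Observations with y == 0 contribute 0 to sum(y * log(y / mu)).
poisson_deviance <- function(y) {
  mu <- mean(y)
  if (mu == 0) return(0)
  pos <- y > 0
  2 * (sum(y[pos] * log(y[pos] / mu)) - sum(y - mu))
}

# Simplified per-node term: sum(y) * log(mean(y)), taken as 0 for an all-zero node.
poisson_gain_term <- function(y) {
  s <- sum(y)
  if (s == 0) return(0)
  s * log(s / length(y))
}

set.seed(1)
x <- runif(200)
y <- rpois(200, lambda = exp(1 + x))

# Evaluate both criteria on a grid of candidate split points for x.
cuts <- seq(0.1, 0.9, by = 0.1)
full <- sapply(cuts, function(cc) {
  poisson_deviance(y[x <= cc]) + poisson_deviance(y[x > cc])
})
fast <- sapply(cuts, function(cc) {
  poisson_gain_term(y[x <= cc]) + poisson_gain_term(y[x > cc])
})

# The split that minimizes the deviance maximizes the simplified criterion.
best_full <- cuts[which.min(full)]
best_fast <- cuts[which.max(fast)]
```

Because full + 2 * fast is the same constant for every candidate split, the simplified term only needs the sum of responses and the sample count per child, both of which can be updated incrementally while scanning split points.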
The option poisson.tau takes effect if a terminal node has y = 0 for all of its samples. The value of that node is then estimated as alpha * 0 + (1 - alpha) * mean(parent node), with alpha = samples(node) * mean(parent node) / (poisson.tau + samples(node) * mean(parent node)). The larger the value of poisson.tau, the closer the prediction is to the parent node's mean. rpart handles this similarly.
An alternative (now or in the future) would have been to offer an option like "minimum sum of responses per node".