Order of features in the formula seems to matter (particularly last place) #733
Comments
Problem doesn't occur when …
Not sure if this is exactly the issue that is being hit above, but it appears that there is a bug in
Which prints the following:
In other words, the wrong indexes are being skipped (should have skipped …
I think the issue here is that the …
Hm, this actually appears to be a bug in …
It seems the problem is actually
and worst of all, none of this is documented. :(
Now fixed in #738 (along with some basic docs noting that `drawWithoutReplacement*` expects sorted skip vectors).
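The thread traces the bug to a helper that draws indices without replacement while skipping some of them, and that assumed its skip vector was sorted. The Python sketch below is a hypothetical reconstruction of that kind of skip-shifting draw, not ranger's actual C++ code; the function names (`shift_past_skip`, `reachable`, `draw_without_replacement_skip`) and the toy sizes are made up for illustration, and skip values are assumed to be distinct and within `range(max_index)`. Enumerating every raw draw shows which indices are reachable: with an unsorted skip list, a skipped index (3) leaks through and the last index (4) can never be drawn, which resembles the "last feature gets zero importance" symptom in the report.

```python
import random


def shift_past_skip(draw, skip):
    """Map a raw draw from range(max_index - len(skip)) onto range(max_index),
    stepping over each index in `skip`. Only correct if `skip` is ascending."""
    for s in skip:
        if draw >= s:
            draw += 1
    return draw


def reachable(max_index, skip):
    """All indices the mapping can produce, enumerating every possible raw draw."""
    return sorted({shift_past_skip(r, skip) for r in range(max_index - len(skip))})


def draw_without_replacement_skip(max_index, skip, k, rng=random):
    """Hypothetical fixed version: sort the skip list first, then map k distinct
    raw draws onto distinct, non-skipped indices."""
    skip = sorted(skip)  # the fix: the shifting loop needs ascending order
    raw = rng.sample(range(max_index - len(skip)), k)
    return [shift_past_skip(r, skip) for r in raw]


if __name__ == "__main__":
    print(reachable(5, [1, 3]))  # [0, 2, 4]  sorted skip: correct
    print(reachable(5, [3, 1]))  # [0, 2, 3]  unsorted: 3 leaks, 4 unreachable
    print(sorted(draw_without_replacement_skip(5, [3, 1], 3, random.Random(0))))  # [0, 2, 4]
```

Sorting the skip list once before drawing is the cheap fix in this sketch: with an ascending, duplicate-free skip list the mapping is a bijection from `range(max_index - len(skip))` onto the non-skipped indices, so sampling distinct raw values yields distinct valid indices.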
Seeing some odd behavior where a feature appears not to be considered depending on where it is in the formula definition. If it's last, it ends up with a variable importance of 0, but if it's first, it has a high variable importance. This is a feature that we know will always have a high variable importance. Increasing `mtry` does seem to bring the feature into consideration, which suggests it has something to do with how features are randomly chosen, but order in the formula definition really shouldn't matter, right?

I've provided all the parts for a reproducible example here, with an RDS file of the input data attached. `predicted_er` is the feature of interest. Note that when it goes first, the new "last" feature now has an importance of 0, after having a high importance when it wasn't last.

example_data.rds.zip