Feature Selection Accuracy Comparison #47

Open
ericborgos opened this issue Apr 16, 2024 · 1 comment
Labels: documentation (Improvements or additions to documentation)

Comments
@ericborgos

It would be helpful if, on the page with the Titanic examples, you also displayed a comparison of the accuracy scores, to show which feature selection methods were the most effective.
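As a rough illustration of what such a comparison could look like, here is a sketch that scores several selectors by downstream cross-validated accuracy. It uses a synthetic dataset and generic scikit-learn selectors as stand-ins, since the exact methods shown on the docs page are not reproduced here:

```python
# Sketch: compare feature-selection methods by downstream accuracy.
# The selectors below are generic sklearn stand-ins, not necessarily
# the methods used in the Titanic examples.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

selectors = {
    "no selection": None,
    "SelectKBest (f_classif)": SelectKBest(f_classif, k=5),
    "SelectFromModel (RF)": SelectFromModel(
        RandomForestClassifier(n_estimators=50, random_state=0)),
}

for name, sel in selectors.items():
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    model = clf if sel is None else make_pipeline(sel, clf)
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name:25s} mean accuracy = {scores.mean():.3f}")
```

A table of these mean scores per method is essentially what the docs page could display alongside each Titanic example.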

@ThomasBury self-assigned this Apr 21, 2024
@ThomasBury added the `documentation` label Apr 21, 2024
msat59 commented Jan 22, 2025

@ericborgos, no single method is the best in all cases; it depends on the specific problem. The dataset size also plays a significant role, so you can't judge by the Titanic dataset alone. I tested various methods on a small synthetic dataset (generated with sklearn's make_regression), and GrootCV successfully identified all the informative features. However, it performed poorly on a real-world problem with over 5000 features. In that case, Leshy, using both native and SHAP importance, produced much better results than the other methods. I then tuned hyperparameters, calculated SHAP values, and kept the top 50% of features, repeating this over several iterations until around 50 features remained (the number desired by the business).
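The iterative halving described above can be sketched as follows. This is a minimal illustration, not the exact pipeline: impurity-based importances are used as a lightweight stand-in for SHAP values (with the `shap` package, a `TreeExplainer` would play that role), and the hyperparameter-tuning step is omitted:

```python
# Sketch of iterative feature halving: repeatedly fit a model, rank
# features by importance, keep the top 50%, and stop at a target count.
# feature_importances_ stands in for SHAP values to keep this
# dependency-light; the real run used SHAP importance.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=200,
                       n_informative=10, random_state=0)
kept = np.arange(X.shape[1])   # indices of surviving features
target = 50                    # e.g. the count desired by the business

while len(kept) > target:
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[:, kept], y)
    # Rank surviving features by importance, most important first.
    order = np.argsort(model.feature_importances_)[::-1]
    n_keep = max(target, len(kept) // 2)   # halve, but don't overshoot
    kept = kept[order[:n_keep]]

print(f"{len(kept)} features remain")
```

Each pass refits only on the surviving features, so the importance ranking adapts as the feature set shrinks, which is the point of doing this over several iterations rather than in one cut.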
