Feature Selection Accuracy Comparison #47

Open
ericborgos opened this issue Apr 16, 2024 · 1 comment
Labels: documentation (Improvements or additions to documentation)

Comments
@ericborgos

It would be helpful if, on the page with the Titanic examples, you also displayed a comparison of the accuracy scores, to show which feature selection methods were the most effective.
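As a rough illustration of what such a comparison could look like, here is a sketch that scores several selectors by downstream cross-validated accuracy. It uses a synthetic dataset and generic scikit-learn selectors as stand-ins, since the exact methods shown on the docs page are not reproduced here:

```python
# Sketch: compare feature-selection methods by downstream accuracy.
# The selectors below are generic sklearn stand-ins, not necessarily
# the methods used in the Titanic examples.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

selectors = {
    "no selection": None,
    "SelectKBest (f_classif)": SelectKBest(f_classif, k=5),
    "SelectFromModel (RF)": SelectFromModel(
        RandomForestClassifier(n_estimators=50, random_state=0)),
}

for name, sel in selectors.items():
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    model = clf if sel is None else make_pipeline(sel, clf)
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name:25s} mean accuracy = {scores.mean():.3f}")
```

A table of these mean scores per method is essentially what the docs page could display alongside each Titanic example.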

@ThomasBury self-assigned this Apr 21, 2024
@ThomasBury added the `documentation` label Apr 21, 2024
msat59 commented Jan 22, 2025

@ericborgos, no single method is the best in all cases; it depends on the specific problem. The dataset size also plays a significant role, so you can't judge by the Titanic dataset alone. I tested various methods on a small synthetic dataset (generated with sklearn's make_regression), and GrootCV successfully identified all the informative features. However, it performed poorly on a real-world problem with over 5000 features. In that case, Leshy, using both native and SHAP importance, produced much better results than the other methods. I then tuned hyperparameters, calculated SHAP values, and kept the top 50% of features, repeating this over several iterations until around 50 features remained (the number desired by the business).
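The iterative halving described above can be sketched as follows. This is a minimal illustration, not the exact pipeline: impurity-based importances are used as a lightweight stand-in for SHAP values (with the `shap` package, a `TreeExplainer` would play that role), and the hyperparameter-tuning step is omitted:

```python
# Sketch of iterative feature halving: repeatedly fit a model, rank
# features by importance, keep the top 50%, and stop at a target count.
# feature_importances_ stands in for SHAP values to keep this
# dependency-light; the real run used SHAP importance.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=200,
                       n_informative=10, random_state=0)
kept = np.arange(X.shape[1])   # indices of surviving features
target = 50                    # e.g. the count desired by the business

while len(kept) > target:
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[:, kept], y)
    # Rank surviving features by importance, most important first.
    order = np.argsort(model.feature_importances_)[::-1]
    n_keep = max(target, len(kept) // 2)   # halve, but don't overshoot
    kept = kept[order[:n_keep]]

print(f"{len(kept)} features remain")
```

Each pass refits only on the surviving features, so the importance ranking adapts as the feature set shrinks, which is the point of doing this over several iterations rather than in one cut.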
