There are two types of tricks in data science and ML. In the first category, there are tricks that are rare and very cool. They are designed to grab your attention, but ultimately you will never use them because their use cases are too narrow. Think of those Python one-liners that are dreadful in terms of readability.
In the second category, there are tricks that are rare, cool and so useful that you will start using them immediately in your work.
From my three-year journey into data, I’ve collected more than 100 tricks and resources that fall under the second category (there might be some small overlap with the first category sometimes) and curated them into an online book — Tricking Data Science.
While the online book holds more than 200 neatly organized items, I put the best 130 into one article, as Medium offers a much better reading experience.
In case you want to jump over to the book without reading the full article (I mean, for freaking 50 minutes, who would?), I would ask you to leave those 50 claps and follow me before doing so 🙂
1. Permutation Importance with ELI5
Permutation importance is one of the most reliable ways to find out which features matter most to a model.
- Works on any model structure
- Easy to interpret and implement
- Consistent and reliable
The permutation importance (PI) of a feature is defined as the change in model performance when that feature's values are randomly shuffled.
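To make the definition concrete, here is a minimal from-scratch sketch in plain Python. It is not how any particular library implements it. The `permutation_importance` helper, the toy data and the `toy_model` function are all made up for illustration, and "performance" is assumed to be mean squared error, so a bigger increase in error after shuffling means a more important feature.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Hypothetical helper: mean increase in MSE per feature after shuffling it."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j; every other column stays untouched
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            shuffled_error = mean_squared_error(y, [predict(row) for row in X_shuffled])
            increases.append(shuffled_error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy data (assumption): the target leans heavily on f0, slightly on f1, never on f2
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = [5.0 * row[0] + 0.5 * row[1] for row in X]


def toy_model(row):
    """Stand-in for a fitted model; any callable that maps a row to a prediction works."""
    return 5.0 * row[0] + 0.5 * row[1]


scores = permutation_importance(toy_model, X, y)
for name, score in sorted(zip(["f0", "f1", "f2"], scores), key=lambda pair: -pair[1]):
    print(f"{name}: {score:.4f}")
```

Since the helper only needs a callable that returns predictions, the same idea carries over to any model structure, which is the first bullet above. Repeating the shuffle a few times and averaging keeps one lucky or unlucky permutation from skewing the score.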
PI is available through the eli5 package. Below are PI scores for an XGBoost Regressor model👇
The show_weights function displays the features that hurt the model’s performance the most after being shuffled — i.e. the most important features.