When we’re building a machine learning model, it is very important that we select only the features, or predictors, that are actually necessary. Suppose we have 100 features in our dataset. That doesn’t necessarily mean we need all 100 of them in our model, because not all of them will have a significant influence on it. That said, this isn’t true in every case; it depends entirely on the data we have at hand.
There are various ways to find out which features have very little impact on the model and can be removed from the dataset. I have written about feature selection before, but only briefly. In this post, we’ll look at Backward Elimination and how to apply it, step by step. Before we start, make sure you’re familiar with P-values.
The first step in backward elimination is pretty simple: you select a significance level, which is the threshold every feature’s P-value will be compared against. In most cases, a 5% significance level is used, which means the threshold is 0.05. You can change this value depending on the project.
The second step is also very simple. You fit your model with all the features included. Since the method relies on P-values, this has to be a model that reports a P-value for each coefficient, such as linear regression. So if there are 100 features, you include all of them and fit the model on your training dataset. No changes here.
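As a minimal sketch of step 2, assume the model is an ordinary least squares (OLS) linear regression; the post doesn’t name a specific model. The hypothetical helper `ols_pvalues` below fits the model with an intercept using NumPy and computes a two-sided P-value for each coefficient from SciPy’s t distribution. The synthetic data is made up purely for illustration.

```python
import numpy as np
from scipy import stats


def ols_pvalues(X, y):
    """Fit OLS with an intercept; return two-sided P-values (intercept first)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])  # prepend the intercept column
    beta, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    dof = n - design.shape[1]  # residual degrees of freedom
    sigma2 = residuals @ residuals / dof  # estimated error variance
    cov = sigma2 * np.linalg.inv(design.T @ design)  # coefficient covariance
    t_stats = beta / np.sqrt(np.diag(cov))
    return 2 * stats.t.sf(np.abs(t_stats), dof)


# Hypothetical data: only the first two of three features affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)
p = ols_pvalues(X, y)
print(p)  # [intercept, feature 0, feature 1, feature 2]
```

If you already use a statistics library, its regression results will typically expose these P-values directly; computing them by hand here just keeps the example dependency-light.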
In step 3, identify the feature or predictor which has the highest P-value. Pretty simple again, right?
This is the key step, because here we make a decision. In the previous step, we identified the feature with the highest P-value. If that P-value is greater than the significance level we selected in the first step, we remove the feature from our dataset. If that P-value, the highest in the set, is less than the significance level, we jump straight to step 6, which means we’re done. Remember: if the highest P-value is greater than the significance level, remove that feature.
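Steps 3 and 4 boil down to a single comparison. Here’s a small sketch, assuming the P-values are already available in a dictionary keyed by feature name; the function `elimination_step` and the feature names are hypothetical.

```python
def elimination_step(pvalues, significance_level=0.05):
    """Return the feature to remove, or None when we can stop (step 6)."""
    # Step 3: find the feature with the highest P-value.
    worst_feature = max(pvalues, key=pvalues.get)
    # Step 4: remove it only if its P-value exceeds the significance level.
    if pvalues[worst_feature] > significance_level:
        return worst_feature
    return None


print(elimination_step({"age": 0.001, "height": 0.32, "zip_code": 0.07}))  # height
print(elimination_step({"age": 0.001, "income": 0.02}))  # None
```

In the first call, `height` is removed even though `zip_code` is also above 0.05; only the single worst feature goes in each pass.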
Once we’ve identified the feature that has to be removed, we remove it from the dataset in this step and fit the model again on the reduced dataset. After refitting, we jump back to step 3.
This process continues until we reach a point in step 4 where the highest P-value among all the remaining features is less than the significance level selected in step 1. In our example, this means we iterate from step 3 to step 5 and back until the highest P-value in the dataset is less than 0.05. This could take a while. Out of the 100 assumed features, we might filter out, say, 10 features this way (an arbitrary number I picked). Refer to the flowchart at the top of this post for a better idea of these steps.
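Putting steps 2 through 6 together, the loop might look like the sketch below. It assumes an ordinary least squares linear regression with an intercept, fitted with NumPy and SciPy; the intercept is always kept, and one feature is dropped per iteration. The names `backward_elimination` and `ols_pvalues`, and the synthetic data, are hypothetical.

```python
import numpy as np
from scipy import stats


def ols_pvalues(X, y):
    """Two-sided P-values for an OLS fit with intercept (index 0 is the intercept)."""
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])
    beta, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = n - design.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    return 2 * stats.t.sf(np.abs(beta / se), dof)


def backward_elimination(X, y, names, significance_level=0.05):
    """Return the names of the features that survive backward elimination."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    names = list(names)  # copy so the caller's list is untouched
    while names:
        # Steps 2 and 5: fit the model on the current set of features.
        pvalues = ols_pvalues(X, y)[1:]  # skip the intercept
        # Step 3: locate the feature with the highest P-value.
        worst = int(np.argmax(pvalues))
        # Step 4: stop (step 6) if even the worst feature is significant.
        if pvalues[worst] <= significance_level:
            break
        # Step 5: drop that feature, then loop back to step 3.
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names


# Hypothetical data: only x0 and x2 actually influence y.
rng = np.random.default_rng(42)
n = 300
X = rng.normal(size=(n, 6))
y = 4.0 * X[:, 0] + 2.5 * X[:, 2] + rng.normal(size=n)
names = [f"x{i}" for i in range(6)]
selected = backward_elimination(X, y, names)
print(selected)
```

Only one feature is removed per pass because dropping a feature changes the P-values of the others, which is why the model is refitted before the next comparison. Some of the noise features may occasionally survive by chance, since a 0.05 threshold still lets through roughly one in twenty irrelevant features.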
Once we reach step 6, we’re done with the feature selection process. We have successfully used backward elimination to filter out the features that were not significant enough for our model. There are a few other methods we can use for this process, and I’ll probably write about them in the future.