Data Science

What is multicollinearity?

Image from StatisticsHowTo

Multicollinearity is a term we often come across when working with multiple regression models. We have even talked about it in our previous posts, but do we know what it actually means? Today, we'll try to understand that. In most real-life problems, we usually have multiple features to work with, and not all of them are in the format that we, or the model, want. For example, many categorical features come as text, but as we already know, our models require numerical features. To handle this, we label encode the features and, if required, even one-hot encode them. But in some cases, we might have features whose values can be easily determined by the values of other features. In other words, we can see a very go...
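As a rough illustration of "features whose values can be easily determined by the values of other features", here is a minimal sketch that measures the Pearson correlation between feature pairs. The feature names and values (`height_cm`, `height_in`, `shoe_size`) are made up for this example and do not come from the post; a correlation close to 1 or -1 between two features is one common warning sign of this kind of redundancy.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical features: height_in is fully determined by height_cm.
height_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
height_in = [h / 2.54 for h in height_cm]
# A third made-up feature that is only loosely related to height.
shoe_size = [38.0, 44.0, 39.0, 45.0, 40.0]

r_redundant = pearson(height_cm, height_in)
r_other = pearson(height_cm, shoe_size)

print("height_cm vs height_in:", round(r_redundant, 3))
print("height_cm vs shoe_size:", round(r_other, 3))
```

In this toy data, the first pair carries the same information twice, so keeping both columns would add nothing new to a regression model.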

Data Science

Overfitting and Underfitting models in Machine Learning

In most of our posts about machine learning, we've talked about overfitting and underfitting. But most of us don't yet know what those two terms mean. What does it actually mean when a model is overfit or underfit? Why are they considered bad? And how do they affect the accuracy of our model's predictions? These are some of the basic but important questions we need to ask and answer. So let's discuss these two today. The datasets we use for training and testing our models play a huge role in the efficiency of our models. It's equally important to understand the data we're working with. The quantity and the quality of the data also matter, obviously. When there is too little data in the training phase, the models may fail to understand the patterns in the data, or fa...
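To make the two terms a little more concrete, here is a minimal sketch on hand-made toy data (not data from the post). It compares three deliberately simple models: one that always predicts the training mean (likely to underfit), a least-squares line, and one that memorizes the training points and returns the nearest stored answer (likely to overfit). All names and numbers are assumptions chosen for illustration; the usual symptom to look for is a training error that is much lower than the error on held-out data.

```python
def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data: y = 2x plus a small alternating "noise" term.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(train_x)]
# Held-out points lie between the training points and carry different noise.
test_x = [i + 0.4 for i in range(9)]
test_y = [2 * x + (-1.0 if i % 2 == 0 else 1.0) for i, x in enumerate(test_x)]

# Model 1: always predict the training mean (too simple for this data).
mean_y = sum(train_y) / len(train_y)
def predict_mean(x):
    return mean_y

# Model 2: ordinary least-squares line through the training data.
mean_x = sum(train_x) / len(train_x)
sxx = sum((x - mean_x) ** 2 for x in train_x)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x
def predict_line(x):
    return slope * x + intercept

# Model 3: memorize the training set and return the nearest stored target.
def predict_memorized(x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

results = {}
for name, model in [("mean", predict_mean), ("line", predict_line),
                    ("memorized", predict_memorized)]:
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(name, "train MSE:", round(results[name][0], 3),
          "test MSE:", round(results[name][1], 3))
```

On this toy data the memorizing model has zero training error but does worse than the line on the held-out points, while the mean-only model does poorly on both sets.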

Data Science

Different types of Validations in Machine Learning (Cross Validation)

Now that we know what feature selection is and how to do it, let's move our focus to validating the efficiency of our model. This is known as validation or cross validation, depending on what kind of validation method you're using. But before that, let's try to understand why we need to validate our models.

Validation, or Evaluation of Residuals

Once you are done fitting your model to your training data, and you've also tested it with your test data, you can't just assume that it's going to work well on data that it has not seen before. In other words, you can't be sure that the model will have the desired accuracy and variance in your production environment. You need some kind of assurance of the accuracy of the predictions that your model is putting out. For this, we need to val...
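As one possible illustration of the idea, here is a minimal sketch of k-fold cross validation, a commonly used validation method; the excerpt does not say which methods the post covers, so treat the choice of technique as an assumption. The helper names (`k_fold_splits`, `fit_line`) and the toy data are hypothetical. Each fold is held out once while a simple line is fitted on the remaining points, and the per-fold errors are averaged.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, val_idx
        start = stop


def fit_line(xs, ys):
    """Least-squares slope and intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy data: y = 3x + 1 with a small repeating disturbance.
xs = [float(i) for i in range(12)]
ys = [3.0 * x + 1.0 + (0.5 if i % 3 == 0 else -0.25) for i, x in enumerate(xs)]

scores = []
seen = []
# Folds are contiguous here; real data is usually shuffled first.
for train_idx, val_idx in k_fold_splits(len(xs), k=4):
    slope, intercept = fit_line([xs[i] for i in train_idx],
                                [ys[i] for i in train_idx])
    fold_mse = sum((ys[i] - (slope * xs[i] + intercept)) ** 2
                   for i in val_idx) / len(val_idx)
    scores.append(fold_mse)
    seen.extend(val_idx)

cv_mse = sum(scores) / len(scores)
print("per-fold MSE:", [round(s, 3) for s in scores])
print("mean cross-validated MSE:", round(cv_mse, 3))
```

Because every point is used for validation exactly once, the averaged score is less dependent on one lucky or unlucky train/test split than a single hold-out evaluation.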
