Tag Archives: python scikit

Overfitting and Underfitting models in Machine Learning

By | August 2, 2018


In most of our posts about machine learning, we’ve talked about overfitting and underfitting. But most of us don’t yet know what those two terms mean. What does it actually mean when a model is overfit, or underfit? Why are both considered undesirable? Read more...

Different types of Validations in Machine Learning (Cross Validation)

By | August 1, 2018


Now that we know what feature selection is and how to do it, let’s move our focus to validating the efficiency of our model. This is known as validation or cross validation, depending on what kind of validation method you’re using. Read more...
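
As a rough illustration of the idea, here is a minimal sketch of k-fold cross validation with SciKit Learn. The toy data, the choice of `LinearRegression`, and the five folds are all assumptions made for this example; they are not taken from the post itself.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Made-up data: a noisy straight line, y = 3x + 0.5 (illustrative only).
rng = np.random.RandomState(0)
X = rng.rand(50, 1)
y = 3.0 * X.ravel() + 0.5 + 0.01 * rng.randn(50)

# Split the data into 5 folds; each fold is used once as the validation set.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# For a regressor, the default score is R^2, computed once per fold.
scores = cross_val_score(LinearRegression(), X, y, cv=cv)
mean_score = scores.mean()
```

Looking at the spread of `scores`, and not only `mean_score`, gives a hint of how stable the model is across different splits.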

Different methods of feature selection

By | July 31, 2018


In our previous post, we discussed what feature selection is and why we need it. In this post, we’re going to look at the different methods used in feature selection. There are three main classes of feature selection methods – Filter Methods, Wrapper Methods, and Embedded Methods. Read more...

Linear Regression in Python using SciKit Learn

By | July 30, 2018

Today we’ll be looking at a simple Linear Regression example in Python, and as always, we’ll be using the SciKit Learn library. If you haven’t yet looked into my posts about data pre-processing, which is required before you can fit a model, check out how you can encode your data to make sure it doesn’t contain any text, and then how you can handle missing data in your dataset. Read more...
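
For a taste of what that looks like, here is a minimal sketch of fitting a linear regression with SciKit Learn. The data is made up so that it follows y = 2x + 1 exactly; it is not the dataset used in the post.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data lying exactly on the line y = 2x + 1 (illustrative only).
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 2.0 * X.ravel() + 1.0

# Fit an ordinary least squares model to the data.
model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]          # learned coefficient for the single feature
intercept = model.intercept_    # learned bias term

# Predict the target for an unseen input, x = 5.
prediction = model.predict(np.array([[5.0]]))[0]
```

Note that `fit` and `predict` expect a 2D array of shape (samples, features), which is why each input value is wrapped in its own list.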

Why do we need feature scaling in Machine Learning and how to do it using SciKit Learn?

By | July 27, 2018

When you’re working with a learning model, it is important to scale the features to a range which is centered around zero. This is done so that the variances of the features are in the same range. If a feature’s variance is orders of magnitude larger than the variance of other features, that particular feature might dominate the others in the dataset, which is not something we want happening in our model. Read more...
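
A minimal sketch of this with SciKit Learn’s `StandardScaler` follows. The two-column toy array is an assumption for illustration: the second column is deliberately three orders of magnitude larger than the first.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up features on very different scales (illustrative only).
X = np.array([
    [1.0, 1000.0],
    [2.0, 2000.0],
    [3.0, 3000.0],
])

# Standardize each column: subtract its mean, divide by its standard deviation.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# After scaling, every column is centered at zero with unit variance.
col_means = X_scaled.mean(axis=0)
col_vars = X_scaled.var(axis=0)
```

In practice you would usually call `fit` on the training set only and then `transform` the test set with the same scaler, so the test data does not influence the scaling parameters.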

How to split your dataset to train and test datasets using SciKit Learn

By | July 27, 2018

When you’re working on a model and want to train it, you obviously have a dataset. But after training, we have to test the model on some test dataset. For this, you’ll need a dataset which is different from the training set you used earlier. Read more...
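
One common way to get that second dataset is to hold back part of the data you already have. Here is a minimal sketch using `train_test_split`; the toy arrays and the 80/20 split are assumptions chosen for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 samples with 2 features each, plus one target per sample.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold back 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```

The rows in `X_test` never appear in `X_train`, so scoring on them gives an estimate of how the model handles data it has not seen.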

Handle missing data in your training dataset with SciKit Imputer

By | July 27, 2018


More often than not, you’ll encounter a dataset in your data science projects where you’ll have missing data in at least one column. In some cases, you can just ignore that row by taking it out of the dataset. But that won’t always be the case. Read more...
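
When dropping rows is not an option, the missing values can be filled in instead. The sketch below assumes a recent SciKit Learn release, where the class for this is `SimpleImputer` in `sklearn.impute`; the original post may have used a different class, and both the toy data and the mean strategy are assumptions for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up data with one missing value (NaN) in each column.
X = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
])

# Replace each missing value with the mean of the known values in its column.
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)
```

Other strategies such as `"median"` or `"most_frequent"` can be a better fit when a column has outliers or holds categorical values.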

Label Encoder vs. One Hot Encoder in Machine Learning

By | July 27, 2018


If you’re new to Machine Learning, you might confuse these two – Label Encoder and One Hot Encoder. Both encoders are part of the SciKit Learn library in Python, and they are used to convert categorical data, or text data, into numbers, which our predictive models can better understand. Read more...
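
To make the difference concrete, here is a minimal sketch that runs both encoders on the same made-up column of color names. It assumes a SciKit Learn version in which `OneHotEncoder` accepts string categories directly.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Made-up categorical column (illustrative only).
colors = np.array(["red", "green", "blue", "green"])

# LabelEncoder maps each category to one integer.
# Classes are sorted alphabetically: blue=0, green=1, red=2.
label_encoder = LabelEncoder()
labels = label_encoder.fit_transform(colors)

# OneHotEncoder expects a 2D input and creates one 0/1 column per category.
one_hot_encoder = OneHotEncoder()
one_hot = one_hot_encoder.fit_transform(colors.reshape(-1, 1)).toarray()
```

The integer labels imply an ordering (red > green > blue) that does not really exist, which is the usual reason to prefer the one-hot form for input features.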