Forward Selection for Feature Selection in Machine Learning

Data Science

by Sunny Srinidhi - November 13, 2019

In our previous post, we saw how to perform Backward Elimination as a feature selection algorithm to weed out insignificant features from our dataset. In this post, we'll check out the next method for feature selection, which is Forward Selection. As you can already guess, this is going to be the opposite of backward elimination, well, kind of. But before that, make sure you're familiar with the concept of P-value. As with backward elimination, we have a few steps to follow here. We'll go through them one by one as usual. But before going in, you need to know that this is going to be a more tedious job than backward elimination, because you have to create a

Backward Elimination for Feature Selection in Machine Learning

Data Science

by Sunny Srinidhi - November 11, 2019

When we're building a machine learning model, it is very important that we select only those features or predictors which are necessary. Suppose we have 100 features or predictors in our dataset. That doesn't necessarily mean that we need to have all 100 features in our model. This is because not all 100 features will have a significant influence on the model. But then again, this won't be true in all cases. It depends entirely on the data we have in hand. Here is more info about why we need feature selection. There are various ways in which you can find out which features have very little impact on the model and which ones you can remove from your

Different methods of feature selection

Data Science

by Sunny Srinidhi - July 31, 2018 (updated November 6, 2019)

In our previous post, we discussed what feature selection is and why we need it. In this post, we're going to look at the different methods used in feature selection. There are three main classes of feature selection methods: Filter Methods, Wrapper Methods, and Embedded Methods. We'll look at all of them individually. Filter Methods: Filter methods are learning-algorithm-agnostic, which means they can be employed no matter which learning algorithm you're using. They're generally used as data pre-processors. In filter methods, each individual feature in the dataset will be scored on its correlation with the dependent variable. A variety of statistical tests will be used to calculate this correlation score. Based on this score, it will be decided whether to
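To make the scoring idea in the excerpt above concrete, here is a minimal sketch of a filter method. It assumes Pearson correlation as the statistical test and an arbitrary cutoff of 0.5 on the absolute score. Neither choice comes from the post, and the function names and toy data are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no linear relationship to measure.
        return 0.0
    return cov / sqrt(var_x * var_y)


def filter_select(features, target, threshold=0.5):
    """Score every feature against the target and keep the strong ones.

    `threshold` is an assumed cutoff on |r|, not a recommended value.
    """
    scores = {name: pearson(col, target) for name, col in features.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return scores, kept


# Toy dataset (hypothetical): one informative feature, two uninformative ones.
features = {
    "size": [1.0, 2.0, 3.0, 4.0, 5.0],
    "constant": [3.0, 3.0, 3.0, 3.0, 3.0],
    "unrelated": [1.0, -1.0, 0.0, -1.0, 1.0],
}
target = [2.0, 4.0, 6.0, 8.0, 10.0]

scores, kept = filter_select(features, target)
print(scores)
print(kept)
```

Because the score depends only on the data and not on any model, the same selection could be reused with whichever learning algorithm comes next.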

What is Feature Selection and why do we need it in Machine Learning?

Data Science

by Sunny Srinidhi - July 31, 2018 (updated November 11, 2019)

If you've come across a dataset in your machine learning endeavors which has more than one feature, you'll also have heard of a concept called Feature Selection. Today, we're going to find out what it is and why we need it. When a dataset has too many features, it would not be ideal to include all of them in our machine learning model. Some features may be irrelevant to the dependent variable. For example, if you are going to predict how much it would cost to crush a car, and the features you're given are: the dimensions of the car; if the car will be delivered to the crusher or the company has to go pick it up; if the car

Overfitting and Underfitting models in Machine Learning

Data Science

by Sunny Srinidhi - August 2, 2018

In most of our posts about machine learning, we've talked about overfitting and underfitting. But most of us don't yet know what those two terms mean. What does it actually mean when a model is overfit, or underfit? Why are they considered bad? And how do they affect the accuracy of our model's predictions? These are some of the basic but important questions we need to ask and get answers to. So let's discuss these two today. The datasets we use for training and testing our models play a huge role in the efficiency of our models. It's equally important to understand the data we're working with. The quantity and the quality of the data also matter, obviously. When the data

Different types of Validations in Machine Learning (Cross Validation)

Data Science

by Sunny Srinidhi - August 1, 2018

Now that we know what feature selection is and how to do it, let's move our focus to validating the efficiency of our model. This is known as validation or cross validation, depending on what kind of validation method you're using. But before that, let's try to understand why we need to validate our models. Validation, or Evaluation of Residuals: Once you are done with fitting your model to your training data, and you've also tested it with your test data, you can't just assume that it's going to work well on data that it has not seen before. In other words, you can't be sure that the model will have the desired accuracy and variance in your production environment. You need

Data Automation with AI/ML: A Comprehensive Guide

by Sunny Srinidhi - November 28, 2024

The article discusses the transformative impact of artificial intelligence (AI) and machine learning (ML) on data automation, enhancing efficiency, decision-making, and scalability in businesses. It explores trends like generative AI, AutoML, data governance, and democratization while providing real-world applications across various industries, ultimately guiding businesses in effective AI/ML integration.

The art of load balancing – Part 2

Tech

by Sunny Srinidhi - July 27, 2020

There are many strategies used by a load balancer. Here, continuing from part 1, we’ll see what these strategies are and how they work.

The art of load balancing – Part 1 (Understanding a load balancer)

Tech

by Sunny Srinidhi - June 3, 2020

We hear about load balancers everywhere. But what is a load balancer and how does it work? Can you try it out yourself? Let’s see.

Proof of Concepts (POCs)

I write a lot of POC projects, especially when I'm learning something new, when I need to quickly test if a data pipeline works, or when I'm just testing a new integration. I make all these POCs public as GitHub repositories. I wanted to consolidate the list of POCs in an easy-to-search fashion, and that's why I have this page here. Below is a list of all the POCs that I've written so far. If a particular POC has an accompanying blog post which explains the code in the POC, I have linked that blog post as well in the list below. Let me know if any of these POCs have helped you in any way.