Data Science

ColumnTransformer in SciKit for LabelEncoding and OneHotEncoding in Machine Learning

In a very old post, Label Encoder vs. One Hot Encoder in Machine Learning, I demonstrated how to use label encoding and one hot encoding to turn categorical text data into numbers and separate columns. But the SciKit library has come a long way since I wrote that post, and it has made life a lot easier. The developers of the library might have realised that people use LabelEncoding and OneHotEncoding very frequently. So they came up with a new class called ColumnTransformer, which basically lets you combine LabelEncoding and OneHotEncoding into just one line of code. And the result is exactly the same. In this post, we'll quickly take a look at how we can do that with some code snippets.

The Code

First, as usual, we need to import the required li...

Read More
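
To give a rough idea of what the teaser above describes, here is a minimal sketch of one hot encoding a text column with ColumnTransformer. The toy data (a country column followed by two numeric columns) and the name "country" for the transformer are made up for illustration and are not taken from the full post. Setting sparse_threshold=0 is just my choice to get a plain dense array back.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: column 0 is a country name, columns 1-2 are numeric.
X = np.array(
    [
        ["France", 44, 72000],
        ["Spain", 27, 48000],
        ["Germany", 30, 54000],
        ["Spain", 38, 61000],
    ],
    dtype=object,
)

# One hot encode column 0 and pass the remaining columns through unchanged.
ct = ColumnTransformer(
    transformers=[("country", OneHotEncoder(), [0])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)
X_encoded = ct.fit_transform(X)

# The encoded country columns come first, followed by the untouched columns.
print(ct.named_transformers_["country"].categories_[0])
print(X_encoded)
```

The three country categories become three 0/1 columns, so the four-row, three-column input comes out with five columns. No separate LabelEncoder step is needed, because OneHotEncoder accepts string categories directly.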
Data Science, Tech

Label Encoder vs. One Hot Encoder in Machine Learning

Update: SciKit has a new class called ColumnTransformer, which removes the need for a separate LabelEncoding step before one hot encoding. You can check out this updated post about ColumnTransformer to know more.

If you're new to Machine Learning, you might get confused between these two: Label Encoder and One Hot Encoder. These two encoders are part of the SciKit Learn library in Python, and they are used to convert categorical data, or text data, into numbers, which our predictive models can better understand. Today, let's understand the difference between the two with a simple example.

Label Encoding

To begin with, you can find the SciKit Learn documentation for Label Encoder here. Now, let's consider the following data: In this example, the first column is the country column, which is all text. As ...

Read More
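
As a quick illustration of the difference this teaser is about, here is a minimal sketch that runs both encoders on the same country column. The four country values are hypothetical and only stand in for the kind of text column the post mentions.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical country column, all text.
countries = np.array(["France", "Spain", "Germany", "Spain"])

# Label encoding: each category becomes a single integer.
# Categories are numbered in sorted order (France, Germany, Spain).
labels = LabelEncoder().fit_transform(countries)

# One hot encoding: each category becomes its own 0/1 column.
# OneHotEncoder expects a 2D input, hence the reshape to one column.
one_hot = OneHotEncoder().fit_transform(countries.reshape(-1, 1)).toarray()

print(labels)
print(one_hot)
```

The label encoded version puts the categories on a numeric scale, which a model may read as an ordering that doesn't really exist. The one hot version avoids that by giving each country a separate column. Note that the SciKit Learn documentation describes LabelEncoder as meant for target values rather than input features.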