Top Five Machine Learning courses for beginners on UdemyData Science by Sunny Srinidhi - November 18, 2019December 19, 20192 Everybody wants to do machine learning these days. Machine learning, data science, artificial intelligence, deep learning, neural network — these have become some of the most used phrases in the tech space today. I’m not saying it’s particularly bad, but it definitely gets scary for somebody who doesn’t really know what all this means but wants to get into the rat race. When you think about it, from a software developer’s point of view, these are just different types of software or applications you work on, but with more math involved. I know I’m oversimplifying what data science is, but for somebody who doesn’t have a mathematics or statistics background, it is very difficult to understand the jargon initially. I’ve been there,
Forward Selection for Feature Selection in Machine LearningData Science by Sunny Srinidhi - November 13, 20192 In our previous post, we saw how to perform Backward Elimination as a feature selection algorithm to weed out insignificant features from our dataset. In this post, we'll checkout the next method for feature selection, which is Forward Selection. As you can already guess, this is going to be the opposite of backward elimination, well kind of. But before that, make sure you make yourself familiar with the concept of P-value. Similar to backward elimination, even here we have a few steps to follow. We'll go one by one as usual. But before going in, you need to know that this is going to be a bit more tedious of a job than backward elimination, because you have to create a
Backward Elimination for Feature Selection in Machine LearningData Science by Sunny Srinidhi - November 11, 2019November 11, 20191 When we're building a machine learning model, it is very important that we select only those features or predictors which are necessary. Suppose we have 100 features or predictors in our dataset. That doesn't necessarily mean that we need to have all 100 features in our model. This is because not all 100 features will have significant influence on the model. But then again, this doesn't mean it will be true for all cases. It depends entirely on the data we have in hand. Here is more info about why we need feature selection. There are various ways in which you can find out which features have very less impact on the model and which ones you can remove from your
Null Hypothesis and the P-ValueData Science by Sunny Srinidhi - November 8, 2019November 8, 20195 When you're starting your machine learning journey, you'll come across null hypothesis and the p-value. At a certain point in your journey, it becomes quite important to know what these mean to make meaningful decisions while designing your machine learning models. So in this post, I'll try to explain what these two things mean, and you try to understand that. Now, if you don't have a background in statistics, the definitions of null hypothesis and p-value will make no sense to you. It's just gibberish going way over your head. That's what happened to me the first few times I tried to understand them. It took me a good couple of days to get an idea of what they mean. I
Fit vs. Transform in SciKit libraries for Machine LearningData Science by Sunny Srinidhi - November 7, 2019November 7, 20190 We have seen methods such as fit(), transform(), and fit_transform() in a lot of SciKit's libraries. And almost all tutorials, including the ones I've written, only tell you to just use one of these methods. The obvious question that arises here is, what do those methods mean? What do you mean by fit something and transform something? The transform() method makes some sense, it just transforms the data, but what about fit()? In this post, we'll try to understand the difference between the two. To better understand the meaning of these methods, we'll take the Imputer class as an example, because the Imputer class has these methods. But before we get started, keep in mind that fitting something like an imputer
ColumnTransformer in SciKit for LabelEncoding and OneHotEncoding in Machine LearningData Science by Sunny Srinidhi - November 6, 2019November 6, 20193 In a very old post - Label Encoder vs. One Hot Encoder in Machine Learning - I had demonstrated how to use label encoding and one hot encoding to separate out categorical text data into numbers and different columns. But the SciKit library has come a long way since I wrote that post, and it has made life a lot more easier. The developers of the library might have realised that people use LabelEncoding and OneHotEncoding very frequently. So they decided to come up with a new library called the ColumnTransformer, which will basically combine LabelEncoding and OneHotEncoding into just one line of code. And the result is exactly the same. In this post, we'll quickly take a look at
Invoke an AWS Lambda Function from another Lambda FunctionData ScienceTech by Sunny Srinidhi - November 4, 2019November 4, 20190 I recently discovered that you can't invoke more than one Lambda function in AWS for an S3 event, with the same prefix and suffix (or just with the same suffix, which was the issue in my case). So I wanted a way to invoke one Lambda function from another Lambda function. If you're feeling kind of lost, check out the problem statement in my Github project. That could possibly add some context to the problem. If you don't want to go there, I'll try to explain it here again. The Problem and the Requirement In one of our projects, we have a Lambda function which is invoked whenever a text file is uploaded to a particular S3 bucket. The Lambda function takes
Apache Kafka Streams and Tables, the stream-table dualityData ScienceTech by Sunny Srinidhi - October 1, 2019February 25, 20200 In the previous post, we tried to understand the basics of Apache's Kafka Streams. In this post, we'll build on that knowledge and see how Kafka Streams can be used both as streams and tables. Stream processing has become very common in most modern applications today. You'll have a minimum of one stream coming into your system to be processed. And depending on your application, it'll mostly be stateless. But that's not the case with all applications. We'll have some sort of data enrichment going on in between streams. Suppose you have one stream of user activity coming in. You'll ideally have a user ID attached to each fact in that stream. But down the pipeline, user ID is
Getting started with Apache Kafka StreamsData ScienceTech by Sunny Srinidhi - September 30, 2019March 12, 20201 In the age of big data and data science, stream processing is very significant. So it's not at all surprising that every major organisation has at least one stream processing service. Apache has a few too, but today we're going to look at Apache's Kafka Streams. Kafka is a very popular pub-sub service. And if you've worked with Kafka before, Kafka Streams is going to be very easy to understand. And if you haven't got any idea of Kafka, you don't have to worry, because most of the underlying technology has been abstracted in Kafka Streams so that you don't have to deal with consumers, producers, partitions, offsets, and the such. In this post, we'll look that a few concepts of
Put data to Amazon Kinesis Firehose delivery stream using Spring BootData ScienceTech by Sunny Srinidhi - September 26, 2019February 12, 20201 If you work with streams of big data which have to be collected, transformed, and analysed, you for sure would have heard of Amazon Kinesis Firehose. It is an AWS service used to load streams of data to data lakes or analytical tools, along with compressing, transforming, or encrypting the data. You can use Firehose to load streaming data to something like S3, or RedShift. From there, you can use a SQL query engine such as Amazon Athena to query this data. You can even connect this data to your BI tool and get real time analytics of the data. This could be very useful in applications where real time analysis of data is necessary. In this post, we'll see