Emulating Apache Kafka with Amazon SNS and SQS

Tech

by Sunny Srinidhi - January 22, 2020 (updated January 24, 2020) - 0 comments

We’ll learn how to bring Kafka’s concept of consumer groups to the AWS world using Amazon SNS and Amazon SQS.

Publishing messages to Amazon SNS from a Spring Boot application

Tech

by Sunny Srinidhi - January 20, 2020 (updated January 24, 2020) - 0 comments

We’ll learn how we can publish messages to an SNS topic from a Spring Boot application. This can be done from any Java code or framework.

Stack Implementation example in Java

Tech

by Sunny Srinidhi - December 20, 2019 (updated December 23, 2019) - 2 comments

More in The Data Structures series. A stack is one of the simplest data structures to understand. If you studied data structures in college, you already know what it means. It’s a simple Last In First Out (LIFO) structure: the last element to enter the stack is the first element to leave it. Let’s try to understand the concept first with a few illustrations. The concept: suppose we have an empty container like the one shown in the image below. (Figure: Empty stack.) That’s pretty simple to understand. Now suppose that we “push” a string with the value “string1” onto this empty stack. The stack now looks like this. (Figure: Stack with one element.) That’s pretty simple to
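The push and pop behaviour described in this excerpt can be sketched in a few lines. This is a minimal illustration in Python, not the Java implementation the post itself builds; the class name `Stack` and its method names are assumptions made for this example.

```python
class Stack:
    """A minimal LIFO stack backed by a Python list (illustrative only)."""

    def __init__(self):
        self._items = []

    def push(self, item):
        # New elements always go on top of the stack.
        self._items.append(item)

    def pop(self):
        # The most recently pushed element is the first to leave.
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def __len__(self):
        return len(self._items)


stack = Stack()
stack.push("string1")
stack.push("string2")
last_out = stack.pop()  # "string2" was pushed last, so it comes out first
```

After the `pop`, only "string1" remains on the stack, which is the Last In First Out behaviour the excerpt describes.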

Understanding Word N-grams and N-gram Probability in Natural Language Processing

Data Science

by Sunny Srinidhi - November 26, 2019 (updated December 19, 2019) - 2 comments

More in The fastText Series. The N-gram is probably the easiest concept to understand in the whole machine learning space. An N-gram is a sequence of N words. For example, “Medium blog” is a 2-gram (a bigram), “A Medium blog post” is a 4-gram, and “Write on Medium” is a 3-gram (a trigram). Well, that wasn’t very interesting or exciting. True, but we still have to look at the probability used with n-grams, which is quite interesting. Why N-grams, though? Before we move on to the probability stuff, let’s answer this question first: why do we need to learn about n-grams and the related probability? Well, in Natural Language Processing, or NLP for short, n-grams are used for a variety of things.
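To make the definition concrete, here is a minimal sketch that splits a sentence on whitespace and lists its word n-grams. The function name `word_ngrams` is an assumption made for this example, and real tokenisation is usually more involved than a plain `split()`.

```python
def word_ngrams(text, n):
    """Return every run of n consecutive words in text, as tuples."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# The 4-word phrase from the excerpt contains three bigrams and one 4-gram.
bigrams = word_ngrams("A Medium blog post", 2)
four_grams = word_ngrams("A Medium blog post", 4)
```

A phrase with fewer than N words simply yields an empty list, since no window of N consecutive words exists.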

An intro to text classification with Facebook’s fastText (Natural Language Processing)

Data Science

by Sunny Srinidhi - November 25, 2019 (updated December 19, 2019) - 3 comments

More in The fastText Series. Text classification is a pretty common application of machine learning. In such an application, machine learning is used to assign a piece of text to one of two or more categories. There are both supervised and unsupervised learning models for text classification. In this post, we’ll see how we can use Facebook’s fastText library for some simple text classification. fastText, developed by Facebook, is a popular library for text classification. The library is an open source project on GitHub, and is pretty active. The library also provides pre-built models for text classification, both supervised and unsupervised. In this post, we’ll check out how we can train the supervised model in the library for some quick text classification. The library

Overfitting and Underfitting models in Machine Learning

Data Science

by Sunny Srinidhi - August 2, 2018 - 0 comments

In most of our posts about machine learning, we've talked about overfitting and underfitting. But most of us don't yet know what those two terms mean. What does it actually mean when a model is overfit, or underfit? Why are they considered bad? And how do they affect the accuracy of our model's predictions? These are some of the basic but important questions we need to ask and get answers to. So let's discuss these two today. The datasets we use for training and testing our models play a huge role in the efficiency of our models. It's equally important to understand the data we're working with. The quantity and the quality of the data also matter, obviously. When the data

Different types of Validations in Machine Learning (Cross Validation)

Data Science

by Sunny Srinidhi - August 1, 2018 - 0 comments

Now that we know what feature selection is and how to do it, let's move our focus to validating the efficiency of our model. This is known as validation or cross validation, depending on what kind of validation method you're using. But before that, let's try to understand why we need to validate our models. Validation, or Evaluation of Residuals: Once you are done with fitting your model to your training data, and you've also tested it with your test data, you can't just assume that it's going to work well on data that it has not seen before. In other words, you can't be sure that the model will have the desired accuracy and variance in your production environment. You need

Different methods of feature selection

Data Science

by Sunny Srinidhi - July 31, 2018 (updated November 6, 2019) - 1 comment

In our previous post, we discussed what feature selection is and why we need it. In this post, we're going to look at the different methods used in feature selection. There are three main classes of feature selection methods: Filter Methods, Wrapper Methods, and Embedded Methods. We'll look at all of them individually. Filter Methods: Filter methods are learning-algorithm-agnostic, which means they can be employed no matter which learning algorithm you're using. They're generally used as data pre-processors. In filter methods, each individual feature in the dataset will be scored on its correlation with the dependent variable. A variety of statistical tests will be used to calculate this correlation score. Based on this score, it will be decided whether to

Why do we need feature scaling in Machine Learning and how to do it using SciKit Learn?

Data Science

by Sunny Srinidhi - July 27, 2018 (updated November 5, 2019) - 1 comment

When you're working with a learning model, it is important to scale the features to a range which is centered around zero. This is done so that the variances of the features are in the same range. If a feature's variance is orders of magnitude more than the variance of other features, that particular feature might dominate other features in the dataset, which is not something we want happening in our model. The aim here is to give each feature zero mean and unit variance. There are many ways of doing this; the two most popular are standardisation and normalisation. No matter which method you choose, the SciKit Learn library provides a class to easily scale our data. We can use the StandardScaler
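Standardisation, as described in this excerpt, subtracts a feature's mean and divides by its standard deviation. The sketch below does this in plain Python using only the standard library; it is not the scikit-learn class the post refers to, and the function name `standardise` and the sample values are made up for illustration.

```python
from statistics import fmean, pstdev


def standardise(values):
    """Rescale one feature to zero mean and unit (population) variance."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant feature carries no spread; map every value to zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Two hypothetical features on very different scales.
ages = [20.0, 30.0, 40.0]
incomes = [20000.0, 50000.0, 80000.0]

scaled_ages = standardise(ages)
scaled_incomes = standardise(incomes)
```

After scaling, both features have mean 0 and variance 1, so neither dominates the other simply because of its original units.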

Use Config Caching to Speed Up Your Laravel App

Tech

by Sunny Srinidhi - March 13, 2017 - 0 comments

As web developers, we're always looking for ways to speed up our apps. It's all about milliseconds today. There are several ways in which a web app or a web service could be optimised for speed. Being one of the most used and popular PHP frameworks, Laravel has a few tricks up its sleeve to make this happen. One of them is config caching. Obviously, this is not going to make a tremendous improvement, but it is significant enough to be written about. So what is config caching? Well, it's exactly what it sounds like: you cache all your configuration so that you don't have to go looking for it every time you want to read a configuration. Laravel, as usual, has an artisan command