Understanding Word N-grams and N-gram Probability in Natural Language Processing

Data Science

by Sunny Srinidhi - November 26, 2019 (updated December 19, 2019) - 2 comments

The n-gram is probably one of the easiest concepts to understand in the whole machine learning space. An n-gram is a sequence of N words. For example, “Medium blog” is a 2-gram (a bigram), “Write on Medium” is a 3-gram (a trigram), and “A Medium blog post” is a 4-gram. Well, that wasn’t very interesting or exciting. True, but we still have to look at the probability used with n-grams, which is quite interesting. Why n-grams, though? Before we move on to the probability stuff, let’s answer this question first: why do we need to learn about n-grams and the related probability? Well, in Natural Language Processing, or NLP for short, n-grams are used for a variety of things.
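To make the definition concrete, here is a minimal Python sketch that splits a sentence into word n-grams. The function names `word_ngrams` and `bigram_probability` are made up for this illustration, and the probability shown is only a simple count-based estimate (how often one word follows another in a given text), which is an assumption on my part and not necessarily the exact formulation used in the full post.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return every run of n consecutive words in text, as tuples."""
    words = text.split()  # naive whitespace tokenisation, enough for a sketch
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def bigram_probability(corpus, first, second):
    """Estimate P(second | first) from bigram counts in corpus.

    Assumption: a plain count ratio, with no smoothing for unseen pairs.
    """
    bigrams = Counter(word_ngrams(corpus, 2))
    # Number of bigrams that start with the word `first`.
    starts = sum(count for (w, _), count in bigrams.items() if w == first)
    return bigrams[(first, second)] / starts if starts else 0.0


print(word_ngrams("A Medium blog post", 2))
print(word_ngrams("Write on Medium", 3))
```

The second call returns a single tuple, since a three-word sentence contains exactly one trigram.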

An intro to text classification with Facebook’s fastText (Natural Language Processing)

Data Science

by Sunny Srinidhi - November 25, 2019 (updated December 19, 2019) - 3 comments

Text classification is a pretty common application of machine learning. In such an application, machine learning is used to categorise a piece of text into two or more categories. There are both supervised and unsupervised learning models for text classification. fastText, developed by Facebook, is a popular library for text classification. The library is an open source project on GitHub, and is pretty active. It also provides pre-built models for text classification, both supervised and unsupervised. In this post, we’ll check out how we can train the supervised model in the library for some quick text classification. The library

Data Science vs. Artificial Intelligence vs. Machine Learning vs. Deep Learning

Data Science

by Sunny Srinidhi - November 18, 2019 (updated December 19, 2019) - 0 comments

It’s very common these days to come across terms such as data science, artificial intelligence, machine learning, deep learning, neural networks, and many more. But what do these buzzwords actually mean? And why should you care about one or the other? I’m trying to answer these questions in this post, to the best of my ability. But then again, I’m no expert here. This is the knowledge I’ve gained in the last few years of my data science and machine learning journey. I’m sure most of you will have better and easier ways of explaining things than I do, so I’ll be looking forward to reading your comments down below. Let’s get started then. Data Science Data science is all about data,

Top Five Machine Learning courses for beginners on Udemy

Data Science

by Sunny Srinidhi - November 18, 2019 (updated December 19, 2019) - 2 comments

Everybody wants to do machine learning these days. Machine learning, data science, artificial intelligence, deep learning, neural networks: these have become some of the most used phrases in the tech space today. I’m not saying that’s particularly bad, but it definitely gets scary for somebody who doesn’t really know what all this means but wants to get into the rat race. When you think about it, from a software developer’s point of view, these are just different types of software or applications you work on, but with more math involved. I know I’m oversimplifying what data science is, but for somebody who doesn’t have a mathematics or statistics background, it is very difficult to understand the jargon initially. I’ve been there,

Getting started with Chalice to create AWS Lambdas in Python – Step by Step Tutorial

Tech

by Sunny Srinidhi - November 14, 2019 (updated November 14, 2019) - 0 comments

Using Chalice, you can write a Lambda function, test it locally, and even deploy the Lambda function to your development, test, or production environments. In this post, we’ll see how we can install Chalice on our local machines, write a simple REST API to return the famous “Hello, world!” response, and deploy it to a dev stage on AWS Lambda.

Forward Selection for Feature Selection in Machine Learning

Data Science

by Sunny Srinidhi - November 13, 2019 - 2 comments

In our previous post, we saw how to perform Backward Elimination as a feature selection algorithm to weed out insignificant features from our dataset. In this post, we'll check out the next method for feature selection, which is Forward Selection. As you can already guess, this is going to be the opposite of backward elimination, well, kind of. But before that, make sure you are familiar with the concept of the P-value. As with backward elimination, we have a few steps to follow here. We'll go one by one as usual. But before going in, you need to know that this is going to be a bit more tedious than backward elimination, because you have to create a

Backward Elimination for Feature Selection in Machine Learning

Data Science

by Sunny Srinidhi - November 11, 2019 (updated November 11, 2019) - 1 comment

When we're building a machine learning model, it is very important that we select only those features or predictors which are necessary. Suppose we have 100 features or predictors in our dataset. That doesn't necessarily mean that we need to have all 100 features in our model, because not all 100 features will have a significant influence on the model. But then again, this won't be true in all cases. It depends entirely on the data we have in hand. Here is more info about why we need feature selection. There are various ways in which you can find out which features have very little impact on the model and which ones you can remove from your

Sub-6 and Millimeter Wave (mmWave) frequencies for 5G – All you need to know

Tech

by Sunny Srinidhi - November 9, 2019 - 0 comments

5G is the next obvious upgrade to the 4G and LTE networks that we use extensively today for our data needs when we're on the go. LTE was a huge upgrade from the much slower 3G a few years back. But in 2019, we're seeing speeds of over 1 Gbps with 5G. To make this a reality, wireless carriers are using a combination of different technologies and waves. In this post, I'll try to explain the two that come up in most conversations about 5G: Sub-6 and millimeter waves. To understand sub-6 and millimeter waves (mmWaves), we first need to understand how our smartphone radio signals function. As you all know, we have cell phone towers or antennas placed all

Null Hypothesis and the P-Value

Data Science

by Sunny Srinidhi - November 8, 2019 (updated November 8, 2019) - 5 comments

When you're starting your machine learning journey, you'll come across the null hypothesis and the p-value. At a certain point in your journey, it becomes quite important to know what these mean in order to make meaningful decisions while designing your machine learning models. So in this post, I'll try to explain what these two things mean, and you can try to follow along. Now, if you don't have a background in statistics, the definitions of null hypothesis and p-value will make no sense to you. It's just gibberish going way over your head. That's what happened to me the first few times I tried to understand them. It took me a good couple of days to get an idea of what they mean. I

How to encrypt a string in Java using RSA and decrypt it in Python

Tech

by Sunny Srinidhi - November 7, 2019 (updated November 7, 2019) - 1 comment

Recently at work, I was tasked with writing a Java program which would encrypt a sensitive string using the RSA encryption algorithm. The encrypted string would then be passed on to a client over the public internet. The client would then use the private key to decrypt the message. But the client is written in Python, so I had to make sure the encryption and decryption work as expected. As always, I wrote POCs for both, and here I'm going to document that. Creating the key pair Before we can start the encryption, we need to have a key pair. A key pair will have a public key and a private key. The public key, as the name suggests, is public. You