Installing Hadoop on Windows 11 with WSL2
Data Science, by Sunny Srinidhi, November 1, 2021
We’ll see how to install and configure Hadoop and its components on Windows 11 running a Linux distro using WSL 1 or 2.
Installing Zsh and Oh-my-zsh on Windows 11 with WSL2
Tech, by Sunny Srinidhi, October 27, 2021
In this post, which is part of a series on setting up Windows 11 and WSL2 for big data work, I install Zsh and Oh-my-zsh and set up aliases
Getting Started With Apache Airflow
Data Science, by Sunny Srinidhi, October 11, 2021
I recently started working with Apache Airflow. And as is tradition, I’m telling you everything about it here.
Fake (almost) everything with Faker
Data Science, by Sunny Srinidhi, September 30, 2021
Generating customer and address data for testing has never been easier. We’ll see how to do that using the Faker Python library.
Querying Hive Tables From a Spring Boot App
Data Science, Tech, by Sunny Srinidhi, June 30, 2021
In this post, we’ll see how to connect to a Hive database and run queries on that database from a Spring Boot application.
Lemmatization in Natural Language Processing (NLP) and Machine Learning
Data Science, by Sunny Srinidhi, February 26, 2020
Lemmatization is one of the most common text pre-processing techniques used in Natural Language Processing (NLP) and machine learning in general. If you've already read my post about stemming of words in NLP, you'll already know that lemmatization is not that much different. In both stemming and lemmatization, we try to reduce a given word to its root word. The root word is called a stem in the stemming process, and it is called a lemma in the lemmatization process. But there are a few more differences between the two than that. Let's see what those are.
How is Lemmatization different from Stemming
In stemming, a part of the word is just chopped off at the tail end to arrive at
Stemming of words in Natural Language Processing, what is it?
Data Science, by Sunny Srinidhi, February 19, 2020 (updated August 27, 2024)
Stemming is one of the most common data pre-processing operations we do in almost all Natural Language Processing (NLP) projects. If you're new to this space, it is possible that you don't know exactly what this is even though you have come across the word. You might also confuse stemming with lemmatization, which is a similar operation. In this post, we'll see what exactly stemming is, with a few examples here and there. I hope I'll be able to explain this process in simple words for you.
Stemming
To put it simply, stemming is the process of removing a part of a word, or reducing a word to its stem or root. This might not necessarily mean we're reducing a word
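As a rough illustration of the "removing a part of a word" idea in the excerpt above, here is a minimal Python sketch of naive suffix stripping. It is not the method from the post or from any NLP library; the suffix list and the `naive_stem` helper are hypothetical and chosen only to show that the resulting stem need not be a dictionary word.

```python
# Hypothetical suffix list, checked in this order (longest first).
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Chop one known suffix off the tail of a word, if enough of it remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words like "sing" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ["walking", "walked", "walks", "studies", "sing"]:
    print(w, "->", naive_stem(w))
```

Here "walking", "walked", and "walks" all reduce to "walk", while "studies" becomes "studi", which is a stem but not a real word. Established stemmers such as the Porter stemmer apply far more careful rules than this.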
Removing stop words in Java as part of data cleaning in Artificial Intelligence
Data Science, by Sunny Srinidhi, February 5, 2020
More in The fastText Series. Working with text datasets is very common in data science problems. A good example of this is sentiment analysis, where you get social network posts as data sets. Based on the content of these posts, you need to estimate the sentiment around a topic of interest. When we're working with text as the data, there are a lot of operations we perform to "clean" it, such as normalising, removing stop words, stemming, lemmatizing, etc. In this post, we'll see how we can remove stop words from our input text to clean our data so that our analysis is based only on the actual content of the data. But wait, what are stop
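The stop-word removal step mentioned in the excerpt above can be sketched in a few lines. The post itself works in Java; this sketch uses Python purely for illustration, and both the tiny `STOP_WORDS` set and the `remove_stop_words` helper are hypothetical, not taken from the post or from any library.

```python
# A deliberately tiny, hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on"}


def remove_stop_words(text: str) -> list[str]:
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [token for token in text.lower().split() if token not in STOP_WORDS]


print(remove_stop_words("The food is on the table"))
```

Splitting on whitespace is the simplest possible tokenisation; text with punctuation would need a proper tokenizer before the filter is applied.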
An Intro to Affective Computing
Data Science, by Sunny Srinidhi, January 7, 2020
Not a lot of us have heard of Affective Computing. Most people I have spoken to about this didn't know anything about it. So I thought I'd write an intro explaining what I have understood about the discipline, and hopefully I'll get to learn more from the comments. So let's get started. Affective computing is all about understanding human emotions in a human-machine interface system and responding based on those emotions. Consider this: you get into an ATM vestibule to withdraw some cash, but you're tense about being late for your date, who is already waiting for you at the restaurant. If anybody sees you in this condition at the ATM vestibule, they'll be able to easily understand that
Optimising a fastText model for better accuracy
Data Science, by Sunny Srinidhi, December 3, 2019 (updated December 19, 2019)
More in The fastText Series. In our previous post, we saw what n-grams are and how they are useful. Before that post, we built a simple text classifier using Facebook’s fastText library. In this post, we’ll see how we can optimise that model for better accuracy.
Precision and Recall
Precision and recall are two things we need to know to better understand the accuracy of our models. And these two things are not very difficult to understand. Precision is the fraction of the labels predicted by the fastText model that are correct, and recall is the fraction of the correct labels that were successfully predicted. That might be a bit confusing, so let’s look at an example to understand it better. Suppose for a sentence
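To make the precision and recall definitions above concrete, here is a small Python sketch that computes both for a single sentence. The label names and the `precision_recall` helper are hypothetical and made up for illustration; they are not part of fastText or of the original post.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for one example's predicted and true labels."""
    predicted, actual = set(predicted), set(actual)
    correct = predicted & actual  # labels that were predicted and are true
    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical labels for one sentence.
predicted_labels = ["food", "dish", "restaurant", "menu", "dessert"]
actual_labels = ["food", "dish", "equipment"]

precision, recall = precision_recall(predicted_labels, actual_labels)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Two of the five predicted labels are correct, so precision is 2/5. Two of the three true labels were found, so recall is 2/3. Predicting more labels tends to raise recall at the cost of precision, which is the trade-off to keep in mind when tuning a model.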