Optimising Hive Queries with Tez Query Engine

Data Science

by Sunny Srinidhi - June 13, 2022

Hive and Tez configuration can be fine-tuned to improve the performance of queries. Let’s look at a few such techniques.

Understanding Apache Hive LLAP

Data Science

by Sunny Srinidhi - November 18, 2021

In this post, I try to explain what LLAP is in Apache Hive and how it can help reduce query latency.

Installing Hadoop on the new M1 Pro and M1 Max MacBook Pro

Data Science

by Sunny Srinidhi - November 5, 2021

We’ll see how to install and configure Hadoop and its components on macOS running on the new M1 Pro and M1 Max chips by Apple.

Installing Hadoop on Windows 11 with WSL2

Data Science

by Sunny Srinidhi - November 1, 2021

We’ll see how to install and configure Hadoop and its components on Windows 11 running a Linux distro using WSL 1 or 2.

Installing Zsh and Oh-my-zsh on Windows 11 with WSL2

Tech

by Sunny Srinidhi - October 27, 2021

In this post, which is part of a series on setting up Windows 11 and WSL2 for big data work, I install Zsh and Oh-my-zsh and set up aliases.

Getting Started With Apache Airflow

Data Science

by Sunny Srinidhi - October 11, 2021

I recently started working with Apache Airflow. And as is tradition, I’m telling you everything about it here.

Fake (almost) everything with Faker

Data Science

by Sunny Srinidhi - September 30, 2021

Generating customer and address data for testing has never been easier. We’ll see how to do that using the Faker Python library.

Querying Hive Tables From a Spring Boot App

by Sunny Srinidhi - June 30, 2021

In this post, we’ll see how to connect to a Hive database and run queries on that database from a Spring Boot application.

Lemmatization in Natural Language Processing (NLP) and Machine Learning

Data Science

by Sunny Srinidhi - February 26, 2020

Lemmatization is one of the most common text pre-processing techniques used in Natural Language Processing (NLP) and machine learning in general. If you've already read my post about stemming of words in NLP, you'll know that lemmatization is not that much different. In both stemming and lemmatization, we try to reduce a given word to its root word. The root word is called a stem in the stemming process, and it is called a lemma in the lemmatization process. But there are a few more differences between the two than that. Let's see what those are. How is lemmatization different from stemming? In stemming, a part of the word is just chopped off at the tail end to arrive at

Stemming of words in Natural Language Processing, what is it?

Data Science

by Sunny Srinidhi - February 19, 2020 (updated August 27, 2024)

Stemming is one of the most common data pre-processing operations we do in almost all Natural Language Processing (NLP) projects. If you're new to this space, it is possible that you don't know exactly what it is even though you have come across the word. You might also confuse stemming with lemmatization, which is a similar operation. In this post, we'll see what exactly stemming is, with a few examples here and there. I hope I'll be able to explain this process in simple words for you. Stemming: To put it simply, stemming is the process of removing a part of a word, or reducing a word to its stem or root. This might not necessarily mean we're reducing a word