Apache Airflow is another awesome tool that I discovered just recently. Just a couple of months after discovering it, I...
Data Science
I was recently tasked with creating some random customer data, with names, phone numbers, addresses, and the usual other stuff....
In this post, we'll see how we can query tables that reside in Hive using a Spring Boot application. As...
If you are new to JanusGraph and the Gremlin query language, like I am, you might be confused about the...
JanusGraph is a distributed graph database that can store and query graphs across clusters with multiple nodes. JanusGraph is designed for...
I have talked about Kinesis before, and I'm sure you've been using Kinesis for longer than I have. But according to...
Parquet is an open-source columnar file format from Apache for the Hadoop ecosystem. Well, it started as a file format...
As the volume of data generated by IoT devices, mobile devices, applications, etc. grows by the hour, creating a data lake...
Lemmatization is one of the most common text pre-processing techniques used in Natural Language Processing (NLP) and machine learning in...
Stemming is one of the most common data pre-processing operations we do in almost all Natural Language Processing (NLP) projects....