Optimising Hive Queries with Tez Query Engine

Data Science

by Sunny Srinidhi - June 13, 2022

Hive and Tez configuration can be fine-tuned to improve the performance of queries. Let’s look at a few such techniques.

Understanding Apache Hive LLAP

Data Science

by Sunny Srinidhi - November 18, 2021

In this post, I try to explain what LLAP is for Apache Hive and how it can help us in reducing query latency.

Installing Hadoop on the new M1 Pro and M1 Max MacBook Pro

Data Science

by Sunny Srinidhi - November 5, 2021

We’ll see how to install and configure Hadoop and its components on macOS running on Apple’s new M1 Pro and M1 Max chips.

Installing Hadoop on Windows 11 with WSL2

Data Science

by Sunny Srinidhi - November 1, 2021

We’ll see how to install and configure Hadoop and its components on Windows 11 running a Linux distro using WSL 1 or 2.

Querying Hive Tables From a Spring Boot App

by Sunny Srinidhi - June 30, 2021

In this post, we’ll see how to connect to a Hive database and run queries on that database from a Spring Boot application.

Connect Apache Spark with MongoDB database using the mongo-spark-connector

by Sunny Srinidhi - April 3, 2019 (updated February 28, 2020)

A couple of days back, we saw how to connect Apache Spark to an Apache HBase database and query the data from a table using a catalog. Today, we’ll see how to connect Apache Spark to a MongoDB database and get data directly into Spark from there. MongoDB provides a plugin called the mongo-spark-connector, which helps us connect MongoDB and Spark without any drama at all. We just need to provide the MongoDB connection URI in the SparkConf object and create a ReadConfig object specifying the collection name. It might sound complicated right now, but once you look at the code, you’ll see how easy this is. So, let’s look at an example. The Dataset Before we look