Optimising Hive Queries with Tez Query Engine
Data Science - by Sunny Srinidhi - June 13, 2022
Hive provides us the option of executing SQL queries with a few different query engines. It ships with the native MapReduce engine, but we can switch that to Tez, which has gained popularity since its launch, or we can use Apache Spark.
Understanding Apache Hive LLAP
Data Science - by Sunny Srinidhi - November 18, 2021
Apache Hive is a complex system when you look at it, but once you go looking for more info, it’s more interesting than complex. There are multiple query engines available for Hive, and then there’s LLAP on top of the query engines to make real-time, interactive queries more workable.
Installing Hadoop on the new M1 Pro and M1 Max MacBook Pro
Data Science - by Sunny Srinidhi - November 5, 2021
In the previous series of posts, I wrote about how to install the complete Hadoop stack on Windows 11 using WSL 2. And now that the new MacBook Pro laptops are available with the brand new M1 Pro and M1 Max SoCs, here’s a guide on how to install the same Hadoop stack on these laptops.
Installing Hadoop on Windows 11 with WSL2
Data Science - by Sunny Srinidhi - November 1, 2021
In the previous post, we saw how to install a Linux distro on Windows 11 using WSL2 and then how to install Zsh and oh-my-zsh to make the terminal more customizable. In this post, we’ll see how we can install the complete Hadoop environment on the same Windows 11 machine using WSL.
Querying Hive Tables From a Spring Boot App
Data Science, Tech - by Sunny Srinidhi - June 30, 2021
In this post, we’ll see how we can query tables that reside in Hive using a Spring Boot application. As always, I’m going to use a Spring Boot web app with a few GET APIs to show how we can query data from Hive.
Connect Apache Spark with MongoDB database using the mongo-spark-connector
Data Science, Tech - by Sunny Srinidhi - April 3, 2019 (updated February 28, 2020)
A couple of days back, we saw how we can connect Apache Spark to an Apache HBase database and query the data from a table using a catalog. Today, we’ll see how we can connect Apache Spark to a MongoDB database and get data directly into Spark from there. MongoDB provides us a plugin called the mongo-spark-connector, which will help us connect MongoDB and Spark without any drama at all. We just need to provide the MongoDB connection URI in the SparkConf object, and create a ReadConfig object specifying the collection name. It might sound complicated right now, but once you look at the code, you’ll understand how extremely easy this is. So, let’s look at an example.
The Dataset
Before we look