Hive and Tez configuration can be tuned to improve query performance. Let’s look at a few techniques for doing that.
Tag: hadoop
Understanding Apache Hive LLAP
Installing Hadoop on the new M1 Pro and M1 Max MacBook Pro
Installing Hadoop on Windows 11 with WSL2
Querying Hive Tables From a Spring Boot App
Connect Apache Spark with MongoDB database using the mongo-spark-connector
A couple of days back, we saw how to connect Apache Spark to an Apache HBase database and query a table using a catalog. Today, we’ll see how to connect Apache Spark to a MongoDB database and load data from it directly into Spark. MongoDB provides a connector called the mongo-spark-connector, which lets us connect MongoDB and Spark without any drama. We just need to set the MongoDB connection URI in the SparkConf object and create a ReadConfig object that specifies the collection name. It might sound complicated right now, but once you look at the code, you’ll see how easy this is. So, let’s look at an example.

The Dataset
Before we look