Installing Hadoop on the new M1 Pro and M1 Max MacBook Pro

Data Science

by Sunny Srinidhi - November 5, 2021

We’ll see how to install and configure Hadoop and its components on macOS running on Apple’s new M1 Pro and M1 Max chips.

Installing Hadoop on Windows 11 with WSL2

Data Science

by Sunny Srinidhi - November 1, 2021

We’ll see how to install and configure Hadoop and its components on Windows 11 running a Linux distro using WSL 1 or 2.

Understanding Apache Hive LLAP

Data Science

by Sunny Srinidhi - November 18, 2021

In this post, I try to explain what LLAP is for Apache Hive and how it can help us in reducing query latency.

Installing Zsh and Oh-my-zsh on Windows 11 with WSL2

Tech

by Sunny Srinidhi - October 27, 2021

In this post, which is part of a series on setting up Windows 11 and WSL2 for big data work, I install Zsh and Oh-my-zsh and set up aliases.

Querying Hive Tables From a Spring Boot App

by Sunny Srinidhi - June 30, 2021

In this post, we’ll see how to connect to a Hive database and run queries on that database from a Spring Boot application.

How To Generate Parquet Files in Java

Data Science

by Sunny Srinidhi - April 7, 2020

The Parquet file format has become very popular lately. In this post, we’ll see what it is, and how to create Parquet files in Java using Spring Boot.

Getting Started with Apache Drill and MongoDB

by Sunny Srinidhi - September 23, 2019 (updated February 28, 2020)

Not a lot of people have heard of Apache Drill. That is because Drill is niche: it caters to very specific use cases. But when used, it can make a significant difference to the way you interact with data. First, let's see what Apache Drill is, and then how we can connect our MongoDB data source to Drill and easily query data. What is Apache Drill? According to their website, Apache Drill is a "Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage." That's pretty much self-explanatory. So, Drill is a tool to query Hadoop, MongoDB, and other NoSQL databases. You can write simple SQL queries that run on the data stored in other databases, and you get the result in a row-column format. The

Connect Apache Spark to your HBase database (Spark-HBase Connector)

by Sunny Srinidhi - April 1, 2019 (updated January 31, 2020)

There will be times when you’ll need the data in your HBase database to be brought into Apache Spark for processing. Usually, you’ll query the database, get the data in whatever format you fancy, and then load that into Spark, maybe using the `parallelize()` function. This works just fine. But depending on the size of the data, this could cause delays. At least it did for our application. So after some research, we stumbled upon a Spark-HBase connector in the Hortonworks repository. Now, what is this connector and why should you consider it? The Spark-HBase Connector (shc-core) The SHC is a tool provided by Hortonworks to connect your HBase database to Apache Spark so that you can tell your Spark context to pick up the

About Me

Connect with me on: Twitter | LinkedIn | Medium

Products

Links

Links is a simple bookmarking service which allows you to bookmark your favorite websites from your Android device or from the Chrome browser. The service also lets you organise your bookmarks into folders so that it's easy to keep track of them. Your bookmarks are synced between your Chrome browser and your Android device, so whether you're on a desktop, a laptop, an Android smartphone, or an Android tablet, your bookmarks are available. You can have a look at the web interface and register, which will let you use the Chrome extension and the Android app.

Nothing Pro

As the name suggests, this app does absolutely nothing. It just has a label which says, well,