Data Science, Tech

Use Apache Drill with Spring Boot or Java to query data using SQL queries

In the last few posts, we saw how to connect Apache Drill to MongoDB and to Kafka, and how to query that data with simple SQL. But when you move this into a real-world project, you can't sit around querying data from a terminal all day long. You want a piece of code that does the dirty work for you. So how exactly do you use Apache Drill within your code? Today, we'll see how to do this with Spring Boot, or pretty much any other Java program.

The Dependencies

For this POC, I'm going to write a simple Spring Boot CommandLineRunner program, but you can use pretty much any other Java framework, or vanilla Java code, for this. If you have a dependency management tool such as Maven or Gradle, you can just add the dependency ...

Data Science, Tech

Apache Drill vs. Apache Spark – Which SQL query engine is better for you?

If you work in the big data, data science, or BI space, you have probably heard of Apache Spark. A few of you might have also heard of Apache Drill, and a handful of you might have actually worked with it. I discovered Apache Drill very recently, and since then I've come to like what it has to offer. But the first thing I wondered when I glanced over Drill's capabilities was: how is this different from Apache Spark? Can I use the two interchangeably? I did some research and found the answers. Here, I'm going to answer these questions for myself, and maybe for you guys too. It is very important to understand that there is a fundamental difference between the two, both in how they are implemented and in what they are capable of. With Apache Drill, we write SQL quer...

Data Science, Tech

Getting Started with Apache Drill and MongoDB

Not a lot of people have heard of Apache Drill. That is because Drill caters to very specific use cases; it's very niche. But when used, it can make a significant difference to the way you interact with data. First, let's see what Apache Drill is, and then how we can connect a MongoDB data source to Drill and easily query data.

What is Apache Drill?

According to their website, Apache Drill is a "Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage." That's pretty much self-explanatory. So, Drill is a tool to query Hadoop, MongoDB, and other NoSQL databases. You write simple SQL queries that run against data stored in those systems, and you get the result back in a row-column format. The best part is that you can even query Apache Kafka and AWS S3 data with this. ...
