Home>>Data Science>>Use Apache Drill with Spring Boot or Java to query data using SQL queries
Data ScienceTech

Use Apache Drill with Spring Boot or Java to query data using SQL queries

In the last few posts, we saw how to connect Apache Drill with MongoDB and also how we can connect it to Kafka to query data using simple SQL queries. But when you want to move this to an actual real world project, you can’t sit around querying data from a terminal all day long. You want to write a piece of code which does the dirty work for you. But how exactly do you use Apache Drill within your code? Today, we’ll see how we can achieve this with Spring Boot, or pretty much any other Java program.


The Dependencies

For this POC, I’m going to write a simple Spring Boot CommandLineRunner program. But you can use pretty much any other Java framework or vanilla Java code for this. If you have a dependency management tool such as Maven or Gradle, you can just add the dependency in the project and move on to coding. Otherwise, you can add the required .jar file to the class path, and that should work just fine. All the required .jar files come with the distribution of Apache Drill.

We’ll go with Maven for this project. Add the following Drill dependency in your project’s pom.xml file within the dependencies section.

<dependency>
	<groupId>org.apache.drill.exec</groupId>
	<artifactId>drill-jdbc-all</artifactId>
	<version>1.16.0</version>
</dependency>

This dependency is available in the Maven Central repository, so you don’t need to add any extra repositories.


The Code

Let’s get to the fun part now. First, we need to load the Drill JDBC driver in the Java code. For this, we’ll use the Class.forName() method:

Class.forName("org.apache.drill.jdbc.Driver");

Next, we need to create a connection using DriverManager and the Drill JDBC connection string. If you’re using a standalone local instance of Apache Drill, you need to use the following connection string:

jdbc:drill:drillbit=localhost

If you’re using Drill along with Zookeeper, you’ll have to use this connection string:

jdbc:drill:zk=local

Since I’m using a local instance, I’ll create a connection to Drill like this:

Connection connection = DriverManager.getConnection("jdbc:drill:drillbit=localhost");

Next, we need to create a Statement object, using which we’ll run our actual query. Once we have the object, we’ll use the .executeQuery() method on the object to get our ResultSet:

Statement statement = connection.createStatement();
ResultSet resultSet = statement.executeQuery("select * from kafka.`drill` limit 10");

I’m using the same data sets and examples from the previous MongoDB and Kafka examples.

As you can see from the example above, we’re getting data from a Kafka topic called drill. Let’s see how we can access the data:

while(resultSet.next()){
    logger.info("ID: " + resultSet.getString(1));
    logger.info("First Name: " + resultSet.getString(2));
    logger.info("Last Name: " + resultSet.getString(3));
    logger.info("Email: " + resultSet.getString(4));
    logger.info("Gender: " + resultSet.getString(5));
    logger.info("IP Address: " + resultSet.getString(6));
    logger.info("----------------------------------");
}

Next, let’s query a MongoDB database and get some data from there:

resultSet = statement.executeQuery("select * from mongo.drill.sampleData limit 10");

We can use the same while() loop above to print the data.

And that’s pretty much it. We are now able to use Apache Drill as a simple JDBC source within a Java program to fetch data from any source that Drill supports. You can try the same with complex SQL queries and see how Drill behaves.

If you’re interested in the complete Spring Boot project for this example, you can checkout the same on my GitHub profile here.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: