Home>>Data Science>>Analyse Kafka messages with SQL queries using Apache Drill
Data ScienceTech

Analyse Kafka messages with SQL queries using Apache Drill

In the previous post, we figured out how to connect MongoDB with Apache Drill and query data with SQL queries. In this post, let’s extend that knowledge and see how we can use similar SQL queries to analyse our Kafka messages.

Configuring the Kafka storage plugin in Apache Drill is quite simple, very similar to how we configured the MongoDB storage plugin. First, we run our local instances of Apache Drill, Apache Zookeeper, and Apache Kafka. After this, head over to http://localhost:8047/storage, where we can enable the Kafka plugin. You should see it in the list to the right of the page. Click the Enable button. The storage plugin will be enabled. After this, we need to add a few configuration parameters to start querying data from Kafka. Click the Update button next to Kafka, which will open up a new page with a JSON configuration. Replace that JSON with the following:

{
   "type": "kafka",
   "kafkaConsumerProps": {
     "key.deserializer": "org.apache.kafka.common.serialization.ByteArrayDeserializer",
     "auto.offset.reset": "earliest",
     "bootstrap.servers": "localhost:9092",
     "group.id": "drill-query-consumer-1",
     "enable.auto.commit": "true",
     "value.deserializer": "org.apache.kafka.common.serialization.ByteArrayDeserializer",
     "session.timeout.ms": "30000"
   },
   "enabled": true
 }

You can change this configuration if you want. Change the bootstrap server address if you’re connecting to a remote bootstrap server. Once this is done, we can start querying data from Kafka topics. So let’s switch to the terminal and look at some queries.

Querying Apache Kafka messages with Drill SQL queries

First, let’s see if the Kafka storage plugin is indeed enabled. To do this, we’ll just run the show databases; query. This will list all the storage plugins we have enabled, and databases within those services, if applicable. If you run this query, your output should be similar to this:

As you can see, kafka is listed as a database. Let’s switch to that using the following command:

use kafka;

To start reading data from a Kafka topic, we need to first see if that topic is recognised by Drill. Each topic in Kafka will appear inside Drill as a table in the kafka database. So we just need to list out all the tables in this database. For testing purposes, you can create a new topic. I created the topic “drill” for testing with Drill. Let’s see if that appears as a table in this database.

And there it is. We’ll run our query on this table.

One thing you need to note here is that right now, with the latest version of Apache Drill and the Kafka storage plugin, only JSON messages work. If you send anything other than JSON, Drill throws up an exception and will not work. So let’s take care of that.

Let’s send some random JSON messages using a Kafka console producer and see if Drill will be able to query those messages.

I’ll be sending the following JSON messages (four in total) from a console producer to the “drill” topic. This is some sample data I got online after a simple Google search.

{
  "id": 1,
  "first_name": "Jeanette",
  "last_name": "Penddreth",
  "email": "jpenddreth0@census.gov",
  "gender": "Female",
  "ip_address": "26.58.193.2"
}
{
  "id": 2,
  "first_name": "Giavani",
  "last_name": "Frediani",
  "email": "gfrediani1@senate.gov",
  "gender": "Male",
  "ip_address": "229.179.4.212"
}
{
  "id": 3,
  "first_name": "Noell",
  "last_name": "Bea",
  "email": "nbea2@imageshack.us",
  "gender": "Female",
  "ip_address": "180.66.162.255"
}
{
  "id": 4,
  "first_name": "Willard",
  "last_name": "Valek",
  "email": "wvalek3@vk.com",
  "gender": "Male",
  "ip_address": "67.76.188.26"
}

Let’s now run the following simple SQL command on the table in Drill and see what output we get:

select * from drill;

And that is the result that we want. Querying Apache Kafka from Drill is that simple. I don’t know why I hadn’t come across this tool sooner.

1 Comments

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: