I have already written quite a few posts about Apache Kafka. It’s an awesome tool for parallel and asynchronous processing. You can have multiple producers and multiple consumers listening to a topic to process every message coming out of it. You can even have consumer groups, where services listen to the same topic, receive the same message, but process it differently. That’s what interests us today. I’ll explain what exactly it means, how Kafka does it, and how we can achieve the same result in the AWS world using Amazon SQS and Amazon SNS. So let’s get started.
Understanding the problem
Let’s suppose you are running an e-commerce website. Whenever a customer places an order, you want the following things to happen:
- The order details need to be sent to the dispatch team.
- The total number of orders metric has to be updated.
- The order details have to be logged.
It would be cumbersome, and a bad design, for one service to do all three of these things. So you decide to employ Apache Kafka here. Whenever a user places an order, you form a JSON object with all the order details and produce that message to a Kafka topic conveniently named “orders.” There are three different services: one for informing the dispatch team about the order, one for incrementing the orders metric, and another for logging. These three services listen for messages on the “orders” topic, so whenever a message is produced to it, they all get that message.
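Here’s a minimal sketch of the producer side, assuming the kafka-python package. The field names in the order payload and the broker address are placeholders, not a fixed schema:

```python
import json

def build_order_message(order):
    """Serialize the order details into the JSON payload we produce.
    These field names are illustrative, not a required schema."""
    return json.dumps({
        "order_id": order["order_id"],
        "items": order["items"],
        "total": order["total"],
    })

def publish_order(order, bootstrap_servers="localhost:9092"):
    """Produce the order message to the 'orders' topic.
    Requires kafka-python and a reachable broker."""
    from kafka import KafkaProducer  # imported lazily so the helper above has no dependency
    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)
    producer.send("orders", build_order_message(order).encode("utf-8"))
    producer.flush()
```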
But the problem is, similar to how Amazon SQS is designed, in Kafka, if multiple consumers subscribe to the same topic as part of the same consumer group, the messages are distributed among those consumers instead of each one getting a copy of the same message. So in our example, each order would be handled by only one of the three services: we’d inform the dispatch team about the order, or increment the metric, or log the order, but never all three.
This is exactly what happens when we have multiple consumers on a single SQS queue. To solve this in Apache Kafka, we have something called consumer groups: we place each of our consumers in a different group and have them all listen to the same topic. Because the services belong to different groups, each service gets its own copy of every message. So now, all three of our services get the same message and our design starts working. I hope the following illustration explains this situation accurately.
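The consumer side can be sketched like this, again assuming kafka-python. Each service passes its own group ID; the group names below are hypothetical:

```python
def consume_orders(group_id, bootstrap_servers="localhost:9092"):
    """Each service runs this with its own group_id. Because the three
    services sit in three different groups, Kafka delivers a copy of every
    'orders' message to each of them. Requires kafka-python and a broker."""
    from kafka import KafkaConsumer
    consumer = KafkaConsumer(
        "orders",
        group_id=group_id,                 # e.g. "dispatch", "metrics", "logging"
        bootstrap_servers=bootstrap_servers,
        auto_offset_reset="earliest",
    )
    for message in consumer:
        yield message.value

# Hypothetical group names, one per service:
SERVICE_GROUPS = ("dispatch", "metrics", "logging")
```

If all three consumers used the same `group_id` instead, we’d be back to the problem above: each message delivered to only one of them.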
The question now is, how would we achieve the same on the Amazon side of things?
Emulating Apache Kafka with AWS
The thing is, you just can’t emulate Kafka’s consumer groups with Amazon SQS alone; there simply isn’t a similar feature. When you have multiple consumers for the same queue in an SQS setup, the messages will be distributed among all the consumers. So to emulate Kafka’s consumer groups, we need to introduce Amazon SNS into the setup. Let’s see why that is.
When you switch from producing messages to an SQS queue to an SNS topic, there aren’t many code changes you need to make. Just use a different client and call a different method, and you’re set. The real changes happen in the AWS console. Let’s continue with the e-commerce example we looked at earlier. We need three consumer groups, so we’ll create three different SQS queues. And from the SQS console itself, we can subscribe these queues to an SNS topic. What happens here is, whenever we write something to the SNS topic, a copy of that message is sent to each SQS queue subscribed to the topic. This way, because we have three queues, we get a copy of the message in all three. We’ll then change our consumers to read messages from these three new queues.
This way, we have true parallel processing of the same message from SQS consumers. The only thing that changed is that we’re writing to an SNS topic instead of an SQS queue. Hopefully, the following illustration makes more sense than what I tried to explain with words:
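On the consumer side, each service simply polls its own queue. One detail worth knowing: by default, SNS wraps the payload in a JSON envelope before delivering it to SQS, with the original message under the `Message` key (unless you enable raw message delivery on the subscription). A sketch, assuming boto3; `handle_order` is whatever each service does with the order:

```python
import json

def unwrap_sns_envelope(body):
    """SNS delivers to SQS wrapped in a JSON envelope; the original
    payload sits under the 'Message' key (with raw delivery disabled)."""
    return json.loads(body)["Message"]

def poll_orders(queue_url, handle_order):
    """Long-poll one of the three queues and hand each order to this
    service's own handler. Requires boto3 and AWS credentials."""
    import boto3
    sqs = boto3.client("sqs")
    while True:
        resp = sqs.receive_message(
            QueueUrl=queue_url, WaitTimeSeconds=20, MaxNumberOfMessages=10
        )
        for m in resp.get("Messages", []):
            handle_order(unwrap_sns_envelope(m["Body"]))
            # Delete only after successful handling, or the message reappears.
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=m["ReceiptHandle"])
```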
If this is still not clear, please let me know in the comments, and I’ll definitely help clear up any confusion.