Emulating Apache Kafka with Amazon SNS and SQS

Tech

by Sunny Srinidhi - January 22, 2020 (updated January 24, 2020) - 0 comments

We’ll learn how to bring Kafka’s concept of consumer groups to the AWS world using Amazon SNS and Amazon SQS.
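
The idea can be pictured with a small in-memory sketch. It makes no AWS calls: `Topic` and `Queue` are hypothetical stand-ins for an SNS topic and an SQS queue, and the queue and worker names are made up for illustration. It assumes the usual fan-out arrangement, where each queue subscribed to the topic gets its own copy of every message (one queue per "consumer group") and workers polling the same queue split the messages between them. It is not necessarily the setup the post itself uses.

```python
from collections import deque


class Queue:
    """Hypothetical in-memory stand-in for an SQS queue.

    Each message is handed to exactly one caller of receive().
    """

    def __init__(self, name):
        self.name = name
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Return the next message, or None when the queue is empty.
        if self._messages:
            return self._messages.popleft()
        return None


class Topic:
    """Hypothetical in-memory stand-in for an SNS topic.

    Every subscribed queue receives its own copy of each published message.
    """

    def __init__(self):
        self._subscribers = []

    def subscribe(self, queue):
        self._subscribers.append(queue)

    def publish(self, message):
        for queue in self._subscribers:
            queue.send(message)


def demo():
    topic = Topic()
    billing = Queue("billing")      # plays the role of consumer group 1
    analytics = Queue("analytics")  # plays the role of consumer group 2
    topic.subscribe(billing)
    topic.subscribe(analytics)

    for i in range(4):
        topic.publish(f"event-{i}")

    seen = {"billing-worker-1": [], "billing-worker-2": [], "analytics-worker-1": []}

    # Two workers share the billing queue, so they split its messages.
    billing_workers = ["billing-worker-1", "billing-worker-2"]
    turn = 0
    message = billing.receive()
    while message is not None:
        seen[billing_workers[turn % 2]].append(message)
        turn += 1
        message = billing.receive()

    # A single worker owns the analytics queue, so it sees every message.
    message = analytics.receive()
    while message is not None:
        seen["analytics-worker-1"].append(message)
        message = analytics.receive()

    return seen
```

Calling `demo()` shows both groups receiving all four events, with the two billing workers each handling half. Real SQS has no partitions or offsets, so this only mirrors the group-level delivery behaviour, not Kafka's ordering guarantees.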

Apache Kafka Streams and Tables, the stream-table duality

by Sunny Srinidhi - October 1, 2019 (updated February 25, 2020) - 0 comments

In the previous post, we tried to understand the basics of Apache's Kafka Streams. In this post, we'll build on that knowledge and see how Kafka Streams can be used both as streams and tables. Stream processing has become very common in most modern applications today. You'll have a minimum of one stream coming into your system to be processed. And depending on your application, it'll mostly be stateless. But that's not the case with all applications. We'll have some sort of data enrichment going on in between streams. Suppose you have one stream of user activity coming in. You'll ideally have a user ID attached to each fact in that stream. But down the pipeline, user ID is
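
The stream-table duality named in the title can be sketched in plain Python, without the Kafka Streams API. The sketch assumes a changelog stream of `(user_id, name)` updates and an activity stream of `(user_id, action)` facts; the function names `build_table` and `enrich` and all sample data are hypothetical. A table is treated as the latest value per key of a stream, and the enrichment step looks each user ID up in that table.

```python
def build_table(changelog):
    """Fold a stream of (key, value) updates into a table of latest values."""
    table = {}
    for key, value in changelog:
        table[key] = value  # a later update for the same key overwrites the earlier one
    return table


def enrich(activity, users):
    """Attach the user's name to each (user_id, action) fact via a table lookup."""
    return [(user_id, users.get(user_id), action) for user_id, action in activity]


# Hypothetical sample data: u1 changes name, so only the latest value survives.
profile_updates = [("u1", "Alice"), ("u2", "Bob"), ("u1", "Alicia")]
user_activity = [("u1", "login"), ("u2", "click"), ("u3", "click")]

users_table = build_table(profile_updates)
enriched = enrich(user_activity, users_table)
```

Here the whole table is built before the join for simplicity; in a running stream processor, the table would keep updating while activity records arrive, so each lookup would see the table as it stands at that moment.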

Getting started with Apache Kafka Streams

by Sunny Srinidhi - September 30, 2019 (updated March 12, 2020) - 1 comment

In the age of big data and data science, stream processing is very significant. So it's not at all surprising that every major organisation has at least one stream processing service. Apache has a few too, but today we're going to look at Apache's Kafka Streams. Kafka is a very popular pub-sub service. And if you've worked with Kafka before, Kafka Streams is going to be very easy to understand. And if you haven't got any idea of Kafka, you don't have to worry, because most of the underlying technology has been abstracted in Kafka Streams so that you don't have to deal with consumers, producers, partitions, offsets, and the like. In this post, we'll look at a few concepts of

Analyse Kafka messages with SQL queries using Apache Drill

by Sunny Srinidhi - September 23, 2019 (updated January 13, 2020) - 1 comment

In the previous post, we figured out how to connect MongoDB with Apache Drill and query data with SQL queries. In this post, let's extend that knowledge and see how we can use similar SQL queries to analyse our Kafka messages. Configuring the Kafka storage plugin in Apache Drill is quite simple, very similar to how we configured the MongoDB storage plugin. First, we run our local instances of Apache Drill, Apache Zookeeper, and Apache Kafka. After this, head over to http://localhost:8047/storage, where we can enable the Kafka plugin. You should see it in the list to the right of the page. Click the Enable button. The storage plugin will be enabled. After this, we need to add a few configuration

How you can improve your backend services’ performance using Apache Kafka

Tech

by Sunny Srinidhi - November 27, 2018 (updated February 25, 2020) - 1 comment

In most real-world applications, we have a RESTful API service facing various client applications and a collection of backend services which process the data coming from those clients. Depending on the application, the architecture might have various services spread across multiple clusters of servers, and some form of queue or messaging service gluing them together. Today, we're going to talk about one such messaging service - Apache Kafka - and how it can improve the performance of your services. We're going to assume that we have at least two microservices, one for the APIs that are exposed to the world, and one which processes the requests coming in from the API microservice, but in an async fashion. Because this is

Simple Apache Kafka Producer and Consumer using Spring Boot

Tech

by Sunny Srinidhi - November 23, 2018 (updated March 2, 2020) - 2 comments

Originally published here: https://medium.com/@contactsunny/simple-apache-kafka-producer-and-consumer-using-spring-boot-41be672f4e2b Before I even start talking about Apache Kafka here, let me answer your question after you read the topic — aren’t there enough posts and guides about this topic already? Yes, there are plenty of reference documents and how-to posts about how to create Kafka producers and consumers in a Spring Boot application. Then why am I writing another post about this? Well, in the future, I’ll be talking about some advanced stuff, in the data science space. Apache Kafka is one of the most used technologies and tools in this space. It kind of becomes important to know how to work with Apache Kafka in a real-world application. So this is an introductory post to the technology, which we’ll be

The Road Ahead: Key Data Engineering Trends for 2025

by Sunny Srinidhi - December 31, 2024 (updated December 31, 2024) - 0 comments

As we step into 2025, the world of data engineering is poised for transformative growth. From the rise of unified data architectures to the integration of AI-driven tools, the landscape is evolving faster than ever. This blog explores the key trends shaping the future—real-time data processing, edge computing, enhanced data governance, and more—while providing actionable insights on how professionals and organizations can adapt. Whether you’re a seasoned data engineer or just starting your journey, this comprehensive guide will help you navigate the challenges and seize the opportunities of 2025 with confidence.

Real-Time Data Processing: Understanding the What, Why, Where, Who, and How

by Sunny Srinidhi - October 22, 2024 - 0 comments

In today’s data-driven world, businesses and organizations are continuously generating massive amounts of data. While processing data in batch mode remains useful, the need for instant decision-making has led to an increasing focus on real-time data processing. This article delves into what real-time data processing is, why it's essential, its various applications, the tools used to achieve it, trends shaping its evolution, and real-world use cases. What is Real-Time Data Processing? Real-time data processing refers to the capability to continuously ingest, process, and output data as soon as it is generated, with minimal latency. Unlike batch processing, which collects and processes data in large groups at set intervals (e.g., daily or hourly), real-time processing works with data immediately as it becomes available,