Data ScienceTech

Apache Kafka Streams and Tables, the stream-table duality

In the previous post, we tried to understand the basics of Apache's Kafka Streams. In this post, we'll build on that knowledge and see how Kafka Streams can be used both as streams and tables. Stream processing has become very common in most modern applications today. You'll have a minimum of one stream coming into your system to be processed. And depending on your application, it'll mostly be stateless. But that's not the case with all applications. We'll have some sort of data enrichment going on in between streams. Suppose you have one stream of user activity coming in. You'll ideally have a user ID attached to each fact in that stream. But down the pipeline, user ID is not going to be enough for processing. Maybe you need more information about the user to be present in t...

Read More
Data ScienceTech

Getting started with Apache Kafka Streams

In the age of big data and data science, stream processing is very significant. So it's not at all surprising that every major organisation has at least one stream processing service. Apache has a few too, but today we're going to look at Apache's Kafka Streams. Kafka is a very popular pub-sub service. And if you've worked with Kafka before, Kafka Streams is going to be very easy to understand. And if you haven't got any idea of Kafka, you don't have to worry, because most of the underlying technology has been abstracted in Kafka Streams so that you don't have to deal with consumers, producers, partitions, offsets, and the such. In this post, we'll look that a few concepts of Kafka Streams, and maybe understand how it differs from other stream processing engines. First of all, Kafka...

Read More