Search Results for "kinesis"

Kinesis Data Streams vs. Kinesis Firehose Delivery Streams

I have talked about Kinesis before, and I'm sure you've been using Kinesis for longer than I have. But from what I've seen, not all teams or companies use every part of Kinesis. There are four parts:

Ingest and process streaming data with Kinesis streams - Kinesis Data Streams
Deliver streaming data with Kinesis Firehose delivery streams - Kinesis Firehose Delivery Streams
Analyse streaming data with Kinesis analytics applications - Kinesis Analytics
Ingest and process media streams with Kinesis video streams - Kinesis Video Streams

Each of these four parts offers something different. Well, the last two are definitely different from the first two. But it's the first two that I see a lot of people confusing with each other. So I thought I'll

How to build a simple data lake using Amazon Kinesis Data Firehose and Amazon S3

In this post, we’ll see how to create a very simple, yet highly scalable data lake using Amazon Kinesis Data Firehose and Amazon S3.

Put data to Amazon Kinesis Firehose delivery stream using Spring Boot

If you work with streams of big data that have to be collected, transformed, and analysed, you have almost certainly heard of Amazon Kinesis Firehose. It is an AWS service that loads streams of data into data lakes or analytical tools, and it can compress, transform, or encrypt the data along the way. You can use Firehose to load streaming data into a destination such as Amazon S3 or Amazon Redshift. From there, you can use a SQL query engine such as Amazon Athena to query the data. You can even connect this data to your BI tool and get near-real-time analytics. This can be very useful in applications where timely analysis of data is necessary. In this post, we'll see

Real-Time Data Processing: Understanding the What, Why, Where, Who, and How

In today’s data-driven world, businesses and organizations are continuously generating massive amounts of data. While processing data in batch mode remains useful, the need for instant decision-making has led to an increasing focus on real-time data processing. This article delves into what real-time data processing is, why it's essential, its various applications, the tools used to achieve it, trends shaping its evolution, and real-world use cases.

What is Real-Time Data Processing?

Real-time data processing refers to the capability to continuously ingest, process, and output data as soon as it is generated, with minimal latency. Unlike batch processing, which collects and processes data in large groups at set intervals (e.g., daily or hourly), real-time processing works with data immediately as it becomes available,
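
The contrast between batch and real-time processing described in this excerpt can be illustrated with a minimal Java sketch. It is not taken from the article: the class and method names (`BatchVsRealTime`, `BatchProcessor`, `StreamingProcessor`, `runBatch`) are hypothetical, and a running sum stands in for whatever processing a real pipeline would do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: none of these names come from the article.
public class BatchVsRealTime {

    /** Batch style: events accumulate and are processed together at an interval. */
    static final class BatchProcessor {
        private final List<Integer> pending = new ArrayList<>();
        private final List<Integer> totals = new ArrayList<>();

        /** Stores the event; no result is produced yet. */
        void accept(int value) {
            pending.add(value);
        }

        /** Processes everything collected since the last run (e.g. hourly or daily). */
        void runBatch() {
            int sum = 0;
            for (int value : pending) {
                sum += value;
            }
            totals.add(sum);
            pending.clear();
        }

        List<Integer> totals() {
            return totals;
        }
    }

    /** Real-time style: each event updates the result as soon as it arrives. */
    static final class StreamingProcessor {
        private int runningTotal = 0;
        private final List<Integer> totals = new ArrayList<>();

        /** Processes the event immediately and emits an updated result. */
        void accept(int value) {
            runningTotal += value;
            totals.add(runningTotal);
        }

        List<Integer> totals() {
            return totals;
        }
    }

    public static void main(String[] args) {
        int[] events = {5, 3, 7};

        BatchProcessor batch = new BatchProcessor();
        StreamingProcessor streaming = new StreamingProcessor();
        for (int event : events) {
            batch.accept(event);
            streaming.accept(event);
        }
        batch.runBatch(); // the batch result only exists after the scheduled run

        System.out.println("batch:     " + batch.totals());     // [15]
        System.out.println("streaming: " + streaming.totals()); // [5, 8, 15]
    }
}
```

The batch processor has nothing to report until `runBatch` is called, while the streaming processor has an up-to-date result after every event. That is the latency difference the excerpt describes.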

Cleaning and Normalizing Data Using AWS Glue DataBrew

In this post, we’ll see what AWS Glue DataBrew is and how to use it to clean and transform our data in a data pipeline.

How To Generate Parquet Files in Java

The Parquet file format has become very popular lately. In this post, we’ll see what it is, and how to create Parquet files in Java using Spring Boot.

Choreography-based Saga for Microservices and Serverless Applications

How do you take care of transactions in a microservices or serverless architecture? We’ll talk about the choreography-based saga pattern as a way to solve this.

Proofs of Concept (POCs)

I write a lot of POC projects, especially when I'm learning something new, need to quickly test whether a data pipeline works, or am just trying out a new integration. I make all these POCs public as GitHub repositories. I wanted to consolidate the list of POCs in an easy-to-search fashion, and that's why I have this page. Below is a list of all the POCs that I've written so far. If a POC has an accompanying blog post that explains its code, I have linked that post in the list as well. Let me know if any of these POCs have helped you in any way.
