Kinesis Data Streams vs. Kinesis Firehose Delivery Streams

Data Science

by Sunny Srinidhi - May 25, 2020 (updated August 27, 2024)

I have talked about Kinesis before, and I'm sure you've been using Kinesis for longer than me. But from what I've seen, not all teams or companies use all parts of Kinesis. There are four parts in Kinesis: Kinesis Data Streams, to ingest and process streaming data; Kinesis Firehose Delivery Streams, to deliver streaming data; Kinesis Analytics, to analyse streaming data with analytics applications; and Kinesis Video Streams, to ingest and process media streams. All four parts offer something different. The last two are clearly different from the first two, but it's the first two that I see a lot of people getting confused about. So I thought I'd…

How to build a simple data lake using Amazon Kinesis Data Firehose and Amazon S3

Data Science

by Sunny Srinidhi - March 3, 2020

In this post, we’ll see how we can create a very simple, yet highly scalable data lake using Amazon’s Kinesis Data Firehose and Amazon’s S3.

Put data to Amazon Kinesis Firehose delivery stream using Spring Boot

by Sunny Srinidhi - September 26, 2019 (updated February 12, 2020)

If you work with streams of big data that have to be collected, transformed, and analysed, you have surely heard of Amazon Kinesis Firehose. It is an AWS service used to load streams of data into data lakes or analytical tools, optionally compressing, transforming, or encrypting the data along the way. You can use Firehose to load streaming data into destinations such as S3 or Redshift. From there, you can use a SQL query engine such as Amazon Athena to query the data. You can even connect this data to your BI tool and get near-real-time analytics on it, which is very useful in applications where real-time analysis is necessary. In this post, we'll see…

Real-Time Data Processing: Understanding the What, Why, Where, Who, and How

by Sunny Srinidhi - October 22, 2024

In today’s data-driven world, businesses and organizations are continuously generating massive amounts of data. While processing data in batch mode remains useful, the need for instant decision-making has led to an increasing focus on real-time data processing. This article delves into what real-time data processing is, why it's essential, its various applications, the tools used to achieve it, trends shaping its evolution, and real-world use cases.

What is Real-Time Data Processing?

Real-time data processing refers to the capability to continuously ingest, process, and output data as soon as it is generated, with minimal latency. Unlike batch processing, which collects and processes data in large groups at set intervals (e.g., daily or hourly), real-time processing works with data immediately as it becomes available…

Cleaning and Normalizing Data Using AWS Glue DataBrew

Data Science

by Sunny Srinidhi - January 17, 2022

In this post, we’ll see what AWS Glue DataBrew is and how to use it to clean and transform our data in a data pipeline.

Explore your Amazon S3 data online using Filestash

Tech

by Sunny Srinidhi - April 29, 2020

Filestash is a handy browser-based tool that helps you navigate your S3 buckets and folders easily, and even edit files online.

How To Generate Parquet Files in Java

Data Science

by Sunny Srinidhi - April 7, 2020

The Parquet file format has become very popular lately. In this post, we’ll see what it is, and how to create Parquet files in Java using Spring Boot.

Choreography-based Saga for Microservices and Serverless Applications

Tech

by Sunny Srinidhi - April 1, 2020

How do you take care of transactions in a microservices or serverless architecture? We’ll talk about choreography-based saga to solve this.

Proof of Concepts (POCs)

I write a lot of POC projects, especially when I'm learning something new, when I need to quickly test whether a data pipeline works, or when I'm just testing a new integration. I make all of these POCs public as GitHub repositories. I wanted to consolidate the list of POCs in an easy-to-search format, and that's why I have this page. Below is a list of all the POCs I've written so far. If a POC has an accompanying blog post that explains its code, I have linked that post as well. Let me know if any of these POCs have helped you in any way.