The Road Ahead: Key Data Engineering Trends for 2025

by Sunny Srinidhi - December 31, 2024

As we step into 2025, the world of data engineering is poised for transformative growth. From the rise of unified data architectures to the integration of AI-driven tools, the landscape is evolving faster than ever. This blog explores the key trends shaping the future—real-time data processing, edge computing, enhanced data governance, and more—while providing actionable insights on how professionals and organizations can adapt. Whether you’re a seasoned data engineer or just starting your journey, this comprehensive guide will help you navigate the challenges and seize the opportunities of 2025 with confidence.

Exploring the Inner Workings of Google BigQuery: A Deep Dive into Design, Competitors, Use Cases, and Pros/Cons

Data Science

by Sunny Srinidhi - March 13, 2024

Discover the inner workings of Google BigQuery, a game-changer in big data analytics. Unravel its architecture, including the prowess of its distributed query engine, Dremel, and the innovative Capacitor technology. Compare it with competitors, explore diverse use cases from real-time analytics to healthcare, and weigh its pros and cons. Join us on a journey into the heart of data analytics excellence.

Data Science vs. Artificial Intelligence vs. Machine Learning vs. Deep Learning

Data Science

by Sunny Srinidhi - November 18, 2019 (updated December 19, 2019)

It’s very common these days to come across these terms - data science, artificial intelligence, machine learning, deep learning, neural networks, and much more. But what do these buzzwords actually mean? And why should you care about one or the other? I’m trying to answer these questions in this post, to the best of my capacity. But then again, I’m no expert here. This is the knowledge I’ve gained in the last few years of my data science and machine learning journey. I’m sure most of you will have better and easier ways of explaining things than I do, so I’ll be looking forward to reading your comments down below. Let’s get started then. Data Science Data science is all about data,

Put data to Amazon Kinesis Firehose delivery stream using Spring Boot

by Sunny Srinidhi - September 26, 2019 (updated February 12, 2020)

If you work with streams of big data that have to be collected, transformed, and analysed, you have surely heard of Amazon Kinesis Firehose. It is an AWS service used to load streams of data into data lakes or analytical tools, optionally compressing, transforming, or encrypting the data along the way. You can use Firehose to load streaming data into something like S3 or Redshift. From there, you can use a SQL query engine such as Amazon Athena to query this data. You can even connect this data to your BI tool and get real-time analytics. This can be very useful in applications where real-time analysis of data is necessary. In this post, we'll see

Apache Spark Optimisation Techniques

Data Science

by Sunny Srinidhi - February 23, 2023

Apache Spark is a popular big data processing tool. In this post, we are going to look at a few techniques we can use to optimise the performance of our Spark jobs.

Installing Zsh and Oh-my-zsh on Windows 11 with WSL2

Tech

by Sunny Srinidhi - October 27, 2021

In this post, which is part of a series on setting up Windows 11 and WSL2 for big data work, I install Zsh and Oh-my-zsh and set up aliases

How To Generate Parquet Files in Java

Data Science

by Sunny Srinidhi - April 7, 2020

The Parquet file format has become very popular lately. In this post, we’ll see what it is, and how to create Parquet files in Java using Spring Boot.

Getting started with Apache Kafka Streams

by Sunny Srinidhi - September 30, 2019 (updated March 12, 2020)

In the age of big data and data science, stream processing is very significant. So it's not at all surprising that every major organisation has at least one stream processing service. Apache has a few too, but today we're going to look at Apache's Kafka Streams. Kafka is a very popular pub-sub service. If you've worked with Kafka before, Kafka Streams is going to be very easy to understand. And if you haven't, you don't have to worry, because most of the underlying technology has been abstracted away in Kafka Streams so that you don't have to deal with consumers, producers, partitions, offsets, and the like. In this post, we'll look at a few concepts of

Apache Drill vs. Apache Spark – Which SQL query engine is better for you?

by Sunny Srinidhi - September 23, 2019 (updated February 13, 2020)

If you are in the big data, data science, or BI space, you might have heard about Apache Spark. A few of you might have also heard about Apache Drill, and even fewer of you might have actually worked with it. I discovered Apache Drill very recently, and since then I've come to like what it has to offer. But the first thing I wondered when I glanced over the capabilities of Apache Drill was: how is this different from Apache Spark? Can I use the two interchangeably? I did some research and found the answers. Here, I'm going to answer these questions for myself and maybe for you too. It is very important to understand that