Data Science vs. Artificial Intelligence vs. Machine Learning vs. Deep Learning

It’s very common these days to come across terms such as data science, artificial intelligence, machine learning, deep learning, neural networks, and many more. But what do these buzzwords actually mean? And why should you care about one or the other? I’m trying to answer these questions in this post, to the best of my ability. But then again, I’m no expert here. This is the knowledge I’ve gained over the last few years of my data science and machine learning journey. I’m sure most of you will have better and easier ways of explaining things than I do, so I’ll be looking forward to reading your comments down below. Let’s get started then.

Data Science

Data science is all about data,

Put data to Amazon Kinesis Firehose delivery stream using Spring Boot

If you work with streams of big data that have to be collected, transformed, and analysed, you have almost certainly heard of Amazon Kinesis Firehose. It is an AWS service used to load streams of data into data lakes or analytical tools, and it can compress, transform, or encrypt the data along the way. You can use Firehose to load streaming data into a destination such as S3 or Redshift. From there, you can use a SQL query engine such as Amazon Athena to query this data. You can even connect this data to your BI tool and get real-time analytics. This can be very useful in applications where real-time analysis of data is necessary. In this post, we'll see

Apache Spark Optimisation Techniques


Apache Spark is one of the most popular big data processing tools today. It’s used extensively for datasets both small and large. The availability of Spark in more than one programming language makes it a favourite tool for data engineers and data scientists coming from various backgrounds. Read more... “Apache Spark Optimisation Techniques”

Installing Zsh and Oh-my-zsh on Windows 11 with WSL2


Before we begin, you might ask, why am I writing on something this trivial? I sold off my old MacBook Pro because I’m super excited about the new M1 Pro MacBook Pros. I have pre-ordered one of those and am waiting for it to come. Read more... “Installing Zsh and Oh-my-zsh on Windows 11 with WSL2”

The Dunning-Kruger Effect In Tech

This is not the kind of post I usually write on my blog. This is more of a psychology lecture than a how-to tech tutorial. But it’s not completely irrelevant either, because I’m going to talk about what I’ve seen of the Dunning-Kruger effect in tech over the last decade. Read more... “The Dunning-Kruger Effect In Tech”

How To Generate Parquet Files in Java

Parquet is an open-source file format from Apache, built for the Hadoop ecosystem. Well, it started as a file format for Hadoop, but it has since become very popular, and even cloud service providers such as AWS have started supporting it. Read more... “How To Generate Parquet Files in Java”

Getting started with Apache Kafka Streams


In the age of big data and data science, stream processing is very significant. So it's not at all surprising that every major organisation has at least one stream processing service. Apache has a few too, but today we're going to look at Apache's Kafka Streams. Kafka is a very popular pub-sub service. If you've worked with Kafka before, Kafka Streams is going to be very easy to understand. And if you haven't worked with Kafka at all, you don't have to worry, because most of the underlying technology has been abstracted away in Kafka Streams so that you don't have to deal with consumers, producers, partitions, offsets, and the like. In this post, we'll look at a few concepts of

Apache Drill vs. Apache Spark – Which SQL query engine is better for you?


If you are in the big data, data science, or BI space, you might have heard about Apache Spark. A few of you might have also heard about Apache Drill, and a handful of you might have actually worked with it. I discovered Apache Drill very recently, and since then I've come to like what it has to offer. But the first thing I wondered when I glanced at the capabilities of Apache Drill was, how is this different from Apache Spark? Can I use the two interchangeably? I did some research and found the answers. Here, I'm going to answer these questions for myself and maybe for you guys too. It is very important to understand that

About Me

Connect with me on: Twitter | LinkedIn | Medium

Products

Links

Links is a simple bookmarking service which allows you to bookmark your favorite websites from your Android device, or from the Chrome browser. The service also lets you organise your bookmarks into various folders so that it's easy to keep track of them. Your bookmarks are synced between your Chrome browser and your Android device. So no matter if you're on a desktop, a laptop, an Android smartphone, or an Android tablet, your bookmarks are available. You can have a look at the web interface and register, which will let you use the Chrome extension and the Android app.

Nothing Pro

As the name suggests, this app does absolutely nothing. It just has a label which says, well,

Understanding Apache Hive LLAP

Apache Hive looks like a complex system at first glance, but once you dig deeper, it turns out to be more interesting than complex. There are multiple query engines available for Hive, and then there’s LLAP on top of the query engines to make real-time, interactive queries more workable. Read more... “Understanding Apache Hive LLAP”