Cleaning and Normalizing Data Using AWS Glue DataBrew
Data Science by Sunny Srinidhi - January 17, 2022
In this post, we’ll see what AWS Glue DataBrew is and how to use it to clean and transform our data in a data pipeline.

Understanding Apache Hive LLAP
Data Science by Sunny Srinidhi - November 18, 2021
In this post, I try to explain what LLAP is for Apache Hive and how it can help us reduce query latency.

Installing Hadoop on the new M1 Pro and M1 Max MacBook Pro
Data Science by Sunny Srinidhi - November 5, 2021
We’ll see how to install and configure Hadoop and its components on macOS running on the new M1 Pro and M1 Max chips by Apple.

Installing Hadoop on Windows 11 with WSL2
Data Science by Sunny Srinidhi - November 1, 2021
We’ll see how to install and configure Hadoop and its components on Windows 11 running a Linux distro using WSL 1 or 2.

Getting Started With Apache Airflow
Data Science by Sunny Srinidhi - October 11, 2021
I recently started working with Apache Airflow. And as is tradition, I’m telling you everything about it here.

Fake (almost) everything with Faker
Data Science by Sunny Srinidhi - September 30, 2021
Generating customer and address data for testing has never been easier. We’ll see how to do that using the Faker Python library.

Querying Hive Tables From a Spring Boot App
Data Science, Tech by Sunny Srinidhi - June 30, 2021
In this post, we’ll see how to connect to a Hive database and run queries on that database from a Spring Boot application.

out() vs. outE() – JanusGraph and Gremlin
Data Science by Sunny Srinidhi - March 3, 2021
JanusGraph and Gremlin have the out() and outE() functions which help with traversals. But what’s the difference between the two? Let’s see.

Getting Started With JanusGraph
Data Science by Sunny Srinidhi - February 25, 2021
JanusGraph is a graph processing tool that can query distributed graph data in milliseconds. In this post, we’ll see how to get started with it.

Kinesis Data Streams vs. Kinesis Firehose Delivery Streams
Data Science by Sunny Srinidhi - May 25, 2020 (updated August 27, 2024)
I have talked about Kinesis before, and I'm sure you've been using Kinesis for longer than me. But from what I've seen, not all teams or companies use all parts of Kinesis. And there are four parts in Kinesis:
Ingest and process streaming data with Kinesis streams - Kinesis Data Streams
Deliver streaming data with Kinesis Firehose delivery streams - Kinesis Firehose Delivery Streams
Analyse streaming data with Kinesis analytics applications - Kinesis Analytics
Ingest and process media streams with Kinesis video streams - Kinesis Video Streams
All four parts offer something different. Well, the last two are definitely different from the first two. But it's the first two that I see a lot of people getting confused by. So I thought I'll