Enhancing Data Security and Privacy in the Cloud with AWS Clean Rooms
Data Science, by Sunny Srinidhi - May 26, 2023, updated January 17, 2024
Data security and privacy in the cloud are becoming crucial as more organisations embrace cloud computing and cloud storage. In this post, we’ll see how AWS Clean Rooms can help maintain data security and privacy.

Apache Spark Optimisation Techniques
Data Science, by Sunny Srinidhi - February 23, 2023
Apache Spark is a popular big data processing tool. In this post, we are going to look at a few techniques for optimising the performance of our Spark jobs.

Optimising Hive Queries with Tez Query Engine
Data Science, by Sunny Srinidhi - June 13, 2022
Hive and Tez configuration can be fine-tuned to improve the performance of queries. Let’s look at a few such techniques.

Cleaning and Normalizing Data Using AWS Glue DataBrew
Data Science, by Sunny Srinidhi - January 17, 2022
In this post, we’ll see what AWS Glue DataBrew is and how to use it to clean and transform our data in a data pipeline.

Understanding Apache Hive LLAP
Data Science, by Sunny Srinidhi - November 18, 2021
In this post, I try to explain what LLAP is for Apache Hive and how it can help us reduce query latency.

Installing Hadoop on the new M1 Pro and M1 Max MacBook Pro
Data Science, by Sunny Srinidhi - November 5, 2021
We’ll see how to install and configure Hadoop and its components on macOS running on the new M1 Pro and M1 Max chips by Apple.

Installing Hadoop on Windows 11 with WSL2
Data Science, by Sunny Srinidhi - November 1, 2021
We’ll see how to install and configure Hadoop and its components on Windows 11 running a Linux distro using WSL 1 or 2.

Getting Started With Apache Airflow
Data Science, by Sunny Srinidhi - October 11, 2021
I recently started working with Apache Airflow. And as is tradition, I’m telling you everything about it here.

Fake (almost) everything with Faker
Data Science, by Sunny Srinidhi - September 30, 2021
Generating customer and address data for testing has never been easier. We’ll see how to do that using the Faker Python library.

Querying Hive Tables From a Spring Boot App
Data Science, Tech, by Sunny Srinidhi - June 30, 2021
In this post, we’ll see how to connect to a Hive database and run queries on that database from a Spring Boot application.