Data Science, Tech

Connect Apache Spark to your MongoDB database using the mongo-spark-connector

A couple of days back, we saw how to connect Apache Spark to an Apache HBase database and query data from a table using a catalog. Today, we'll see how to connect Apache Spark to a MongoDB database and pull data directly into Spark from there. MongoDB provides a plugin called the mongo-spark-connector, which helps us connect MongoDB and Spark without any drama at all. We just need to provide the MongoDB connection URI in the SparkConf object and create a ReadConfig object specifying the collection name. It might sound complicated right now, but once you look at the code, you'll understand how easy it is. So, let's look at an example. (Image source: mongodb.com) The Dataset: Before we look at the code, we need to make sure we have some data in our ...
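
The SparkConf URI and ReadConfig mentioned above come from the connector's Scala/Java API. As a rough PySpark equivalent, here's a minimal sketch, assuming a local MongoDB at `mongodb://127.0.0.1`, a hypothetical `people.contacts` collection, and the mongo-spark-connector jar already on the classpath:

```python
from pyspark.sql import SparkSession

# Hypothetical connection details -- replace with your own URI, database and collection.
spark = (
    SparkSession.builder
    .appName("mongo-spark-example")
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/people.contacts")
    .getOrCreate()
)

# Read the collection straight into a Spark DataFrame via the mongo-spark-connector.
df = (
    spark.read
    .format("com.mongodb.spark.sql.DefaultSource")
    .load()
)

df.printSchema()
df.show(5)
```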

Read More
Data Science, Tech

Connect Apache Spark to your HBase database (Spark-HBase Connector)

There will be times when you'll need the data in your HBase database to be brought into Apache Spark for processing. Usually, you'll query the database, get the data in whatever format you fancy, and then load it into Spark, maybe using the `parallelize()` function. This works just fine. But depending on the size of the data, this could cause delays. At least it did for our application. So after some research, we stumbled upon a Spark-HBase connector in the Hortonworks repository. Now, what is this connector and why should you consider it? (Image source: https://spark.apache.org/) The Spark-HBase Connector (shc-core) The SHC is a tool provided by Hortonworks to connect your HBase database to Apache Spark so that you can tell your Spark context to pick up the data directly fro...
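
The SHC reads HBase tables through a JSON catalog that maps HBase column families and qualifiers to Spark SQL columns. A minimal PySpark sketch, assuming the shc-core jar is on the classpath and a hypothetical HBase table named `Contacts` with a `cf` column family:

```python
import json
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("shc-example").getOrCreate()

# Hypothetical catalog: maps the HBase table "Contacts" to a Spark SQL schema.
catalog = json.dumps({
    "table": {"namespace": "default", "name": "Contacts"},
    "rowkey": "key",
    "columns": {
        "id":    {"cf": "rowkey", "col": "key",   "type": "string"},
        "name":  {"cf": "cf",     "col": "name",  "type": "string"},
        "email": {"cf": "cf",     "col": "email", "type": "string"},
    },
})

# Ask Spark to load the HBase table directly through the SHC data source.
df = (
    spark.read
    .options(catalog=catalog)
    .format("org.apache.spark.sql.execution.datasources.hbase")
    .load()
)

df.show(5)
```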

Read More
Data Science

What is multicollinearity?

(Image from StatisticsHowTo) Multicollinearity is a term we often come across when we're working with multiple regression models. We've even talked about it in our previous posts, but do we know what it actually means? Today, we'll try to understand that. In most real-life problems, we usually have multiple features to work with. And not all of them are in the format that we, or the model, want. For example, a lot of categorical features come as text. But as we already know, our models require the features to be numerical. For this, we label encode the feature and, if required, we even one-hot encode it. But in some cases, we might have features whose values can be easily determined by the values of other features. In other words, we can see a very go...
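
To make "features whose values can be determined by other features" concrete, a pairwise correlation matrix is one quick way to spot the redundancy. A minimal sketch with made-up data, where one column is essentially a unit conversion of another:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Made-up data: height_cm and height_in carry the same information,
# so they will be almost perfectly correlated (multicollinearity).
height_cm = rng.normal(170, 10, size=200)
df = pd.DataFrame({
    "height_cm": height_cm,
    "height_in": height_cm / 2.54 + rng.normal(0, 0.1, size=200),
    "weight_kg": 0.5 * height_cm + rng.normal(0, 5, size=200),
})

# A pairwise correlation close to 1 (or -1) between two *features*
# is a red flag for multicollinearity.
print(df.corr().round(2))
```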

Read More
Data Science

Overfitting and Underfitting models in Machine Learning

In most of our posts about machine learning, we've talked about overfitting and underfitting. But most of us don't yet know what those two terms mean. What does it actually mean when a model is overfit, or underfit? Why are they considered bad? And how do they affect the accuracy of our model's predictions? These are some basic but important questions we need to ask and get answers to. So let's discuss these two today. The datasets we use for training and testing our models play a huge role in how well our models perform. It's equally important to understand the data we're working with. The quantity and the quality of the data also matter, obviously. When there is too little data in the training phase, the model may fail to understand the patterns in the data, or fa...
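
One simple way to see both problems in practice is to compare training and test scores: a large gap between them suggests overfitting, while poor scores on both suggest underfitting. A minimal sketch with made-up data and decision trees of different depths (my choice of model, just for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A depth-1 tree tends to underfit, a fully grown tree tends to overfit.
for depth in (1, 3, None):
    model = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(f"max_depth={depth}: "
          f"train R^2={model.score(X_train, y_train):.2f}, "
          f"test R^2={model.score(X_test, y_test):.2f}")
```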

Read More
Data Science

Different types of Validations in Machine Learning (Cross Validation)

Now that we know what feature selection is and how to do it, let's move our focus to validating the efficiency of our model. This is known as validation or cross validation, depending on what kind of validation method you're using. But before that, let's try to understand why we need to validate our models. Validation, or Evaluation of Residuals: Once you're done fitting your model to your training data, and you've also tested it with your test data, you can't just assume that it's going to work well on data it has not seen before. In other words, you can't be sure that the model will have the desired accuracy and variance in your production environment. You need some kind of assurance of the accuracy of the predictions that your model is putting out. For this, we need to val...
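
As a concrete picture of what validating a model looks like in code, scikit-learn's `cross_val_score` runs k-fold cross validation for you. A minimal sketch on a built-in dataset (my choice of dataset and model, not necessarily the one the post goes on to use):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold cross validation: train on 4 folds, validate on the held-out fold, repeat.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

print("fold accuracies:", scores.round(3))
print("mean accuracy:  ", scores.mean().round(3))
```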

Read More
Data Science

Different methods of feature selection

In our previous post, we discussed what feature selection is and why we need it. In this post, we're going to look at the different methods used in feature selection. There are three main classes of feature selection methods: Filter Methods, Wrapper Methods, and Embedded Methods. We'll look at each of them individually. Filter Methods: Filter methods are learning-algorithm-agnostic, which means they can be employed no matter which learning algorithm you're using. They're generally used as data pre-processors. In filter methods, each individual feature in the dataset is scored on its correlation with the dependent variable. A variety of statistical tests can be used to calculate this correlation score. Based on this score, it is decided whether to retai...
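
For the filter-method idea of scoring each feature against the dependent variable, scikit-learn's `SelectKBest` is one convenient implementation. A minimal sketch (my choice of scoring function and dataset, just to show the mechanics):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Score every feature against the target with an ANOVA F-test
# and keep only the 2 highest-scoring ones.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print("scores per feature:  ", selector.scores_.round(1))
print("kept feature indices:", selector.get_support(indices=True))
print("reduced shape:       ", X_selected.shape)
```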

Read More
Data Science

What is Feature Selection and why do we need it in Machine Learning?

If you've come across a dataset in your machine learning endeavors that has more than one feature, you've probably also heard of a concept called Feature Selection. Today, we're going to find out what it is and why we need it. When a dataset has too many features, it would not be ideal to include all of them in our machine learning model. Some features may be irrelevant to the dependent variable. For example, if you are going to predict how much it would cost to crush a car, and the features you're given are: the dimensions of the car; whether the car will be delivered to the crusher or the company has to go pick it up; whether the car has any fuel in the tank; and the color of the car, you can kind of assume that the color of the car is not going to influence the cost of crushing it,...

Read More
Data Science

Linear Regression in Python using SciKit Learn

Today we'll be looking at a simple Linear Regression example in Python, and as always, we'll be using the SciKit Learn library. If you haven't yet looked into my posts about data pre-processing, which is required before you can fit a model, check out how you can encode your data to make sure it doesn't contain any text, and then how you can handle missing data in your dataset. After that, you have to make sure all your features are in the same range so that no single feature dominates the output; for this, you need feature scaling. Finally, split your data into training and testing sets. Once you're done with all that, you're ready to build your first and simplest machine learning model, Linear Regression. For this example, we're going to use a diff
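
The excerpt cuts off before the dataset is introduced, but the scikit-learn part generally boils down to a fit/predict pair. A minimal sketch with made-up data (the post uses its own dataset, which isn't shown here):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up data: y is roughly 3x + 4 plus some noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + 4 + rng.normal(0, 1, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)  # learn the coefficients from the training set

print("coefficient:", model.coef_.round(2), "intercept:", round(model.intercept_, 2))
print("test R^2:", round(model.score(X_test, y_test), 3))
```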

Read More
Data Science

Why do we need feature scaling in Machine Learning and how to do it using SciKit Learn?

When you're working with a learning model, it is important to scale the features to a range centered around zero. This is done so that the variances of the features are in the same range. If a feature's variance is orders of magnitude larger than the variance of the other features, that particular feature might dominate the others in the dataset, which is not something we want happening in our model. The aim here is to achieve a Gaussian distribution with zero mean and unit variance. There are many ways of doing this, the two most popular being standardisation and normalisation. No matter which method you choose, the SciKit Learn library provides a class to easily scale our data. We can use the StandardScaler class from the library for this. Now that we know why we need to scale our features, le
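
A minimal sketch of the StandardScaler usage the excerpt refers to, with a made-up feature matrix; the scaler is fitted on the training data only and the same transform is then applied to the test data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up features on very different scales (e.g. age in years, income in dollars).
X_train = np.array([[25, 40_000.0], [32, 65_000.0], [47, 120_000.0], [51, 90_000.0]])
X_test = np.array([[29, 52_000.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data
X_test_scaled = scaler.transform(X_test)        # reuse the same mean/std on test data

print("means after scaling:   ", X_train_scaled.mean(axis=0).round(2))  # ~0
print("std devs after scaling:", X_train_scaled.std(axis=0).round(2))   # ~1
```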

Read More
Data Science

How to split your dataset to train and test datasets using SciKit Learn

When you're working on a model and want to train it, you obviously have a dataset. But after training, we have to test the model on some test dataset. For this, you'll need a dataset different from the training set you used earlier. But it might not always be possible to have that much data during the development phase. In such cases, the obvious solution is to split the dataset you have into two sets, one for training and the other for testing; and you do this before you start training your model. But the question is, how do you split the data? You can't possibly split the dataset into two manually. And you also have to make sure you split the data in a random manner. To help us with this task, the SciKit Learn library provides a tool, called the Model Selection library. There's a cla...
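
The excerpt is cut off, but the usual tool for this kind of random split in scikit-learn's model_selection module is `train_test_split`. A minimal sketch with a made-up dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up dataset: 10 samples, 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Randomly hold out 20% of the rows for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("train rows:", X_train.shape[0], "test rows:", X_test.shape[0])
```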

Read More