Real-Time Data Processing: Understanding the What, Why, Where, Who, and How
Data Science, Tech
by Sunny Srinidhi - October 22, 2024
In today’s data-driven world, businesses and organizations are continuously generating massive amounts of data. While processing data in batch mode remains useful, the need for instant decision-making has led to an increasing focus on real-time data processing. This article delves into what real-time data processing is, why it's essential, its various applications, the tools used to achieve it, trends shaping its evolution, and real-world use cases.
What is Real-Time Data Processing?
Real-time data processing refers to the capability to continuously ingest, process, and output data as soon as it is generated, with minimal latency. Unlike batch processing, which collects and processes data in large groups at set intervals (e.g., daily or hourly), real-time processing works with data immediately as it becomes available,

Understanding Data Governance: A Comprehensive Guide
Data Science, Tech
by Sunny Srinidhi - October 18, 2024
Data governance is a set of practices, policies, and standards that ensure data is managed as an asset in a consistent and reliable manner across an organization. It involves defining who owns the data, who has the right to make decisions about it, and how it can be used. This comprehensive guide aims to shed light on what data governance entails, its importance, how it can be achieved, best practices, and who should be involved in the process.
What is Data Governance?
Data governance refers to the collection of policies, roles, responsibilities, and procedures that oversee the management of data assets within an organization. It ensures that data is accurate, consistent, accessible, and protected from misuse. The main goal of data governance

The Trend of Cloud Repatriation: Moving Back to On-Premises Infrastructure
Data Science, Tech
by Sunny Srinidhi - October 16, 2024
In recent years, a shift in IT infrastructure strategies has seen many companies moving workloads away from public cloud services and back to on-premises setups or private cloud environments. This movement, known as "cloud repatriation," is driven by various factors that range from cost management to performance, security, and compliance concerns. While public cloud adoption surged over the past decade, the limitations of this model have led organizations to reconsider their approach, resulting in a hybrid IT strategy combining both on-premises and cloud resources.
Why Companies Are Moving Back to On-Premises
1. Cost Considerations
One of the most prominent factors driving cloud repatriation is the realization of the high costs associated with public cloud services. While the cloud offers scalability and flexibility, many companies

Exploring the Inner Workings of Google BigQuery: A Deep Dive into Design, Competitors, Use Cases, and Pros/Cons
Data Science
by Sunny Srinidhi - March 13, 2024
Discover the inner workings of Google BigQuery, a game-changer in big data analytics. Unravel its architecture, including the prowess of its distributed query engine, Dremel, and the innovative Capacitor technology. Compare it with competitors, explore diverse use cases from real-time analytics to healthcare, and weigh its pros and cons. Join us on a journey into the heart of data analytics excellence.

Streamline Data Transfer with AWS DataSync: A Comprehensive Guide
Data Science
by Sunny Srinidhi - March 9, 2024
Discover the power of AWS DataSync for seamless, secure, and accelerated data transfers. Learn how to optimise workflows with ease!

Understanding the Battle of Database Storage: Row-Oriented vs. Columnar
Data Science
by Sunny Srinidhi - March 8, 2024
In the realm of database storage, row-wise and columnar approaches stand as stalwarts with distinct advantages. Row-wise storage excels in transactional operations, ensuring data integrity with simplicity. Conversely, columnar storage revolutionizes analytical querying, leveraging vertical organization for rapid attribute retrieval. Understanding their nuances is pivotal in crafting efficient, tailored database solutions for diverse data-driven needs.

Optimising Hive Queries with Tez Query Engine
Data Science
by Sunny Srinidhi - June 13, 2022
Hive and Tez configuration can be fine-tuned to improve the performance of queries. Let’s look at a few such techniques.

Understanding Apache Hive LLAP
Data Science
by Sunny Srinidhi - November 18, 2021
In this post, I try to explain what LLAP is for Apache Hive and how it can help us reduce query latency.

Installing Hadoop on the new M1 Pro and M1 Max MacBook Pro
Data Science
by Sunny Srinidhi - November 5, 2021
We’ll see how to install and configure Hadoop and its components on macOS running on the new M1 Pro and M1 Max chips by Apple.

Installing Zsh and Oh-my-zsh on Windows 11 with WSL2
Tech
by Sunny Srinidhi - October 27, 2021
In this post, which is part of a series on setting up Windows 11 and WSL2 for big data work, I install Zsh and Oh-my-zsh and set up aliases