Optimising Hive Queries with Tez Query Engine
Data Science, by Sunny Srinidhi - June 13, 2022
Hive and Tez configuration can be fine-tuned to improve the performance of queries. Let’s look at a few such techniques.

Understanding Apache Hive LLAP
Data Science, by Sunny Srinidhi - November 18, 2021
In this post, I try to explain what LLAP is for Apache Hive and how it can help us in reducing query latency.

Installing Hadoop on the new M1 Pro and M1 Max MacBook Pro
Data Science, by Sunny Srinidhi - November 5, 2021
We’ll see how to install and configure Hadoop and its components on macOS running on the new M1 Pro and M1 Max chips by Apple.

Installing Hadoop on Windows 11 with WSL2
Data Science, by Sunny Srinidhi - November 1, 2021
We’ll see how to install and configure Hadoop and its components on Windows 11 running a Linux distro using WSL 1 or 2.

Installing Zsh and Oh-my-zsh on Windows 11 with WSL2
Tech, by Sunny Srinidhi - October 27, 2021
In this post, which is part of a series on setting up Windows 11 and WSL2 for big data work, I install Zsh and Oh-my-zsh and set up aliases

Fake (almost) everything with Faker
Data Science, by Sunny Srinidhi - September 30, 2021
Generating customer and address data for testing has never been easier. We’ll see how to do that using the Faker Python library.

Emulating Apache Kafka with Amazon SNS and SQS
Tech, by Sunny Srinidhi - January 22, 2020 / January 24, 2020
We’ll learn how to introduce the concept of consumer groups from Kafka in the AWS world using Amazon SNS and Amazon SQS.

Stack Implementation example in Java
Tech, by Sunny Srinidhi - December 20, 2019 / December 23, 2019
More in The Data Structures series. A stack is one of the simplest data structures to understand. If you studied data structures during your academic years, you already know what it means. It’s a simple Last In First Out (LIFO) structure. What that means is that the last element to enter the stack will be the first element to go out of the stack. Let’s try to understand the concept first with a few illustrations.
The concept
Suppose we have an empty container which looks like the container shown in the image below:
Figure: Empty stack
That’s pretty simple to understand. Now suppose again that we “push” a string with value “string1” to this empty stack. The stack now looks like this:
Figure: Stack with one element
That’s pretty simple to

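To make the LIFO idea from the excerpt above concrete, here is a minimal sketch in Java. It is not the implementation from the full post: the class name SimpleStack, the ArrayList backing it, and the second value “string2” are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical class name; a minimal LIFO stack backed by an ArrayList.
class SimpleStack<T> {
    private final List<T> items = new ArrayList<>();

    // "Push" places a new element on top of the stack.
    void push(T item) {
        items.add(item);
    }

    // "Pop" removes and returns the element that was pushed last.
    T pop() {
        if (items.isEmpty()) {
            throw new NoSuchElementException("stack is empty");
        }
        return items.remove(items.size() - 1);
    }

    // Returns true when nothing has been pushed, or everything has been popped.
    boolean isEmpty() {
        return items.isEmpty();
    }

    public static void main(String[] args) {
        SimpleStack<String> stack = new SimpleStack<>();
        stack.push("string1"); // the value used in the excerpt
        stack.push("string2"); // illustrative second value
        System.out.println(stack.pop());     // prints "string2" (last in, first out)
        System.out.println(stack.pop());     // prints "string1"
        System.out.println(stack.isEmpty()); // prints "true"
    }
}
```

For real projects, java.util.ArrayDeque already offers push and pop with the same LIFO behaviour, so a hand-written class like this is mostly useful for learning.
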
Understanding Word N-grams and N-gram Probability in Natural Language Processing
Data Science, by Sunny Srinidhi - November 26, 2019 / December 19, 2019
More in The fastText Series. The N-gram is probably the easiest concept to understand in the whole machine learning space. An N-gram is a sequence of N words. So for example, “Medium blog” is a 2-gram (a bigram), “A Medium blog post” is a 4-gram, and “Write on Medium” is a 3-gram (trigram). Well, that wasn’t very interesting or exciting. True, but we still have to look at the probability used with n-grams, which is quite interesting.
Why N-gram though?
Before we move on to the probability stuff, let’s answer this question first. Why do we need to learn about n-grams and the related probability? Well, in Natural Language Processing, or NLP for short, n-grams are used for a variety of things.

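The definition in the excerpt above, a sequence of N words, can be sketched in a few lines of Java. This is only an illustration under simple assumptions: words are split on whitespace, and the class and method names (NGramSketch, wordNGrams) are hypothetical rather than taken from the post or from any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; extracts word n-grams from a piece of text.
class NGramSketch {

    // Returns every run of n consecutive words in the text, in order.
    // Assumes words are separated by whitespace (no punctuation handling).
    static List<String> wordNGrams(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> result = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return result;
        }
        String[] words = trimmed.split("\\s+");
        // Slide a window of n words across the text.
        for (int i = 0; i + n <= words.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Bigrams of the 4-gram used as an example in the excerpt.
        System.out.println(wordNGrams("A Medium blog post", 2));
        // prints "[A Medium, Medium blog, blog post]"
    }
}
```

Counting how often each of these n-grams occurs in a corpus is the usual starting point for the probability estimates the post goes on to discuss.
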
Data Science vs. Artificial Intelligence vs. Machine Learning vs. Deep Learning
Data Science, by Sunny Srinidhi - November 18, 2019 / December 19, 2019
It’s very common these days to come across terms like data science, artificial intelligence, machine learning, deep learning, neural networks, and many more. But what do these buzzwords actually mean? And why should you care about one or the other? I’m trying to answer these questions in this post, to the best of my ability. But then again, I’m no expert here. This is the knowledge I’ve gained in the last few years of my data science and machine learning journey. I’m sure most of you will have better and easier ways of explaining things than I do, so I’ll be looking forward to reading your comments down below. Let’s get started then.
Data Science
Data science is all about data,