The Impact of Generative AI on Data Engineering

Data Science

by Sunny Srinidhi - March 9, 2025

Generative AI is transforming the field of data engineering by automating complex processes such as data augmentation, cleaning, integration, and anomaly detection. Unlike traditional AI, which focuses on analysis and prediction, Generative AI creates new data based on learned patterns. This capability improves data quality, enhances efficiency, and enables scalable solutions. However, challenges like data privacy, model bias, and ethical concerns must be carefully managed. As AI technology advances, its role in data engineering will continue to expand, leading to more intelligent and automated data workflows.

Data Automation with AI/ML: A Comprehensive Guide

by Sunny Srinidhi - November 28, 2024

The article discusses the transformative impact of artificial intelligence (AI) and machine learning (ML) on data automation, enhancing efficiency, decision-making, and scalability in businesses. It explores trends like generative AI, AutoML, data governance, and democratization while providing real-world applications across various industries, ultimately guiding businesses in effective AI/ML integration.

Lemmatization in Natural Language Processing (NLP) and Machine Learning

Data Science

by Sunny Srinidhi - February 26, 2020

Lemmatization is one of the most common text pre-processing techniques used in Natural Language Processing (NLP) and machine learning in general. If you've already read my post about stemming of words in NLP, you'll know that lemmatization is not that much different. In both stemming and lemmatization, we try to reduce a given word to its root word. The root word is called a stem in the stemming process, and a lemma in the lemmatization process. But there are a few more differences between the two than that. Let's see what those are.

How is Lemmatization different from Stemming

In stemming, a part of the word is just chopped off at the tail end to arrive at

Removing stop words in Java as part of data cleaning in Artificial Intelligence

Data Science

by Sunny Srinidhi - February 5, 2020

More in The fastText Series. Working with text datasets is very common in data science problems. A good example of this is sentiment analysis, where you get social network posts as data sets. Based on the content of these posts, you need to estimate the sentiment around a topic of interest. When we're working with text as the data, there are a lot of steps we can apply to "clean" it, such as normalising, removing stop words, stemming, lemmatizing, etc. In this post, we'll see how we can remove stop words from our input text to clean our data so that our analysis is based only on the actual content of the data. But wait, what are stop
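The stop word removal described above can be sketched in a few lines. The post itself works in Java; the sketch below uses Python purely for illustration, and both the tiny `STOP_WORDS` set and the `remove_stop_words` helper are hypothetical names made up here, not taken from the post or from any library.

```python
# A tiny, hand-picked stop word list (an assumption for this sketch;
# real projects usually load a much longer list from a file or library).
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lower-case the text, split it on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in stop_words]


print(remove_stop_words("The sentiment of the post is positive"))
# ['sentiment', 'post', 'positive']
```

Splitting on whitespace keeps the sketch short, but it leaves punctuation attached to words, so a real pipeline would normally tokenise first.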

Optimising a fastText model for better accuracy

Data Science

by Sunny Srinidhi - December 3, 2019 (updated December 19, 2019)

More in The fastText Series. In our previous post, we saw what n-grams are and how they are useful. Before that post, we built a simple text classifier using Facebook’s fastText library. In this post, we’ll see how we can optimise that model for better accuracy.

Precision and Recall

Precision and recall are two things we need to know to better understand the accuracy of our models. And these two things are not very difficult to understand. Precision is the fraction of the labels predicted by the fastText model that are correct, and recall is the fraction of the correct labels that were successfully predicted. That might be a bit confusing, so let’s look at an example to understand it better. Suppose for a sentence
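As a rough illustration of precision and recall for a single sentence, here is a minimal Python sketch. The `precision_recall` helper and the label names are hypothetical, chosen only for this example; this is not the fastText API, just the arithmetic behind the two numbers.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for one example's label sets.

    precision = correct predictions / all predicted labels
    recall    = correct predictions / all actual (true) labels
    """
    predicted, actual = set(predicted), set(actual)
    correct = predicted & actual
    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical labels: the model predicts two, the sentence truly has three.
predicted_labels = ["__label__food", "__label__travel"]
actual_labels = ["__label__food", "__label__drink", "__label__health"]

precision, recall = precision_recall(predicted_labels, actual_labels)
print(precision)  # 0.5: one of the two predicted labels is correct
print(recall)     # about 0.33: one of the three true labels was found
```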

Understanding Word N-grams and N-gram Probability in Natural Language Processing

Data Science

by Sunny Srinidhi - November 26, 2019 (updated December 19, 2019)

More in The fastText Series. N-gram is probably the easiest concept to understand in the whole machine learning space, I guess. An N-gram means a sequence of N words. So for example, “Medium blog” is a 2-gram (a bigram), “A Medium blog post” is a 4-gram, and “Write on Medium” is a 3-gram (trigram). Well, that wasn’t very interesting or exciting. True, but we still have to look at the probability used with n-grams, which is quite interesting. Why N-gram though? Before we move on to the probability stuff, let’s answer this question first. Why is it that we need to learn n-gram and the related probability? Well, in Natural Language Processing, or NLP for short, n-grams are used for a variety of things.
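To make the definition concrete, here is a small Python sketch that lists the word n-grams of a sentence. The `word_ngrams` function is a hypothetical helper written for this example, and it assumes words are separated by whitespace.

```python
def word_ngrams(text, n):
    """Return every run of n consecutive words in text, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    words = text.split()
    # A window of n words can start at index 0 .. len(words) - n.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


print(word_ngrams("A Medium blog post", 2))
# ['A Medium', 'Medium blog', 'blog post']
print(word_ngrams("Write on Medium", 3))
# ['Write on Medium']
```

A sentence with fewer than n words simply yields an empty list.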

An intro to text classification with Facebook’s fastText (Natural Language Processing)

Data Science

by Sunny Srinidhi - November 25, 2019 (updated December 19, 2019)

More in The fastText Series. Text classification is a pretty common application of machine learning. In such an application, machine learning is used to categorise a piece of text into two or more categories. There are both supervised and unsupervised learning models for text classification. In this post, we’ll see how we can use Facebook’s fastText library for some simple text classification. fastText, developed by Facebook, is a popular library for text classification. The library is an open source project on GitHub, and is pretty active. The library also provides pre-built models for text classification, both supervised and unsupervised. In this post, we’ll check out how we can train the supervised model in the library for some quick text classification. The library

Data Science vs. Artificial Intelligence vs. Machine Learning vs. Deep Learning

Data Science

by Sunny Srinidhi - November 18, 2019 (updated December 19, 2019)

It’s very common these days to come across these terms - data science, artificial intelligence, machine learning, deep learning, neural networks, and much more. But what do these buzzwords actually mean? And why should you care about one or the other? I’m trying to answer these questions in this post, to the best of my capacity. But then again, I’m no expert here. This is the knowledge I’ve gained in the last few years of my data science and machine learning journey. I’m sure most of you will have better and easier ways of explaining things than I do, so I’ll be looking forward to reading your comments down below. Let’s get started then.

Data Science

Data science is all about data,

Top Five Machine Learning courses for beginners on Udemy

Data Science

by Sunny Srinidhi - November 18, 2019 (updated December 19, 2019)

Everybody wants to do machine learning these days. Machine learning, data science, artificial intelligence, deep learning, neural network — these have become some of the most used phrases in the tech space today. I’m not saying it’s particularly bad, but it definitely gets scary for somebody who doesn’t really know what all this means but wants to get into the rat race. When you think about it, from a software developer’s point of view, these are just different types of software or applications you work on, but with more math involved. I know I’m oversimplifying what data science is, but for somebody who doesn’t have a mathematics or statistics background, it is very difficult to understand the jargon initially. I’ve been there,