out() vs. outE() – JanusGraph and Gremlin
Data Science by Sunny Srinidhi - March 3, 2021
If you are new to JanusGraph and the Gremlin query language, like I am, you may be confused by the out(), outE(), in(), and inE() methods. The examples you find for these functions do not make the difference between them easy to see.
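As a rough illustration of the distinction: in Gremlin, out() steps to the vertices at the far end of a vertex's outgoing edges, while outE() steps to the outgoing edges themselves (in() and inE() do the same for incoming edges). The sketch below is a toy model in plain Python, not the Gremlin or JanusGraph API; the Edge class, the sample data, and the function names out, out_e, in_, and in_e are all made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    label: str
    source: str  # vertex the edge leaves
    target: str  # vertex the edge points to


# Hypothetical sample graph, for illustration only.
EDGES = [
    Edge("knows", "alice", "bob"),
    Edge("created", "alice", "project"),
    Edge("knows", "bob", "carol"),
]


def out_e(vertex, edges=EDGES):
    """Return the outgoing edges of a vertex (the idea behind outE())."""
    return [e for e in edges if e.source == vertex]


def out(vertex, edges=EDGES):
    """Return the vertices reached over outgoing edges (the idea behind out())."""
    return [e.target for e in out_e(vertex, edges)]


def in_e(vertex, edges=EDGES):
    """Return the incoming edges of a vertex (the idea behind inE())."""
    return [e for e in edges if e.target == vertex]


def in_(vertex, edges=EDGES):
    """Return the vertices that point at this vertex (the idea behind in())."""
    return [e.source for e in in_e(vertex, edges)]


if __name__ == "__main__":
    print(out("alice"))                       # adjacent vertices
    print([e.label for e in out_e("alice")])  # labels of the edges themselves
    print(in_("bob"))                         # vertices with an edge into bob
```

The edge-returning functions keep the label and both endpoints, which is why you would reach for outE() or inE() when a query needs to filter on or read edge properties, and out() or in() when only the neighbouring vertices matter.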
Getting Started With JanusGraph
Data Science by Sunny Srinidhi - February 25, 2021
JanusGraph is a distributed graph database that can store and query graphs spread across clusters with multiple nodes. It is designed for massive clusters and for real-time traversals and analytics queries. In this post, we’ll look at a few queries that you would want to run the very first time you install JanusGraph and start playing with the Gremlin console.
Kinesis Data Streams vs. Kinesis Firehose Delivery Streams
Data Science by Sunny Srinidhi - May 25, 2020
I have talked about Kinesis before, and I'm sure you've been using Kinesis for longer than I have. But from what I've seen, not all teams or companies use all parts of Kinesis. And there are four parts in Kinesis:
Ingest and process streaming data with Kinesis streams - Kinesis Data Streams
Deliver streaming data with Kinesis Firehose delivery streams - Kinesis Firehose Delivery Streams
Analyse streaming data with Kinesis analytics applications - Kinesis Analytics
Ingest and process media streams with Kinesis video streams - Kinesis Video Streams
All four parts offer something different. Well, the last two are definitely different from the first two. But it's the first two that I see a lot of people confusing with each other. So I thought I'll
How To Generate Parquet Files in Java
Data Science by Sunny Srinidhi - April 7, 2020
Parquet is an open source file format from Apache, originally built for the Hadoop ecosystem. It has since become very popular, and even cloud service providers such as AWS now support it.
How to build a simple data lake using Amazon Kinesis Data Firehose and Amazon S3
Data Science by Sunny Srinidhi - March 3, 2020
As the data generated by IoT devices, mobile devices, applications, and so on keeps growing by the hour, creating a data lake to store all that data is becoming crucial for almost any application at scale. There are many tools and services that you could use to create a data lake.
Lemmatization in Natural Language Processing (NLP) and Machine Learning
Data Science by Sunny Srinidhi - February 26, 2020
Lemmatization is one of the most common text pre-processing techniques used in Natural Language Processing (NLP) and machine learning in general. If you've already read my post about stemming of words in NLP, you'll know that lemmatization is not that much different. In both stemming and lemmatization, we try to reduce a given word to its root word. The root word is called a stem in the stemming process, and a lemma in the lemmatization process. But there are more differences between the two than that. Let's see what those are.
How is Lemmatization different from Stemming
In stemming, a part of the word is just chopped off at the tail end to arrive at
Stemming of words in Natural Language Processing, what is it?
Data Science by Sunny Srinidhi - February 19, 2020
Stemming is one of the most common data pre-processing operations we do in almost all Natural Language Processing (NLP) projects. If you're new to this space, it is possible that you don't know exactly what it is, even though you have come across the word. You might also confuse stemming with lemmatization, which is a similar operation. In this post, we'll see what exactly stemming is, with a few examples here and there. I hope I'll be able to explain this process in simple words for you.
Stemming
Simply put, stemming is the process of removing a part of a word, or reducing a word to its stem or root. This might not necessarily mean we're reducing a word
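To make "removing a part of a word" concrete, here is a minimal sketch of a suffix-chopping stemmer in Python. It is a deliberately naive illustration, not the algorithm from the post and not a real stemmer such as Porter's; the suffix list, the minimum stem length, and the name naive_stem are assumptions chosen for this example.

```python
# Hypothetical suffix list for illustration; real stemmers use far richer rules.
SUFFIXES = ("ing", "ed", "es", "s")

# Assumed minimum number of characters that must remain after chopping.
MIN_STEM_LENGTH = 3


def naive_stem(word):
    """Chop the first matching suffix off the end of a word, if enough remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for word in ["jumping", "jumped", "jumps", "studies", "is"]:
        print(word, "->", naive_stem(word))
```

With this sketch, "jumping", "jumped", and "jumps" all collapse to "jump", while "studies" becomes "studi", which shows that a stem produced by chopping is not always a word you would find in a dictionary.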
Removing stop words in Java as part of data cleaning in Artificial Intelligence
Data Science by Sunny Srinidhi - February 5, 2020
More in The fastText Series. Working with text datasets is very common in data science problems. A good example of this is sentiment analysis, where you get social network posts as datasets. Based on the content of these posts, you need to estimate the sentiment around a topic of interest. When we're working with text as the data, there are a lot of operations we run on the data to "clean" it, such as normalising, removing stop words, stemming, lemmatizing, etc. In this post, we'll see how we can remove stop words from our input text to clean our data so that our analysis is based only on the actual content of the data. But wait, what are stop
Descriptive and Inferential statistics – the two types of statistics
Data Science by Sunny Srinidhi - January 30, 2020
If you’re new to the world of data science, you’ll know that a lack of knowledge of statistics can sometimes be very frustrating and hinder progress. It is important to know at least the basics of statistics. In this post, we’re going back to the basics.
An Intro to Affective Computing
Data Science by Sunny Srinidhi - January 7, 2020
Not a lot of us have heard of Affective Computing. Most people I have spoken to about it didn't know anything about it. So I thought I'd write an intro explaining what I have understood about the discipline, and hopefully I'll get to learn more from the comments. So let's get started. Affective computing is all about understanding human emotions in a human-machine interface system and responding based on those emotions. Consider this: you get into an ATM vestibule to withdraw some cash, but you're tense about being late for your date, who is already waiting for you at the restaurant. If anybody sees you in this condition at the ATM vestibule, they'll be able to easily understand that