Simple Apache Kafka Producer and Consumer using Spring Boot

Tech

by Sunny Srinidhi - November 23, 2018 (updated March 2, 2020)

Originally published here: https://medium.com/@contactsunny/simple-apache-kafka-producer-and-consumer-using-spring-boot-41be672f4e2b

Before I even start talking about Apache Kafka, let me answer the question you probably have after reading the title: aren’t there enough posts and guides about this topic already? Yes, there are plenty of reference documents and how-to posts about creating Kafka producers and consumers in a Spring Boot application. Then why am I writing another one? Well, in the future, I’ll be talking about some advanced stuff in the data science space, and Apache Kafka is one of the most used tools in that space. That makes it important to know how to work with Apache Kafka in a real-world application. So this is an introductory post to the technology, which we’ll be

Encrypting and Decrypting data in MongoDB with a SpringBoot project

Tech

by Sunny Srinidhi - January 8, 2020

In quite a few applications, we'll have a requirement to keep the data in our databases encrypted so that even if somebody gets into the database, they might not understand what the data is. Encrypting is crucial in many applications. With the rise of NoSQL databases these days, we'll take a look at how we can encrypt data going into a MongoDB database from our Spring Boot application. We'll also see how we can decrypt that data after getting it from the database into our application. One thing you need to know before trying this on any production-grade application is that this will slow things down. There are two extra steps involved in this process - encrypting and decrypting the data.
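The two extra steps mentioned above can be sketched in plain Java. This is only an illustration under assumptions, not the approach taken in the full post: the class name `FieldEncryptor`, the choice of AES-GCM, and the Base64 string format are all hypothetical, and the database call itself is left out. The idea is that the string returned by `encrypt` is what would be written to a document field, and `decrypt` is applied after reading it back.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper: encrypts a single field value before it is stored
// and decrypts it after it is read. Not taken from the original post.
public class FieldEncryptor {

    private static final int IV_BYTES = 12;   // recommended IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    public FieldEncryptor(SecretKey key) {
        this.key = key;
    }

    // Generates a fresh AES key; a real application would load it from a key store.
    public static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES is not available", e);
        }
    }

    // Returns Base64(iv || ciphertext+tag), suitable for storing as a string field.
    public String encrypt(String plainText) {
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv); // a new random IV for every value
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] cipherText = cipher.doFinal(plainText.getBytes(StandardCharsets.UTF_8));
            byte[] combined = ByteBuffer.allocate(iv.length + cipherText.length)
                    .put(iv)
                    .put(cipherText)
                    .array();
            return Base64.getEncoder().encodeToString(combined);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    // Reverses encrypt(): splits off the IV, then decrypts and verifies the rest.
    public String decrypt(String stored) {
        try {
            byte[] combined = Base64.getDecoder().decode(stored);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, combined, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(combined, IV_BYTES, combined.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }

    public static void main(String[] args) {
        FieldEncryptor encryptor = new FieldEncryptor(newKey());
        String stored = encryptor.encrypt("jane@example.com");
        System.out.println("Stored value: " + stored);
        System.out.println("Read back:    " + encryptor.decrypt(stored));
    }
}
```

Because each value gets its own random IV, encrypting the same input twice produces different stored strings, so an encrypted field like this could not be used for equality queries. That is one more cost to weigh alongside the slowdown.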

How To Generate Parquet Files in Java

Data Science

by Sunny Srinidhi - April 7, 2020

The Parquet file format has become very popular lately. In this post, we’ll see what it is, and how to create Parquet files in Java using Spring Boot.

Sorting in MongoDB in Java using BasicDBObject

Tech

by Sunny Srinidhi - January 24, 2020

In this post, we’ll see how we can write a sort query for MongoDB in Java using the BasicDBObject class. I’ll use Spring Boot for this.

I made a website which tells if you’re wearing a mask or not – without machine learning

Tech

by Sunny Srinidhi - January 11, 2021

In this post, I talk about how I built a website which can detect masks, gloves, and more, all without writing any machine learning code.

Removing stop words in Java as part of data cleaning in Artificial Intelligence

Data Science

by Sunny Srinidhi - February 5, 2020

More in The fastText Series. Working with text datasets is very common in data science problems. A good example of this is sentiment analysis, where you get social network posts as data sets. Based on the content of these posts, you need to estimate the sentiment around a topic of interest. When we're working with text as the data, there are several steps we apply to "clean" it, such as normalising, removing stop words, stemming, and lemmatizing. In this post, we'll see how we can remove stop words from our input text to clean our data so that our analysis is based only on the actual content of the data. But wait, what are stop
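As a rough sketch of the stop word removal described in this excerpt: the class name `StopWordFilter` and the tiny word list below are hypothetical placeholders and are not taken from the full post, which may use a different list or tokenizer. The sketch lowercases the text, splits it on whitespace, and drops every token that appears in the list.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not the code from the original post.
public class StopWordFilter {

    // Deliberately tiny, made-up stop word list; real projects load a much larger one.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "the", "is", "are", "of", "to", "and", "in", "on");

    // Lowercases the text, splits it on whitespace, and drops tokens found in STOP_WORDS.
    public static String removeStopWords(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).trim().split("\\s+"))
                .filter(token -> !token.isEmpty())
                .filter(token -> !STOP_WORDS.contains(token))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // prints "sentiment post positive"
        System.out.println(removeStopWords("The sentiment of the post is positive"));
    }
}
```

Splitting on whitespace keeps punctuation attached to words, so "is," would not match "is". Stripping punctuation would normally happen in an earlier normalising step.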

Using Google’s libphonenumber Library to Parse and Validate Phone Numbers

Tech

by Sunny Srinidhi - January 9, 2020

We all work with phone numbers in almost any project or product which has human users. And when the product is available to a global user base, it becomes very difficult to maintain valid phone numbers in the database. We need to make sure the phone numbers for different regions are of the proper length for their regions, add or remove country codes, and perform many other such validations. This could become a project of its own pretty soon. We had such an issue in one of our projects. When I was researching an easy-to-use, lightweight tool to which I could outsource the smarts involved in this, I came across the

Connect Apache Spark with MongoDB database using the mongo-spark-connector

by Sunny Srinidhi - April 3, 2019 (updated February 28, 2020)

A couple of days back, we saw how we can connect Apache Spark to an Apache HBase database and query the data from a table using a catalog. Today, we’ll see how we can connect Apache Spark to a MongoDB database and get data directly into Spark from there. MongoDB provides us a plugin called the mongo-spark-connector, which will help us connect MongoDB and Spark without any drama at all. We just need to provide the MongoDB connection URI in the SparkConf object, and create a ReadConfig object specifying the collection name. It might sound complicated right now, but once you look at the code, you’ll understand how extremely easy this is. So, let’s look at an example. The Dataset Before we look

Connect Apache Spark to your HBase database (Spark-HBase Connector)

by Sunny Srinidhi - April 1, 2019 (updated January 31, 2020)

There will be times when you’ll need the data in your HBase database to be brought into Apache Spark for processing. Usually, you’ll query the database, get the data in whatever format you fancy, and then load that into Spark, maybe using the `parallelize()` function. This works just fine. But depending on the size of the data, this could cause delays. At least it did for our application. So after some research, we stumbled upon a Spark-HBase connector in the Hortonworks repository. Now, what is this connector and why should you be considering this? The Spark-HBase Connector (shc-core) The SHC is a tool provided by Hortonworks to connect your HBase database to Apache Spark so that you can tell your Spark context to pick up the

How to build a simple data lake using Amazon Kinesis Data Firehose and Amazon S3

Data Science

by Sunny Srinidhi - March 3, 2020

In this post, we’ll see how we can create a very simple, yet highly scalable data lake using Amazon’s Kinesis Data Firehose and Amazon’s S3.