Simple Apache Kafka Producer and Consumer using Spring Boot
Tech | by Sunny Srinidhi | November 23, 2018 (updated March 2, 2020) | 2 comments
Originally published here: https://medium.com/@contactsunny/simple-apache-kafka-producer-and-consumer-using-spring-boot-41be672f4e2b Before I even start talking about Apache Kafka, let me answer the question you probably have after reading the title: aren't there enough posts and guides about this topic already? Yes, there are plenty of reference documents and how-to posts about creating Kafka producers and consumers in a Spring Boot application. Then why am I writing another one? Well, in the future, I'll be covering some more advanced topics in the data science space, and Apache Kafka is one of the most used tools in that space, so it is important to know how to work with it in a real-world application. So this is an introductory post to the technology, which we’ll be

Encrypting and Decrypting data in MongoDB with a SpringBoot project
Tech | by Sunny Srinidhi | January 8, 2020 | 5 comments
In quite a few applications, we'll have a requirement to keep the data in our databases encrypted so that even if somebody gets into the database, they might not understand what the data is. Encrypting is crucial in many applications. With the rise of NoSQL databases these days, we'll take a look at how we can encrypt data going into a MongoDB database from our Spring Boot application. We'll also see how we can decrypt that data after getting it from the database into our application. One thing you need to know before trying this on any production-grade application is that this will slow things down. There are two extra steps involved in this process: encrypting and decrypting the data.

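To give a rough idea of the two extra steps mentioned in the excerpt above, here is a minimal sketch that encrypts a string before it would be stored and decrypts it after it is read back. It is not the code from the post: the class name `FieldCrypto`, its method names, and the choice of AES in GCM mode with a random 12-byte IV are all assumptions made for illustration, and it uses only the JDK's `javax.crypto` classes rather than any MongoDB or Spring API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper, not taken from the post: encrypts and decrypts one field value.
final class FieldCrypto {

    private static final int IV_BYTES = 12;   // assumed IV size for GCM
    private static final int TAG_BITS = 128;  // assumed authentication tag size
    private static final SecureRandom RANDOM = new SecureRandom();

    // Generates a fresh 128-bit AES key; a real application would load a managed key instead.
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts the value and returns Base64(IV + ciphertext), ready to store as a string.
    static String encrypt(String plainText, SecretKey key) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] cipherText = cipher.doFinal(plainText.getBytes(StandardCharsets.UTF_8));
            byte[] out = ByteBuffer.allocate(iv.length + cipherText.length)
                    .put(iv)
                    .put(cipherText)
                    .array();
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reverses encrypt(): splits off the IV, then decrypts and verifies the rest.
    static String decrypt(String encoded, SecretKey key) {
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        String stored = encrypt("alice@example.com", key);
        System.out.println("Stored value: " + stored);
        System.out.println("Read back:    " + decrypt(stored, key));
    }
}
```

In a Spring Boot project, calls like these would presumably sit just before a save and just after a read, which is where the slowdown the excerpt warns about comes from.
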
How To Generate Parquet Files in Java
Data Science | by Sunny Srinidhi | April 7, 2020 | 14 comments
The Parquet file format has become very popular lately. In this post, we’ll see what it is, and how to create Parquet files in Java using Spring Boot.

Sorting in MongoDB in Java using BasicDBObject
Tech | by Sunny Srinidhi | January 24, 2020 | 0 comments
In this post, we’ll see how we can write a sort query for MongoDB in Java using the BasicDBObject class. I’ll use Spring Boot for this.

I made a website which tells if you’re wearing a mask or not – without machine learning
Tech | by Sunny Srinidhi | January 11, 2021 | 1 comment
In this post, I talk about how I built a website which can detect masks, gloves, and more, all without writing any machine learning code.

Removing stop words in Java as part of data cleaning in Artificial Intelligence
Data Science | by Sunny Srinidhi | February 5, 2020 | 0 comments
More in The fastText Series. Working with text datasets is very common in data science problems. A good example of this is sentiment analysis, where you get social network posts as data sets. Based on the content of these posts, you need to estimate the sentiment around a topic of interest. When we're working with text as the data, there are several steps we apply to "clean" it, such as normalising, removing stop words, stemming, and lemmatizing. In this post, we'll see how we can remove stop words from our input text to clean our data so that our analysis is based only on the actual content of the data. But wait, what are stop

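As a quick illustration of the stop word removal step described in the excerpt above, here is a minimal sketch in plain Java. The class name `StopWordRemover` and the tiny stop word list are hypothetical and chosen only for this example; the post itself may use a different list or approach. The sketch lowercases the text, splits it on whitespace, and drops every token found in the list.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not taken from the post.
final class StopWordRemover {

    // A tiny, made-up stop word list for illustration; real lists are much longer.
    private static final Set<String> STOP_WORDS = Set.of(
            "a", "an", "the", "is", "are", "was", "of", "in", "on", "and", "to");

    // Lowercases the text, splits on whitespace, and removes stop words.
    static String removeStopWords(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\s+"))
                .filter(token -> !token.isEmpty())
                .filter(token -> !STOP_WORDS.contains(token))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // prints "weather bangalore pleasant"
        System.out.println(removeStopWords("The weather in Bangalore is pleasant"));
    }
}
```

Punctuation is left untouched here, so a token like "the," would not match the list; normalising the text first, as the excerpt mentions, would take care of that.
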
Using Google’s libphonenumber Library to Parse and Validate Phone Numbers
Tech | by Sunny Srinidhi | January 9, 2020 | 0 comments
We work with phone numbers in almost any project or product which has human users. And when the product is available to a global user base, it becomes very difficult to maintain valid phone numbers in the database. We need to make sure the phone numbers are of the proper length for their regions, add or remove country codes, and run a lot of similar validations. This could become a project of its own pretty soon. We had such an issue in one of our projects. When I was researching an easy-to-use, lightweight tool to which I could outsource the smarts involved, I came across the

Connect Apache Spark with MongoDB database using the mongo-spark-connector
Data Science, Tech | by Sunny Srinidhi | April 3, 2019 (updated February 28, 2020) | 0 comments
A couple of days back, we saw how we can connect Apache Spark to an Apache HBase database and query the data from a table using a catalog. Today, we’ll see how we can connect Apache Spark to a MongoDB database and get data directly into Spark from there. MongoDB provides us a plugin called the mongo-spark-connector, which will help us connect MongoDB and Spark without any drama at all. We just need to provide the MongoDB connection URI in the SparkConf object, and create a ReadConfig object specifying the collection name. It might sound complicated right now, but once you look at the code, you’ll understand how extremely easy this is. So, let’s look at an example.
The Dataset
Before we look

Connect Apache Spark to your HBase database (Spark-HBase Connector)
Data Science, Tech | by Sunny Srinidhi | April 1, 2019 (updated January 31, 2020) | 2 comments
There will be times when you’ll need the data in your HBase database to be brought into Apache Spark for processing. Usually, you’ll query the database, get the data in whatever format you fancy, and then load that into Spark, maybe using the `parallelize()` function. This works just fine. But depending on the size of the data, this could cause delays. At least it did for our application. So after some research, we stumbled upon a Spark-HBase connector in the Hortonworks repository. Now, what is this connector and why should you consider it?
The Spark-HBase Connector (shc-core)
The SHC is a tool provided by Hortonworks to connect your HBase database to Apache Spark so that you can tell your Spark context to pick up the

How to build a simple data lake using Amazon Kinesis Data Firehose and Amazon S3
Data Science | by Sunny Srinidhi | March 3, 2020 | 3 comments
In this post, we’ll see how we can create a very simple, yet highly scalable data lake using Amazon’s Kinesis Data Firehose and Amazon’s S3.