Analyse Kafka messages with SQL queries using Apache Drill
Data Science, Tech | by Sunny Srinidhi | September 23, 2019 (updated January 13, 2020)
In the previous post, we figured out how to connect MongoDB with Apache Drill and query data with SQL queries. In this post, let's extend that knowledge and see how we can use similar SQL queries to analyse our Kafka messages. Configuring the Kafka storage plugin in Apache Drill is quite simple, and very similar to how we configured the MongoDB storage plugin. First, start local instances of Apache Drill, Apache ZooKeeper, and Apache Kafka. Then head over to http://localhost:8047/storage, where we can enable the Kafka plugin. You should see it in the list on the right side of the page. Click its Enable button to enable the storage plugin. After this, we need to add a few configuration

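Once the plugin is enabled, each Kafka topic can be queried as a table under the plugin's name. As a minimal sketch (not code from the post itself), the Java class below sends such a query to Drill's REST endpoint `/query.json` on the default web port 8047, using only the JDK 11+ `java.net.http` client. It assumes the plugin is named `kafka` and uses `test-topic` as a hypothetical topic name; the hand-rolled JSON escaping is deliberately minimal.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sends a SQL query to Drill's REST API using only the JDK 11+ HttpClient.
class DrillKafkaQuery {
    // Assumes Drill's web server is running locally on its default port, 8047.
    static final String DRILL_QUERY_ENDPOINT = "http://localhost:8047/query.json";

    // Minimal JSON string escaping for the SQL text; not a general-purpose JSON encoder.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Request body in the shape the /query.json endpoint accepts.
    static String buildBody(String sql) {
        return "{\"queryType\":\"SQL\",\"query\":\"" + jsonEscape(sql) + "\"}";
    }

    static HttpRequest buildRequest(String sql) {
        return HttpRequest.newBuilder(URI.create(DRILL_QUERY_ENDPOINT))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildBody(sql)))
                .build();
    }

    public static void main(String[] args) {
        // "test-topic" is a hypothetical topic; with the plugin named "kafka", it is addressed as kafka.`test-topic`.
        String sql = "SELECT * FROM kafka.`test-topic` LIMIT 10";
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(buildRequest(sql), HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        } catch (IOException e) {
            System.err.println("Could not reach Drill at " + DRILL_QUERY_ENDPOINT + ": " + e.getMessage());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Against a running Drill instance, this prints the status code and the raw JSON result; if Drill isn't reachable, it reports the connection failure instead of crashing.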
Getting Started with Apache Drill and MongoDB
Data Science, Tech | by Sunny Srinidhi | September 23, 2019 (updated February 28, 2020)
Not a lot of people have heard of Apache Drill. That is because Drill caters to very specific use cases; it's very niche. But when it fits, it can significantly change the way you interact with data. First, let's see what Apache Drill is, and then how we can connect our MongoDB data source to Drill and easily query data.
What is Apache Drill?
According to its website, Apache Drill is a "Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage." That's pretty much self-explanatory. So, Drill is a tool to query Hadoop, MongoDB, and other NoSQL databases. You can write simple SQL queries that run on the data stored in other databases, and you get the result in a row-column format. The

Integrate AWS DynamoDB with Spring Boot
Tech | by Sunny Srinidhi | June 26, 2019 (updated March 12, 2020)
Here is another POC to add to the growing list of POCs on my GitHub profile. Today, we’ll see how to integrate AWS DynamoDB with a Spring Boot application. This is going to be super simple, thanks to the AWS Java SDK and the Spring Data DynamoDB package. Let’s get started.
Dependencies
First, as usual, we need to create a Spring Boot project, with dependencies that look like this: <dependencies> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter</artifactId> </dependency> <dependency> <groupId>com.amazonaws</groupId> <artifactId>aws-java-sdk-dynamodb</artifactId> <version>1.11.573</version>

Apache Spark SQL User Defined Function (UDF) POC in Java
Data Science, Tech | by Sunny Srinidhi | May 14, 2019 (updated December 19, 2019)
If you’ve worked with Spark SQL, you might have come across the concept of User Defined Functions (UDFs). As the name suggests, it’s a feature where you define a function; pretty straightforward. But how is this different from any other custom function that you write? Well, when you’re working with Spark in a distributed environment, your code is distributed across the cluster. For this to happen, your code entities have to be serializable, including the various functions you call. When you want to manipulate columns in your Dataset, Spark provides a variety of built-in functions. But there are cases when you want a custom implementation to work with your columns. For this, Spark provides UDFs. But you should be warned,

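To make the serialization requirement concrete without pulling in Spark, here's a minimal, JDK-only sketch. `SerializableUdf` is a hypothetical interface standing in for Spark's Java UDF interfaces (such as `UDF1`), which likewise extend `java.io.Serializable`. A lambda typed against it survives a Java serialization round trip, the kind of round trip a function goes through when it's shipped to executors, while a lambda typed as a plain `java.util.function.Function` does not.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.function.Function;

// Hypothetical stand-in for Spark's UDF1<T1, R>, which also extends java.io.Serializable.
interface SerializableUdf<T, R> extends Function<T, R>, Serializable {}

class UdfSerializationDemo {
    // Serializable: the target type extends Serializable, so javac emits a serializable lambda.
    static final SerializableUdf<String, Integer> LENGTH_UDF = s -> s == null ? 0 : s.length();

    // Not serializable: java.util.function.Function alone does not extend Serializable.
    static final Function<String, Integer> PLAIN_FUNCTION = s -> s == null ? 0 : s.length();

    // Writes the object to bytes and reads it back; failures are rethrown unchecked.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        SerializableUdf<String, Integer> copy = roundTrip(LENGTH_UDF);
        System.out.println("UDF after round trip: " + copy.apply("spark"));

        try {
            roundTrip(PLAIN_FUNCTION);
        } catch (IllegalStateException e) {
            System.out.println("Plain Function rejected: " + e.getCause());
        }
    }
}
```

The same failure mode shows up in Spark as a "Task not serializable" error when a closure captures something that cannot be serialized.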
Connect Apache Spark with MongoDB database using the mongo-spark-connector
Data Science, Tech | by Sunny Srinidhi | April 3, 2019 (updated February 28, 2020)
A couple of days back, we saw how we can connect Apache Spark to an Apache HBase database and query the data from a table using a catalog. Today, we’ll see how we can connect Apache Spark to a MongoDB database and get data directly into Spark from there. MongoDB provides a plugin called the mongo-spark-connector, which helps us connect MongoDB and Spark without any drama at all. We just need to provide the MongoDB connection URI in the SparkConf object, and create a ReadConfig object specifying the collection name. It might sound complicated right now, but once you look at the code, you’ll see how easy this is. So, let’s look at an example.
The Dataset
Before we look

Connect Apache Spark to your HBase database (Spark-HBase Connector)
Data Science, Tech | by Sunny Srinidhi | April 1, 2019 (updated January 31, 2020)
There will be times when you’ll need the data in your HBase database to be brought into Apache Spark for processing. Usually, you’ll query the database, get the data in whatever format you fancy, and then load that into Spark, maybe using the `parallelize()` function. This works just fine. But depending on the size of the data, this could cause delays. At least it did for our application. So after some research, we stumbled upon a Spark-HBase connector in the Hortonworks repository. Now, what is this connector, and why should you consider it?
The Spark-HBase Connector (shc-core)
The SHC is a tool provided by Hortonworks to connect your HBase database to Apache Spark so that you can tell your Spark context to pick up the

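For context, SHC maps an HBase table to a Spark DataFrame through a JSON catalog. The sketch below only builds such a catalog string in plain Java, following the layout of the catalog examples in the SHC project's README; the namespace, table name, column family (`info`), and column names are hypothetical.

```java
// Builds an SHC-style catalog: which HBase table to read, which field is the row key,
// and how each DataFrame column maps to a column family and qualifier.
class ShcCatalog {
    static String catalog(String namespace, String table) {
        return "{"
                + "\"table\":{\"namespace\":\"" + namespace + "\",\"name\":\"" + table + "\"},"
                + "\"rowkey\":\"key\","
                + "\"columns\":{"
                // The special column family "rowkey" maps the HBase row key to the DataFrame column "id".
                + "\"id\":{\"cf\":\"rowkey\",\"col\":\"key\",\"type\":\"string\"},"
                // Hypothetical column family "info" with two qualifiers.
                + "\"name\":{\"cf\":\"info\",\"col\":\"name\",\"type\":\"string\"},"
                + "\"age\":{\"cf\":\"info\",\"col\":\"age\",\"type\":\"int\"}"
                + "}}";
    }

    public static void main(String[] args) {
        System.out.println(catalog("default", "users"));
    }
}
```

In the SHC README, this string is passed to the DataFrame reader under the `HBaseTableCatalog.tableCatalog` option together with the `org.apache.spark.sql.execution.datasources.hbase` format; it's worth checking the exact usage against the connector version you depend on, since this sketch does not call Spark itself.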
How you can improve your backend services’ performance using Apache Kafka
Tech | by Sunny Srinidhi | November 27, 2018 (updated February 25, 2020)
In most real-world applications, we have a RESTful API service facing various client applications and a collection of backend services which process the data coming from those clients. Depending on the application, the architecture might have various services spread across multiple clusters of servers, and some form of queue or messaging service gluing them together. Today, we're going to talk about one such messaging service, Apache Kafka, and how it can improve the performance of your services. We're going to assume that we have at least two microservices: one for the APIs that are exposed to the world, and one which processes the requests coming in from the API microservice asynchronously. Because this is

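The decoupling described above can be sketched with an in-memory `BlockingQueue` standing in for a Kafka topic. This illustrates the pattern under that simplifying assumption and is not Kafka code: the API side enqueues a request and responds right away, while a separate worker thread processes messages on its own schedule. The `202 Accepted` response string and the poison-pill shutdown message are hypothetical choices made for the sketch.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// In-memory stand-in for the API -> queue -> worker flow; the BlockingQueue plays the Kafka topic.
class AsyncPipelineSketch {
    // Hypothetical shutdown marker ("poison pill"); a real payload with this value would also stop the worker.
    static final String POISON_PILL = "__STOP__";

    private final BlockingQueue<String> topic = new LinkedBlockingQueue<>();
    final List<String> processed = new CopyOnWriteArrayList<>();
    private Thread worker;

    // API side: hand the request off and respond immediately, without waiting for processing.
    String handleRequest(String payload) {
        topic.offer(payload);
        return "202 Accepted";
    }

    // Worker side: take messages off the queue, one at a time, on a separate thread.
    void startWorker() {
        worker = new Thread(() -> {
            try {
                while (true) {
                    String message = topic.take();
                    if (POISON_PILL.equals(message)) {
                        break;
                    }
                    // Placeholder for the real (possibly slow) processing.
                    processed.add(message.toUpperCase());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
    }

    // Enqueue the poison pill and wait until everything queued before it has been processed.
    void shutdownAndAwait() {
        topic.offer(POISON_PILL);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AsyncPipelineSketch app = new AsyncPipelineSketch();
        app.startWorker();
        for (String request : List.of("order-1", "order-2", "order-3")) {
            System.out.println(request + " -> " + app.handleRequest(request));
        }
        app.shutdownAndAwait();
        System.out.println("processed: " + app.processed);
    }
}
```

With Kafka, the queue would become a topic, the API would publish through a producer, and the worker would be a consumer that can run on a different server; either way, the API's response time no longer depends on how long the processing takes.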
Why you should switch to Signal or Telegram from WhatsApp, Today
Tech | by Sunny Srinidhi | November 23, 2018 (updated December 19, 2019)
When we think of communicating with someone today, we mostly think of sending them a text message or a voice note on WhatsApp. Others, the ones least bothered about their privacy online, think of Facebook Messenger. But not all of these users know what happens to the messages they exchange on these platforms. Let's take a look at that. Before we start, let me admit that I am by no means an expert on online security and privacy. But over the last couple of years I have done enough research to make me switch from Google's Chrome browser and search to Firefox and DuckDuckGo (with a lot of customized preferences on both). I've made a lot of other such switches

Simple Apache Kafka Producer and Consumer using Spring Boot
Tech | by Sunny Srinidhi | November 23, 2018 (updated March 2, 2020)
Originally published here: https://medium.com/@contactsunny/simple-apache-kafka-producer-and-consumer-using-spring-boot-41be672f4e2b
Before I even start talking about Apache Kafka here, let me answer the question you probably have after reading the title: aren’t there enough posts and guides about this topic already? Yes, there are plenty of reference documents and how-to posts about creating Kafka producers and consumers in a Spring Boot application. Then why am I writing another post about this? Well, in the future, I’ll be talking about some advanced topics in the data science space. Apache Kafka is one of the most widely used technologies and tools in this space, so it's important to know how to work with Apache Kafka in a real-world application. So this is an introductory post to the technology, which we’ll be

Keystroke Dynamics, What Is It?
Tech | by Sunny Srinidhi | November 16, 2018
For decades, we have been using the two-pronged key system for securing our electronic data and services. The two-pronged key we're talking about is the username/password combination. There are variations of this, of course. For example, instead of a username, you might be using your email address, or something called a user ID. But the concept remains the same. The username/password combination for security is over 50 years old. To be more precise, it was first implemented in 1961 at the Massachusetts Institute of Technology (MIT). We have been using this security method for all kinds of data and services online, including but not limited to email, banking, and gaming services. But it's also true that it has been proved many