When you're starting your machine learning journey, you'll come across null hypothesis and the p-value. At a certain point in your journey, it becomes quite important to know what these mean to make ...
Read MoreTag: bigdata
We have seen methods such as fit(), transform(), and fit_transform() in a lot of SciKit's libraries. And almost all tutorials, including the ones I've written, only tell you to just use one of these ...
Read MoreIn the previous post, we tried to understand the basics of Apache's Kafka Streams. In this post, we'll build on that knowledge and see how Kafka Streams can be used both as streams and tables. St...
Read MoreIf you work with streams of big data which have to be collected, transformed, and analysed, you for sure would have heard of Amazon Kinesis Firehose. It is an AWS service used to load streams of data...
Read MoreIn the last post, we saw how to query data from S3 using Amazon Athena in the AWS Console. But querying from the Console itself if very limited. We can't really do much with the data, and anytime we ...
Read MoreAmazon Athena is defined as "an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL." So, it's another SQL query engi...
Read MoreIf you are in the big data or data science or BI space, you might have heard about Apache Spark. A few of you might have also heard about Apache Drill, and a tiny bit of you might have actually worke...
Read MoreIf you’ve worked with Spark SQL, you might have come across the concept of User Defined Functions (UDFs). As the name suggests, it’s a feature where you define a function, pretty straight forward...
Read More