Data Science, Tech

How to Query Athena from a Spring Boot application?

In the last post, we saw how to query data from S3 using Amazon Athena in the AWS Console. But querying from the Console is very limited. We can't do much with the data there, and we can't sit in front of the console all day running queries manually every time we want to analyse it. We need to automate the process, and what better way to do that than writing a piece of code? So in this post, we'll see how to use the AWS Java SDK in a Spring Boot application to query the same sample data set from the previous post. We'll then log the results to the console to make sure we're getting the right data.

The Dependencies

Before we get to the code, let's first get our dependencies right. I did the painstaking task of finding the right dependencies for this POC. All...

Data Science, Tech

Query data from S3 files using Amazon Athena

Amazon Athena is defined as "an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL." So, it's another SQL query engine for large data sets stored in S3. This is very similar to other SQL query engines, such as Apache Drill. But unlike Apache Drill, Athena is limited to data from Amazon's own S3 storage service. However, Athena can query a variety of file formats, including, but not limited to, CSV, Parquet, and JSON. In this post, we'll see how to set up a table in Athena using a sample data set stored in S3 as a .csv file. But for this, we first need that sample CSV file. You can download it here: sampleData. Once you have the file downloaded, create a new bucket in ...
