Data Science, Tech

Query data from S3 files using Amazon Athena

Amazon Athena is defined as "an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL." So, it's another SQL query engine for large data sets stored in S3, very similar to other SQL query engines such as Apache Drill. Unlike Apache Drill, however, Athena can only query data stored in Amazon's own S3 storage service. Within S3, Athena can query a variety of file formats, including CSV, Parquet, and JSON. In this post, we'll see how to set up a table in Athena using a sample data set stored in S3 as a .csv file. For this, we first need the sample CSV file, which you can download here: sampleDataDownload. Once you have the file downloaded, create a new bucket in ...
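As a rough preview of where this is heading, here is a minimal Python sketch of how a table over a CSV file in S3 might be defined in Athena. The database, table, column, and bucket names are all hypothetical placeholders (the post's actual names are not shown here), and the sketch assumes the CSV has a header row and simple comma-separated fields. The `run_in_athena` helper assumes `boto3` is installed and AWS credentials are configured; it is not called in the example.

```python
def build_create_table_ddl(database, table, columns, s3_location):
    """Build an Athena CREATE EXTERNAL TABLE statement for CSV data.

    columns is a list of (name, type) pairs; s3_location must point to the
    S3 folder (prefix) that holds the CSV file, not to the file itself.
    """
    column_defs = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"  {column_defs}\n"
        ")\n"
        "ROW FORMAT DELIMITED\n"
        "  FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{s3_location}'\n"
        # Assumption: the first line of the CSV is a header row to skip.
        "TBLPROPERTIES ('skip.header.line.count' = '1')"
    )


def run_in_athena(query, database, output_location):
    """Submit a query to Athena and return its execution id.

    Requires boto3 and valid AWS credentials; output_location is an S3 path
    where Athena writes query results.
    """
    import boto3  # imported lazily so the DDL builder works without boto3

    client = boto3.client("athena")
    response = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    ddl = build_create_table_ddl(
        database="sample_db",
        table="sample_table",
        columns=[("id", "int"), ("name", "string")],
        s3_location="s3://example-bucket/sample-data/",
    )
    print(ddl)
```

The same statement can be pasted directly into the Athena query editor; building it in code is only a convenience if you want to script the setup.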
