Explore your Amazon S3 data online using Filestash

Tech

by Sunny Srinidhi - April 29, 2020

Filestash is a handy browser-based tool that helps you navigate your S3 buckets and folders easily, and even edit files online.

How to build a simple data lake using Amazon Kinesis Data Firehose and Amazon S3

Data Science

by Sunny Srinidhi - March 3, 2020

In this post, we’ll see how to create a simple yet highly scalable data lake using Amazon Kinesis Data Firehose and Amazon S3.

Query data from S3 files using Amazon Athena

by Sunny Srinidhi - September 24, 2019 (updated March 7, 2020)

Amazon Athena is defined as "an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL." In other words, it's another SQL query engine for large data sets, this time for data stored in S3. It is very similar to other SQL query engines such as Apache Drill, but unlike Drill, Athena can query data only from Amazon's own S3 storage service. Athena can, however, query a variety of file formats, including CSV, Parquet, and JSON. In this post, we'll see how to set up a table in Athena using a sample data set stored in S3 as a .csv file. But for this, we first need

Streamline Data Transfer with AWS DataSync: A Comprehensive Guide

Data Science

by Sunny Srinidhi - March 9, 2024

Discover the power of AWS DataSync for seamless, secure, and accelerated data transfers. Learn how to optimise workflows with ease!

Use Amazon CloudSearch to quickly search through data

Tech

by Sunny Srinidhi - March 29, 2023 (updated January 17, 2024)

Amazon CloudSearch provides a number of powerful search capabilities, including full-text search, faceted search, and customizable relevance ranking. In this post, we’ll see what CloudSearch is

Cleaning and Normalizing Data Using AWS Glue DataBrew

Data Science

by Sunny Srinidhi - January 17, 2022

In this post, we’ll see what AWS Glue DataBrew is and how to use it to clean and transform data in a data pipeline.

Getting Started With Apache Airflow

Data Science

by Sunny Srinidhi - October 11, 2021

I recently started working with Apache Airflow. And as is tradition, I’m telling you everything about it here.

Kinesis Data Streams vs. Kinesis Firehose Delivery Streams

Data Science

by Sunny Srinidhi - May 25, 2020 (updated August 27, 2024)

I have talked about Kinesis before, and I'm sure you've been using Kinesis for longer than I have. But from what I've seen, not all teams or companies use every part of Kinesis. There are four parts to Kinesis:

- Kinesis Data Streams: ingest and process streaming data with Kinesis streams
- Kinesis Firehose Delivery Streams: deliver streaming data with Kinesis Firehose delivery streams
- Kinesis Analytics: analyse streaming data with Kinesis analytics applications
- Kinesis Video Streams: ingest and process media streams with Kinesis video streams

All four parts offer something different. The last two are clearly different from the first two, but it's the first two that I see a lot of people getting confused about. So I thought I'll

How To Generate Parquet Files in Java

Data Science

by Sunny Srinidhi - April 7, 2020

The Parquet file format has become very popular lately. In this post, we’ll see what it is, and how to create Parquet files in Java using Spring Boot.

Getting started with Chalice to create AWS Lambdas in Python – Step by Step Tutorial

Tech

by Sunny Srinidhi - November 14, 2019

Using Chalice, you can write a Lambda function, test it locally, and even deploy the Lambda function to your development, test, or production environments. In this post, we’ll see how we can install Chalice on our local machines, write a simple REST API to return the famous “Hello, world!” response, and deploy it to a dev stage on AWS Lambda.