Building a data lake on AWS using Redshift Spectrum

In one of our earlier posts, we talked about setting up a data lake using AWS Lake Formation. Once the data lake is set up, we can use Amazon Athena to query data. Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and there is no need for complex ETL jobs to prepare data for analysis. Today, we will explore querying data from a data lake in S3 using Redshift Spectrum. This use case makes sense for organizations that already use Redshift as their primary data warehouse.

Amazon Redshift Spectrum

Amazon Redshift Spectrum is used to efficiently query and retrieve structured and semistructured data from files in Amazon S3 without having to load the data into Amazon Redshift tables. Amazon Redshift Spectrum resides on dedicated Amazon Redshift servers that are independent of your cluster. Redshift Spectrum pushes many compute-intensive tasks, such as predicate filtering and aggregation, down to the Redshift Spectrum layer.

How is Amazon Athena different from Amazon Redshift Spectrum?

  1. Redshift Spectrum needs an Amazon Redshift cluster and a SQL client that is connected to the cluster so that we can execute SQL commands, whereas Athena is serverless.
  2. In Redshift Spectrum, external tables are read-only; it does not support INSERT queries. Athena supports INSERT queries, which write records to S3.

Amazon Redshift cluster

To use Redshift Spectrum, you need an Amazon Redshift cluster and a SQL client that’s connected to your cluster so that you can execute SQL commands. The cluster and the data files in Amazon S3 must be in the same AWS Region.

The Redshift cluster needs authorization to access the external data catalog in AWS Glue or Amazon Athena and the data files in Amazon S3. Let’s kick off the steps required to get the Redshift cluster going.

Create an IAM Role for Amazon Redshift

  1. Open the IAM console and choose Roles.
  2. Choose Create role.
  3. Choose AWS service, and then select Redshift.
  4. Under Select your use case, select Redshift – Customizable and then choose Next: Permissions.
  5. On the Attach permissions policies page, attach the following policies: AmazonS3FullAccess, AWSGlueConsoleFullAccess and AmazonAthenaFullAccess.
  6. For Role name, enter a name for your role, in this case redshift-spectrum-role.
  7. Choose Create role.
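If you prefer to script these console steps, a minimal boto3 sketch might look like the following. The role name matches the one used in this post; actually running it requires AWS credentials with IAM permissions.

```python
import json

# Trust policy that lets Amazon Redshift assume the role
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "redshift.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}

# The three managed policies attached in the console steps above
SPECTRUM_POLICIES = [
    'arn:aws:iam::aws:policy/AmazonS3FullAccess',
    'arn:aws:iam::aws:policy/AWSGlueConsoleFullAccess',
    'arn:aws:iam::aws:policy/AmazonAthenaFullAccess',
]

def create_spectrum_role(role_name='redshift-spectrum-role'):
    import boto3  # imported lazily; running this needs AWS credentials
    iam = boto3.client('iam')
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(TRUST_POLICY))
    for policy_arn in SPECTRUM_POLICIES:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return role['Role']['Arn']
```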

Create a Sample Amazon Redshift Cluster

  • Open the Amazon Redshift console.
  • Choose the AWS Region. The cluster and the data files in Amazon S3 must be in the same AWS Region.
  • Select CLUSTERS and choose Create cluster.
    Cluster Configuration:
    • Based on the size and type of data (compressed/uncompressed), select the nodes. Amazon Redshift also provides an option to calculate the best cluster configuration based on your requirements.
    • In this case, use dc2.large with 2 nodes.
  • Specify Cluster details.
    • Cluster identifier: Name-of-the-cluster.
    • Database port: 5439, which is the default.
    • Master user name: Master user of the DB instance.
    • Master user password: Specify the password.
  • In the Cluster permissions section, select Available IAM roles and choose the IAM role that was created earlier, redshift-spectrum-role. Then choose Add IAM role.
  • Select Create cluster and wait till the status is Available.
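The same cluster can also be created programmatically; below is a hedged sketch using boto3's create_cluster, mirroring the console choices above. The cluster identifier and master credentials here are placeholders, not values from this post.

```python
def cluster_params(iam_role_arn,
                   cluster_id='spectrum-cluster',
                   master_username='awsuser',
                   master_password='ChangeMe123'):
    # Mirrors the console choices above: 2 x dc2.large nodes,
    # default port 5439, dev database, spectrum IAM role attached
    return {
        'ClusterIdentifier': cluster_id,
        'NodeType': 'dc2.large',
        'NumberOfNodes': 2,
        'DBName': 'dev',
        'Port': 5439,
        'MasterUsername': master_username,
        'MasterUserPassword': master_password,
        'IamRoles': [iam_role_arn],
    }

def create_cluster(iam_role_arn):
    import boto3  # imported lazily; running this needs AWS credentials
    redshift = boto3.client('redshift')
    return redshift.create_cluster(**cluster_params(iam_role_arn))
```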

Connect to Database

  1. Open the Amazon Redshift console and choose EDITOR.
  2. Database name is dev.

Create an External Schema and an External Table

External tables must be created in an external schema.

  • To create an external schema, run the following command. Replace the iam_role value with the ARN of the role created earlier.
create external schema spectrum
from data catalog
database 'spectrumdb'
iam_role 'arn:aws:iam::xxxxxxxxxxxx:role/redshift-spectrum-role'
create external database if not exists;
  • Copy the sample data (provided by AWS) using the following command. Configure the AWS CLI on your machine and run this command.
aws s3 cp s3://awssampledbuswest2/tickit/spectrum/sales/ s3://bucket-name/data/source/ --recursive
  • To create an external table, run the following command. The table is created in the spectrum schema.
create external table spectrum.table_name(
salesid integer,
listid integer,
sellerid integer,
buyerid integer,
eventid integer,
dateid smallint,
qtysold smallint,
saletime timestamp)
row format delimited
fields terminated by '\t'
stored as textfile
location 's3://bucket-name/copied-prefix/';
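These statements can also be submitted without a connected SQL client, via the Redshift Data API. Below is a rough sketch; the cluster identifier and database user are assumptions, and execution is asynchronous (you would poll describe_statement for the result).

```python
def preview_sql(schema, table, limit=10):
    # Builds the simple preview query used later in this post
    return 'SELECT * FROM {0}.{1} LIMIT {2};'.format(schema, table, limit)

def run_on_cluster(sql, cluster_id='spectrum-cluster',
                   database='dev', db_user='awsuser'):
    import boto3  # imported lazily; running this needs an existing cluster
    client = boto3.client('redshift-data')
    # Submits the statement asynchronously and returns a statement Id
    return client.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        DbUser=db_user,
        Sql=sql)
```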

Now the table is available in Redshift Spectrum. We can analyze the data using SQL queries like so:

SELECT
 *
FROM
spectrum.rs_table
LIMIT 10;

Create a Table in Athena using Glue Crawler

In case you are just starting out on the AWS Glue crawler, I have explained how to create one from scratch in one of my earlier articles. In this case, I created the rs_table table in the spectrumdb database.

Comparison between Amazon Redshift Spectrum and Amazon Athena

I ran some basic queries on both Athena and Redshift Spectrum. The query elapsed time comparison is as follows: a query took about 3 seconds on Athena compared to about 16 seconds on Redshift Spectrum.

The idea behind this post was to get you up and running with a basic data lake on S3 that is queryable on Redshift Spectrum. I hope it was useful.

This story is authored by PV Subbareddy. He is a Big Data Engineer specializing in AWS Big Data Services and the Apache Spark ecosystem.

Optimizing QuickSight using Athena Queries and SPICE: Operating cost analysis

In this post, I will discuss, as an example, how an automobile manufacturing company could use QuickSight to analyze its sales data and make better decisions. We will also learn how to optimize the QuickSight operating cost structure by using the SPICE engine to ingest source data from Athena queries at recurring intervals. This has two major advantages: dashboards and analyses load quickly because the data source is within SPICE, and the cost of data ingestion is brought down because Athena is queried only to refresh the data in SPICE.

We will look at a sales dashboard created using datasets prepared from data in the refined zone of a data lake built with Lake Formation. A data engineering pipeline writes data to this refined zone with year and month partitions every hour.

In case you wish to build something similar and follow along, below is the link to the raw datasets:

https://github.com/koushik-bitzop/data-sets/tree/master/sales2016-2018

Creating a SPICE based Athena Data-set:

Select Athena as the data set source:

Select Use custom query.

Select Edit/Preview data and then choose data source as SPICE and click on Finish.

Once the query has run successfully and you can see the data, click on Save & visualize.

In case you want to add any calculated fields or change data types, you can do that in the highlighted section shown above.

I have discussed these in detail in my previous articles, Visualizing Multiple Datasets in AWS QuickSight and Adding User-Interactivity to AWS QuickSight Dashboards.

Refresh Schedule for Data-sets:

Depending on how frequently new data arrives, you can schedule the refresh. For every refresh, an Athena query is executed and the results are imported into SPICE.

Note: 

  1. In this example, the QuickSight SPICE refresh pulls the whole dataset, not an incremental load.
  2. It is not possible to pass pushdown predicates (variables) from dashboard filters through QuickSight to Athena. So if you want to look at a rolling window of data, such as the past 24 hours, past month, or past 6 months, use a WHERE clause in the Athena source query to fetch just those records. Also, if the data is partitioned by year and month, only the required data is scanned, further saving on costs.
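As a sketch of that second note, the rolling-window WHERE clause can be generated when the source query is built. The table name and the year/month partition columns below are illustrative, not from a specific dataset in this post.

```python
from datetime import datetime, timedelta

def rolling_window_query(table, months=6, now=None):
    # Builds an Athena source query that fetches only the last `months`
    # of data, relying on year/month partition columns to limit the scan
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=30 * months)
    return (
        "SELECT * FROM {table} "
        "WHERE (year > '{y}') OR (year = '{y}' AND month >= '{m}')"
    ).format(table=table, y=cutoff.strftime('%Y'), m=cutoff.strftime('%m'))
```

Because the filter compares partition columns, Athena prunes partitions and scans only the rolling window's data on each SPICE refresh.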

A lowdown on QuickSight Operating cost with this architecture:

We are looking at two main cost components:

  1. Athena – S3 data scan costs
  2. QuickSight Infrastructure costs

Athena – S3 data scan costs:

Athena pricing for successful queries: $5 per 1 TB of data scanned (S3 storage cost not included).

No. of queries | Data scanned in S3 | Scheduled refresh | Total data scanned (monthly) | Bill estimate (monthly) | Bill estimate (annual)
1 | 150 to 210 KB | Hourly | 1 * 24 * 30 * 210 KB = 0.0001512 TB | $0.000756 | $0.009072

The above numbers are a bit too low to make an inference. Let us say you have 4 such queries (each scanning around 150 to 200 MB) powering the dashboard, and SPICE ingests this data once every hour.

No. of queries | Data scanned in S3 | Scheduled refresh | Total data scanned (monthly) | Bill estimate (monthly) | Bill estimate (annual)
4 | 150 to 200 MB | Hourly | 4 * 24 * 30 * 200 MB = 576 GB or 0.576 TB | $2.88 | $34.56

If we do not use SPICE to load this data from Athena hourly and instead use the Athena query as the direct source, the cost of the dashboards increases proportionately with every view. As an example, if the dashboards are viewed 1,000 times per hour (and each dashboard has 4 source queries), the cost above is multiplied by a staggering 1,000 times, and the annual bill becomes an eye-popping $34,560.
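The arithmetic behind these estimates can be captured in a small helper (decimal units, $5 per TB as above):

```python
def athena_monthly_cost(num_queries, mb_per_query, refreshes_per_day=24,
                        days=30, usd_per_tb=5.0):
    # Athena bills per TB of S3 data scanned (decimal units: 1 TB = 1e6 MB)
    scanned_tb = num_queries * refreshes_per_day * days * mb_per_query / 1e6
    return scanned_tb * usd_per_tb

# SPICE refresh scenario from the table above: 4 queries x ~200 MB, hourly
spice_monthly = athena_monthly_cost(4, 200)
# Direct-query scenario: every dashboard view re-runs the 4 source queries,
# so at 1,000 views per hour the scan volume is 1,000x larger
direct_monthly = athena_monthly_cost(4, 200) * 1000
```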

QuickSight infrastructure cost (Standard Edition):

There is no charge for readers. Authors cost $9 per month with an annual subscription.

User type | No. of users | Bill estimate (monthly) | Bill estimate (annual)
Author | 1 | $9 | $108
Reader | 3 | $0 | $0
Total |  | $9 pm | $108 pa

Note: For the Enterprise edition, readers are billed $0.30 per 30-minute session, up to a maximum of $5/reader/month for unlimited use. Authors are billed $18/month with an annual subscription. Additional SPICE capacity is $0.25/GB/month on Standard and $0.38/GB/month on Enterprise.

So overall, we can see that using SPICE with a periodic data refresh optimizes costs in a smart way. That’s it folks, I hope it was helpful. For any queries, drop them in the comments section.

This story is authored by Koushik. Koushik is a software engineer and a keen data science and machine learning enthusiast.

AWS Machine Learning Data Engineering Pipeline for Batch Data

This post walks you through all the steps required to build a data engineering pipeline for batch data using AWS Step Functions. The sequence of steps works like so: the ingested data arrives as a CSV file in the landing zone of an S3 based data lake, which automatically triggers a Lambda function to invoke the Step Function. I have assumed that data is ingested daily in a .csv file with a filename_date.csv naming convention, for example customers_20190821.csv. As its first step, the Step Function starts a landing to raw zone file transfer operation via a Lambda function. Then an AWS Glue crawler crawls the raw data into an Athena table, which is used as the source for an AWS Glue based PySpark transformation script. The transformed data is written to the refined zone in Parquet format. Another AWS Glue crawler then runs to “reflect” this refined data into a second Athena table. Finally, the data science team can consume this refined data from the Athena table using an AWS SageMaker based Jupyter notebook instance. Note that the data science team does not need to pull data manually; the data engineering pipeline automatically pulls in the delta data as per the data refresh schedule that writes new data to the landing zone.

Let’s go through the steps

How to make daily data available to Amazon SageMaker?

What is Amazon SageMaker?

Amazon SageMaker is an end-to-end machine learning (ML) platform that can be leveraged to build, train, and deploy machine learning models in AWS. Using the Amazon SageMaker Notebook module improves the efficiency of interacting with the data without the latency of bringing it locally.
For a deep dive into Amazon SageMaker, please go through the official docs.

In this blog post, I will be using dummy customer data. The data consists of retailer information and units purchased.

Updating Table Definitions with AWS Glue

The data catalog feature of AWS Glue and its built-in integration with Amazon S3 simplify the process of identifying data and deriving the schema definition from the source data. Glue crawlers within the Data Catalog are used to build out metadata tables for data stored in Amazon S3.

I created a crawler named raw for the data in the raw zone (s3://bucketname/data/raw/customers/). In case you are just starting out on the AWS Glue crawler, I have explained how to create one from scratch in one of my earlier articles. If you run this crawler, it creates a customers table in the specified database (raw).

Create an invocation Lambda Function

In case you are just starting out on Lambda functions, I have explained how to create one from scratch with an IAM role to access the StepFunctions, Amazon S3, Lambda and CloudWatch in my earlier article.

Add trigger to the created Lambda function named invoke-step-functions. Configure Bucket, Prefix and  Suffix accordingly.

Once a file arrives in the landing zone, it triggers the invoke Lambda function, which extracts the year, month and day from the file name in the event. It passes the year, month and day, along with two characters from a UUID, as input to AWS Step Functions. Please replace the following code in the invoke-step-functions Lambda.

import json
import uuid
import boto3
from datetime import datetime

sfn_client = boto3.client('stepfunctions')

stm_arn = 'arn:aws:states:us-west-2:XXXXXXXXXXXX:stateMachine:Datapipeline-for-SageMaker'

def lambda_handler(event, context):
    
    # Extract bucket name and file path from event
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    path = event['Records'][0]['s3']['object']['key']
    
    file_name_date = path.split('/')[2]
    processing_date_str = file_name_date.split('_')[1].replace('.csv', '')
    processing_date = datetime.strptime(processing_date_str, '%Y%m%d')
    
    # Extract year, month, day from date
    year = processing_date.strftime('%Y')
    month = processing_date.strftime('%m')
    day = processing_date.strftime('%d')
    
    uuid_temp = uuid.uuid4().hex[:2]
    execution_name = '{processing_date_str}-{uuid_temp}'.format(processing_date_str=processing_date_str, uuid_temp=uuid_temp)
    
    # Starts the execution of AWS StepFunctions
    response = sfn_client.start_execution(
          stateMachineArn = stm_arn,
          name= str(execution_name),
          input= json.dumps({"year": year, "month": month, "day": day})
      )
    
    return {"year": year, "month": month, "day": day}

Create a Generic FileTransfer Lambda

Create a Lambda function named generic-file-transfer as we created earlier in this article. The file transfer Lambda function transfers files from the landing zone to the raw zone, or from the landing zone to the archive zone, based on the event coming from the Step Function.

  1. If the step is landing-to-raw-file-transfer, the Lambda function copies files from the landing zone to the raw zone.
  2. If the step is landing-to-archive-file-transfer, the Lambda function copies files from the landing zone to the archive zone and deletes them from the landing zone.
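The prefix layout for each step can be sketched as a pure helper, mirroring the destination conventions used by the file transfer Lambda in this post (raw zone partitioned by year/month/day, archive zone by a date folder):

```python
def build_destination_prefix(step, destination_prefix, file_name,
                             year, month, day):
    # Raw zone: data/raw/<file_name>/year=YYYY/month=MM/day=DD/
    if step == 'landing-to-raw-file-transfer':
        return '{0}{1}/year={2}/month={3}/day={4}/'.format(
            destination_prefix, file_name, year, month, day)
    # Archive zone: data/archive/YYYY-MM-DD/
    if step == 'landing-to-archive-file-transfer':
        return '{0}{1}-{2}-{3}/'.format(destination_prefix, year, month, day)
    raise ValueError('unknown step: ' + step)
```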

Please replace the following code in generic-file-transfer Lambda.

import json
import boto3

s3 = boto3.resource('s3')

def lambda_handler(event, context):
    
    # Extract Parameters from Event (invoked by StepFunctions)
    step = event['step']
    year = event['year']
    month = event['month']
    day = event['day']
    
    bucket_name = event['bucket_name']
    source_prefix = event['source_prefix']
    destination_prefix = event['destination_prefix']
    
    bucket = s3.Bucket(bucket_name)
    
    for obj in bucket.objects.filter(Prefix = source_prefix):
        file_path = obj.key
        
        if ('.csv' in file_path) and (step == 'landing-to-raw-file-transfer'):
            
            # Extract filename from file_path
            file_name_date = file_path.split('/')[2]
            file_name = file_name_date.split('_')[0]
            
            # Add filename to the destination prefix (use a new variable so
            # the base destination_prefix is not mutated across iterations)
            dest_prefix = '{destination_prefix}{file_name}/year={year}/month={month}/day={day}/'.format(destination_prefix=destination_prefix, file_name=file_name, year=year, month=month, day=day)
            print(dest_prefix)
            
            source_object = {'Bucket': bucket_name, 'Key': file_path}
            
            # Replace source prefix with destination prefix
            new_path = file_path.replace(source_prefix, dest_prefix)
            
            # Copies file
            new_object = bucket.Object(new_path)
            new_object.copy(source_object)
         
        if ('.csv' in file_path) and (step == 'landing-to-archive-file-transfer'):
            
            # Add date folder to the destination prefix (again in a new variable)
            dest_prefix = '{destination_prefix}{year}-{month}-{day}/'.format(destination_prefix=destination_prefix, year=year, month=month, day=day)
            print(dest_prefix)
            
            source_object = {'Bucket': bucket_name, 'Key': file_path}
            
            # Replace source prefix with destination prefix
            new_path = file_path.replace(source_prefix, dest_prefix)
            
            # Copies file
            new_object = bucket.Object(new_path)
            new_object.copy(source_object)
            
            # Deletes the copied file from the landing zone
            bucket.objects.filter(Prefix = file_path).delete()
            
    return {"year": year, "month": month, "day": day}

The generic file transfer Lambda function setup is now complete. We need to check that all files are copied successfully from one zone to another. If you have large files that need to be copied, you could check out our Lightning fast distributed file transfer architecture.

Create Generic FileTransfer Status Check Lambda Function

Create a Lambda function named generic-file-transfer-status. If the step is landing to raw file transfer, the Lambda function checks whether all files were copied from the landing zone to the raw zone by comparing the number of objects in the two zones. If the counts don’t match, it raises an exception; that exception is handled in AWS Step Functions, which retries after a backoff interval. If the counts match, all files were copied successfully. If the step is landing to archive file transfer, the Lambda function checks whether any files are left in the landing zone. Please replace the following code in the generic-file-transfer-status Lambda function.

import json
import boto3

s3 = boto3.resource('s3')

def lambda_handler(event, context):
    
    # Extract Parameters from Event (invoked by StepFunctions)
    step = event['step']
    year = event['year']
    month = event['month']
    day = event['day']
    
    bucket_name = event['bucket_name']
    source_prefix = event['source_prefix']
    destination_prefix = event['destination_prefix']
    
    bucket = s3.Bucket(bucket_name)
    
    class LandingToRawFileTransferIncompleteException(Exception):
        pass

    class LandingToArchiveFileTransferIncompleteException(Exception):
        pass
    
    if (step == 'landing-to-raw-file-transfer'):
        if file_transfer_status(bucket, source_prefix, destination_prefix):
            print('File Transfer from Landing to Raw Completed Successfully')
        else:
            raise LandingToRawFileTransferIncompleteException('File Transfer from Landing to Raw not completed')
    
    if (step == 'landing-to-archive-file-transfer'):
        if is_empty(bucket, source_prefix):
            print('File Transfer from Landing to Archive Completed Successfully')
        else:
            raise LandingToArchiveFileTransferIncompleteException('File Transfer from Landing to Archive not completed.')
    
    return {"year": year, "month": month, "day": day}

def file_transfer_status(bucket, source_prefix, destination_prefix):
    
    try:
        
        # Checks number of objects at the source prefix (count of objects at source i.e., landing zone)
        source_object_count = 0
        for obj in bucket.objects.filter(Prefix = source_prefix):
            path = obj.key
            if (".csv" in path):
                source_object_count = source_object_count + 1
        print(source_object_count)
        
        # Checks number of objects at the destination prefix (count of objects at destination i.e., raw zone)
        destination_object_count = 0
        for obj in bucket.objects.filter(Prefix = destination_prefix):
            path = obj.key
            
            if (".csv" in path):
                destination_object_count = destination_object_count + 1
        
        print(destination_object_count)
        return (source_object_count == destination_object_count)

    except Exception as e:
        print(e)
        raise e

def is_empty(bucket, prefix):
    
    try:
        # Checks if any files left in the prefix (i.e., files in landing zone)
        object_count = 0
        for obj in bucket.objects.filter(Prefix = prefix):
            path = obj.key

            if ('.csv' in path):
                object_count = object_count + 1
                    
        print(object_count)
        return (object_count == 0)
        
    except Exception as e:
        print(e)
        raise e

Create a Generic Crawler Invocation Lambda

Create a Lambda function named generic-crawler-invoke. The Lambda function invokes a crawler. The crawler name is passed as argument from AWS StepFunctions through event object. Please replace the following code in generic-crawler-invoke Lambda function.

import json
import boto3

glue_client = boto3.client('glue')

def lambda_handler(event, context):
    
    # Extract Parameters from Event (invoked by StepFunctions)
    year = event['year']
    month = event['month']
    day = event['day']
    
    crawler_name = event['crawler_name']
    
    try:
        response = glue_client.start_crawler(Name = crawler_name)
    except Exception as e:
        print('Crawler in progress', e)
        raise e
    
    return {"year": year, "month": month, "day": day}

Create a Generic Crawler Status Lambda

Create a Lambda function named generic-crawler-status. The Lambda function checks whether the crawler ran successfully or not. If the crawler is still running, the Lambda function raises an exception; the exception is handled in the Step Function, which retries after a certain backoff rate. Please replace the following code in the generic-crawler-status Lambda.

import json
import boto3

glue_client = boto3.client('glue')

def lambda_handler(event, context):
    
    class CrawlerInProgressException(Exception):
        pass
    
    # Extract Parameters from Event (invoked by StepFunctions)
    year = event['year']
    month = event['month']
    day = event['day']
    
    crawler_name = event['crawler_name']
    
    response = glue_client.get_crawler_metrics(CrawlerNameList =[crawler_name])
    print(response['CrawlerMetricsList'][0]['CrawlerName']) 
    print(response['CrawlerMetricsList'][0]['TimeLeftSeconds']) 
    print(response['CrawlerMetricsList'][0]['StillEstimating']) 
    
    if (response['CrawlerMetricsList'][0]['StillEstimating']):
        raise CrawlerInProgressException('Crawler In Progress!')
    elif (response['CrawlerMetricsList'][0]['TimeLeftSeconds'] > 0):
        raise CrawlerInProgressException('Crawler In Progress!')
    
    return {"year": year, "month": month, "day": day}

Create an AWS Glue Job

AWS Glue is a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores. For deep dive into AWS Glue, please go through the official docs.

Create an AWS Glue job named raw-refined. In case you are just starting out on AWS Glue jobs, I have explained how to create one from scratch in my earlier article. This Glue job converts the file format from CSV to Parquet and stores the output in the refined zone. A push down predicate is used as a filter condition so that only the processing date’s data is read, using the partitions.

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

## @params: [JOB_NAME]
# args = getResolvedOptions(sys.argv, ['JOB_NAME'])

args = getResolvedOptions(sys.argv, ['JOB_NAME', 'year', 'month', 'day'])

year = args['year']
month = args['month']
day = args['day']

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "raw", table_name = "customers", push_down_predicate ="((year == " + year + ") and (month == " + month + ") and (day == " + day + "))", transformation_ctx = "datasource0")

applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("email_id", "string", "email_id", "string"), ("retailer_name", "string", "retailer_name", "string"), ("units_purchased", "long", "units_purchased", "long"), ("purchase_date", "string", "purchase_date", "string"), ("sale_id", "string", "sale_id", "string"), ("year", "string", "year", "string"), ("month", "string", "month", "string"), ("day", "string", "day", "string")], transformation_ctx = "applymapping1")

resolvechoice2 = ResolveChoice.apply(frame = applymapping1, choice = "make_struct", transformation_ctx = "resolvechoice2")

dropnullfields3 = DropNullFields.apply(frame = resolvechoice2, transformation_ctx = "dropnullfields3")

datasink4 = glueContext.write_dynamic_frame.from_options(frame = dropnullfields3, connection_type = "s3", connection_options = {"path": "s3://bucketname/data/refined/customers/", "partitionKeys": ["year","month","day"]}, format = "parquet", transformation_ctx = "datasink4")

job.commit()
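The predicate string built inline above can be factored into a small helper to make its shape explicit (a sketch; the Glue script in this post builds the same string with concatenation):

```python
def push_down_predicate(year, month, day):
    # Builds the partition filter passed to create_dynamic_frame.from_catalog,
    # so Glue reads only the processing date's partition from the raw zone
    return "((year == {y}) and (month == {m}) and (day == {d}))".format(
        y=year, m=month, d=day)
```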

Create a refined crawler as we created the raw crawler earlier in this article. Point the crawler path to the refined zone (s3://bucketname/data/refined/customers/) and the database to refined. There is no need to create separate Lambda functions for refined crawler invocation and status, as we pass crawler names from the Step Function.

All resources required to create the Step Function are now in place.

Creating the AWS StepFunction

The Step Function is where we create and orchestrate the steps to process data according to our workflow. Create an AWS Step Function named Datapipeline-for-SageMaker. In case you are just starting out on AWS Step Functions, I have explained how to create one from scratch here.

Data is being ingested into landing zone. It triggers a Lambda function which in turn invokes the execution of the StepFunction. The steps in the StepFunction are as follows:

  1. Transfers files from the landing zone to the raw zone.
  2. Checks whether all files were copied to the raw zone successfully.
  3. Invokes the raw crawler, which crawls the data in the raw zone and creates/updates the table definition in the specified database.
  4. Checks whether the crawler completed successfully.
  5. Invokes the Glue job and waits for it to complete.
  6. Invokes the refined crawler, which crawls the data in the refined zone and creates/updates the table definition in the specified database.
  7. Checks whether the crawler completed successfully.
  8. Transfers files from the landing zone to the archive zone and deletes them from the landing zone.
  9. Checks that all files were copied and deleted from the landing zone successfully.
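The status-check steps rely on the retry mechanism in the state machine definition (IntervalSeconds 30, BackoffRate 2, MaxAttempts 5). The resulting wait times between attempts can be computed as:

```python
def retry_schedule(interval_seconds=30, backoff_rate=2, max_attempts=5):
    # Wait time before each retry attempt under Step Functions' Retry
    # semantics: interval * backoff_rate ** (attempt - 1)
    return [interval_seconds * backoff_rate ** i for i in range(max_attempts)]
```

So a failing status check is retried after 30, 60, 120, 240 and finally 480 seconds before the Catch branch routes execution to the Fail state.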

Please update the StepFunctions definition with the following code.

{
  "Comment": "Datapipeline For MachineLearning in AWS Sagemaker",
  "StartAt": "LandingToRawFileTransfer",
  "States": {
    "LandingToRawFileTransfer": {
      "Comment": "Transfers files from landing zone to Raw zone.",
      "Type": "Task",
      "Parameters": {
        "step": "landing-to-raw-file-transfer",
        "bucket_name": "bucketname",
        "source_prefix": "data/landing/",
        "destination_prefix": "data/raw/",
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:generic-file-transfer",
      "TimeoutSeconds": 4500,
      "Catch": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "Next": "LandingToRawFileTransferFailed"
        },
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "LandingToRawFileTransferFailed"
        }
      ],
      "Next": "LandingToRawFileTransferPassed"
    },
    "LandingToRawFileTransferFailed": {
      "Type": "Fail",
      "Cause": "Landing To Raw File Transfer failed"
    },
    "LandingToRawFileTransferPassed": {
      "Type": "Pass",
      "ResultPath": "$",
      "Parameters": {
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Next": "LandingToRawFileTransferStatus"
    },
    "LandingToRawFileTransferStatus": {
      "Comment": "Checks whether all files are copied from landing to raw zone successfully.",
      "Type": "Task",
      "Parameters": {
        "step": "landing-to-raw-file-transfer",
        "bucket_name": "bucketname",
        "source_prefix": "data/landing/",
        "destination_prefix": "data/raw/",
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:generic-file-transfer-status",
      "Retry": [
        {
          "ErrorEquals": [
            "LandingToRawFileTransferIncompleteException"
          ],
          "IntervalSeconds": 30,
          "BackoffRate": 2,
          "MaxAttempts": 5
        },
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "IntervalSeconds": 30,
          "BackoffRate": 2,
          "MaxAttempts": 5
        }
      ],
      "Catch": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "Next": "LandingToRawFileTransferStatusFailed"
        },
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "LandingToRawFileTransferStatusFailed"
        }
      ],
      "Next": "LandingToRawFileTransferStatusPassed"
    },
    "LandingToRawFileTransferStatusFailed": {
      "Type": "Fail",
      "Cause": "Landing To Raw File Transfer failed"
    },
    "LandingToRawFileTransferStatusPassed": {
      "Type": "Pass",
      "ResultPath": "$",
      "Parameters": {
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Next": "StartRawCrawler"
    },
    "StartRawCrawler": {
      "Comment": "Crawls data from raw zone and adds table definition to the specified Database. IF table definition exists updates the definition.",
      "Type": "Task",
      "Parameters": {
        "crawler_name": "raw",
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:generic-crawler-invoke",
      "TimeoutSeconds": 4500,
      "Catch": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "Next": "StartRawCrawlerFailed"
        },
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "StartRawCrawlerFailed"
        }
      ],
      "Next": "StartRawCrawlerPassed"
    },
    "StartRawCrawlerFailed": {
      "Type": "Fail",
      "Cause": "Crawler invocation failed"
    },
    "StartRawCrawlerPassed": {
      "Type": "Pass",
      "ResultPath": "$",
      "Parameters": {
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Next": "RawCrawlerStatus"
    },
    "RawCrawlerStatus": {
      "Comment": "Checks whether crawler is successfully completed.",
      "Type": "Task",
      "Parameters": {
        "crawler_name": "raw",
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:generic-crawler-status",
      "Retry": [
        {
          "ErrorEquals": [
            "CrawlerInProgressException"
          ],
          "IntervalSeconds": 30,
          "BackoffRate": 2,
          "MaxAttempts": 5
        },
        {
          "ErrorEquals": [
            "States.All"
          ],
          "IntervalSeconds": 30,
          "BackoffRate": 2,
          "MaxAttempts": 5
        }
      ],
      "Catch": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "Next": "RawCrawlerStatusFailed"
        },
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "RawCrawlerStatusFailed"
        }
      ],
      "Next": "RawCrawlerStatusPassed"
    },
    "RawCrawlerStatusFailed": {
      "Type": "Fail",
      "Cause": "Crawler invocation failed"
    },
    "RawCrawlerStatusPassed": {
      "Type": "Pass",
      "ResultPath": "$",
      "Parameters": {
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Next": "GlueJob"
    },
    "GlueJob": {
      "Comment": "Invokes Glue job and waits for Glue job to complete.",
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": {
        "JobName": "retail-raw-refined",
        "Arguments": {
          "--refined_prefix": "data/refined",
          "--year.$": "$.year",
          "--month.$": "$.month",
          "--day.$": "$.day"
        }
      },
      "Catch": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "Next": "GlueJobFailed"
        },
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "GlueJobFailed"
        }
      ],
      "Next": "GlueJobPassed"
    },
    "GlueJobFailed": {
      "Type": "Fail",
      "Cause": "Crawler invocation failed"
    },
    "GlueJobPassed": {
      "Type": "Pass",
      "ResultPath": "$",
      "Parameters": {
        "year.$": "$.Arguments.--year",
        "month.$": "$.Arguments.--month",
        "day.$": "$.Arguments.--day"
      },
      "Next": "StartRefinedCrawler"
    },
    "StartRefinedCrawler": {
      "Comment": "Crawls data from refined zone and adds table definition to the specified Database.",
      "Type": "Task",
      "Parameters": {
        "crawler_name": "refined",
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:generic-crawler-invoke",
      "TimeoutSeconds": 4500,
      "Catch": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "Next": "StartRefinedCrawlerFailed"
        },
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "StartRefinedCrawlerFailed"
        }
      ],
      "Next": "StartRefinedCrawlerPassed"
    },
    "StartRefinedCrawlerFailed": {
      "Type": "Fail",
      "Cause": "Crawler invocation failed"
    },
    "StartRefinedCrawlerPassed": {
      "Type": "Pass",
      "ResultPath": "$",
      "Parameters": {
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Next": "RefinedCrawlerStatus"
    },
    "RefinedCrawlerStatus": {
      "Comment": "Checks whether crawler is successfully completed.",
      "Type": "Task",
      "Parameters": {
        "crawler_name": "refined",
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:generic-crawler-status",
      "Retry": [
        {
          "ErrorEquals": [
            "CrawlerInProgressException"
          ],
          "IntervalSeconds": 30,
          "BackoffRate": 2,
          "MaxAttempts": 5
        },
        {
          "ErrorEquals": [
            "States.All"
          ],
          "IntervalSeconds": 30,
          "BackoffRate": 2,
          "MaxAttempts": 5
        }
      ],
      "Catch": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "Next": "RefinedCrawlerStatusFailed"
        },
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "RefinedCrawlerStatusFailed"
        }
      ],
      "Next": "RefinedCrawlerStatusPassed"
    },
    "RefinedCrawlerStatusFailed": {
      "Type": "Fail",
      "Cause": "Crawler invocation failed"
    },
    "RefinedCrawlerStatusPassed": {
      "Type": "Pass",
      "ResultPath": "$",
      "Parameters": {
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Next": "LandingToArchiveFileTransfer"
    },
    "LandingToArchiveFileTransfer": {
      "Comment": "Transfers files from landing zone to archived zone",
      "Type": "Task",
      "Parameters": {
        "step": "landing-to-archive-file-transfer",
        "bucket_name": "bucketname",
        "source_prefix": "data/landing/",
        "destination_prefix": "data/raw/",
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:generic-file-transfer",
      "TimeoutSeconds": 4500,
      "Catch": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "Next": "LandingToArchiveFileTransferFailed"
        },
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "LandingToArchiveFileTransferFailed"
        }
      ],
      "Next": "LandingToArchiveFileTransferPassed"
    },
    "LandingToArchiveFileTransferFailed": {
      "Type": "Fail",
      "Cause": "Crawler invocation failed"
    },
    "LandingToArchiveFileTransferPassed": {
      "Type": "Pass",
      "ResultPath": "$",
      "Parameters": {
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Next": "LandingToArchiveFileTransferStatus"
    },
    "LandingToArchiveFileTransferStatus": {
      "Comment": "Checks whether all files are copied from landing to archived successfully.",
      "Type": "Task",
      "Parameters": {
        "step": "landing-to-archive-file-transfer",
        "bucket_name": "bucketname",
        "source_prefix": "data/landing/",
        "destination_prefix": "data/raw/",
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:generic-file-transfer-status",
      "Retry": [
        {
          "ErrorEquals": [
            "LandingToArchiveFileTransferInCompleteException"
          ],
          "IntervalSeconds": 30,
          "BackoffRate": 2,
          "MaxAttempts": 5
        },
        {
          "ErrorEquals": [
            "States.All"
          ],
          "IntervalSeconds": 30,
          "BackoffRate": 2,
          "MaxAttempts": 5
        }
      ],
      "Catch": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "Next": "LandingToArchiveFileTransferStatusFailed"
        },
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "LandingToArchiveFileTransferStatusFailed"
        }
      ],
      "Next": "LandingToArchiveFileTransferStatusPassed"
    },
    "LandingToArchiveFileTransferStatusFailed": {
      "Type": "Fail",
      "Cause": "LandingToArchiveFileTransfer invocation failed"
    },
    "LandingToArchiveFileTransferStatusPassed": {
      "Type": "Pass",
      "ResultPath": "$",
      "Parameters": {
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "End": true
    }
  }
}

After updating the AWS Step Functions definition, the visual workflow looks like the following.

Now upload a file to the data/landing/ zone of the bucket where the Lambda trigger has been configured. The execution of the Step Function starts, and the visual workflow looks like the following.

In the RawCrawlerStatus step, if the Lambda keeps failing, we retry for a while and then mark the execution as failed. If the Step Function runs successfully, the visual workflow looks like the following.
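The state machine above expects year, month, and day in its execution input. A minimal sketch of the trigger Lambda that derives these from the uploaded object's key and starts the execution; the key layout, state machine ARN, and helper name here are hypothetical and should be adapted to your setup:

```python
import json

def execution_input_from_key(key):
    """Extract the year/month/day partition values the state machine expects,
    assuming keys like data/landing/<year>/<month>/<day>/<file>."""
    parts = key.split("/")
    return {"year": parts[2], "month": parts[3], "day": parts[4]}

def lambda_handler(event, context):
    # The S3 event notification carries the key of the uploaded object.
    key = event["Records"][0]["s3"]["object"]["key"]
    import boto3  # deferred import so the parsing helper stays testable offline
    sfn = boto3.client("stepfunctions")
    return sfn.start_execution(
        stateMachineArn="arn:aws:states:us-west-2:XXXXXXXXXXXX:stateMachine:data-pipeline",
        input=json.dumps(execution_input_from_key(key)),
    )
```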

Machine Learning workflow using Amazon SageMaker

The final step in this data pipeline is to make the processed data available in a Jupyter notebook instance of Amazon SageMaker. Jupyter notebooks are popular among data scientists for exploratory data analysis and for building and training machine learning models.

Create Notebook Instance in Amazon SageMaker

Step 1: In the Amazon SageMaker console, choose Create notebook instance.

Step 2: In the Notebook instance settings, populate the Notebook instance name, choose an instance type depending on your data size, and choose a role for the notebook instance to interact with Amazon S3. The SageMaker execution role needs permissions for Athena, the S3 buckets where the data resides, and KMS if the data is encrypted.

Step 3: Wait for the notebook instance to be created and its status to change to InService.

Step 4: Choose Open Jupyter, which opens the notebook interface in a new browser tab.

Click New to create a new notebook in Jupyter. Amazon SageMaker provides several kernels for Jupyter, including support for Python 2 and 3, MXNet, TensorFlow, and PySpark. Choose a Python kernel for this exercise, as it comes with the Pandas library built in.

Step 5: Within the notebook, execute the following commands to install PyAthena. PyAthena is a Python DB API 2.0 (PEP 249) compliant client for Amazon Athena.

import sys
!{sys.executable} -m pip install PyAthena

Step 6: After PyAthena is installed, you can connect to Athena and populate Pandas data frames. For data scientists, working with data is typically divided into multiple stages: munging and cleaning data, analyzing and modeling it, then organizing the results of the analysis into a form suitable for plotting or tabular display. Pandas is the ideal tool for all of these tasks.

from pyathena import connect
import pandas as pd
conn = connect(s3_staging_dir='<ATHENA QUERY RESULTS LOCATION>',
               region_name='REGION, for example, us-east-1')

df = pd.read_sql("SELECT * FROM <DATABASE>.<TABLENAME> limit 10;", conn)
df

As shown above, the dataframe always stays consistent with the latest incoming data because of the data engineering pipeline setup earlier in the ML workflow. This dataframe can be used for downstream ad-hoc model building purposes or for exploratory data analysis.
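As a toy illustration of those munging and analysis stages, here is the same pattern on a hand-built frame standing in for the Athena query result (the column names are made up for the example):

```python
import pandas as pd

# Stand-in for the frame returned by pd.read_sql(...) against Athena.
df = pd.DataFrame({
    "category": ["a", "a", "b", None],
    "amount": [10.0, None, 5.0, 3.0],
})

# Munging: drop rows with no category, fill missing amounts with zero.
clean = df.dropna(subset=["category"]).fillna({"amount": 0.0})

# Analysis: aggregate the cleaned data.
summary = clean.groupby("category")["amount"].sum()
print(summary.to_dict())  # {'a': 10.0, 'b': 5.0}
```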

That’s it folks. Thanks for the read.

This story is authored by PV Subbareddy. Subbareddy is a Big Data Engineer specializing in cloud big data services and the Apache Spark ecosystem.

Advanced Analytics – Presto Functions and Operators Quick Review

This post is a lot different from our earlier entries. Think of it as a reference flag post for people interested in a quick lookup of advanced analytics functions and operators used in modern data lake operations based on Presto. You can of course use it with Presto installations, but also with commercial products such as Amazon Athena, which is widely used these days to facilitate analytic operations on enterprise data lakes built on top of Amazon S3.

Without further ado, let’s dive straight into the nuts and bolts of these queries for advanced analytics:

JSON Functions

is_json_scalar(json) → boolean

  • Determines if json is a scalar (i.e. a JSON number, a JSON string, true, false or null).
  • Example:
SELECT is_json_scalar('1')           -- true
SELECT is_json_scalar('[1, 2, 3]')   -- false

json_array_contains(json, value) → boolean

  • Determines if value exists in json (a string containing a JSON array)
  • Example:
SELECT json_array_contains('[1, 2, 3]', 2)   -- true

json_array_get(json_array, index) → json

  • Example:
SELECT json_array_get('["a", [3, 9], "c"]', 0)    -- a
SELECT json_array_get('["a", [3, 9], "c"]', 10)   -- null
SELECT json_array_get('["c", [3, 9], "a"]', -2)   -- JSON '[3,9]'

json_array_length(json) → bigint

  • Returns the array length of json (a string containing a JSON array)
  • Example:
SELECT json_array_length('[1, 2, 3]')   -- 3

json_extract(json, json_path) → json

  • Evaluates the JSONPath-like expression json_path on json (a string containing JSON) and returns the result as a JSON string
  • Example:
SELECT json_extract(json_parse('{"email": {"abcd": "xyz@yahoo.com"}, "phone_numbers": [5678908, 587578575, 668798]}'), '$.email.abcd')   -- "xyz@yahoo.com"

json_extract_scalar(json, json_path) → varchar

  • Just like json_extract(), but returns the result value as a string (as opposed to being encoded as JSON). The value referenced by json_path must be a scalar (boolean, number or string)
  • Example:
SELECT json_extract_scalar(json_parse('{"email": {"abcd": "xyz@yahoo.com"}, "phone_numbers": [5678908, 587578575, 9999999]}'), '$.phone_numbers[0]')   -- 5678908
SELECT json_extract_scalar(json_parse('{"email": {"abcd": "xyz@yahoo.com"}, "phone_numbers": [{"home": 5678908}, {"mob": 587578575}, {"cell": 9999999}]}'), '$.phone_numbers[1].mob')   -- 587578575
SELECT json_extract_scalar(json_parse('{"email": "xyz@yahoo.com", "phone_numbers": [{"home": 5678908}, {"mob": 587578575}, {"cell": 9999999}]}'), '$.email')   -- xyz@yahoo.com

json_parse(string) → json

  • Returns the JSON value deserialized from the input JSON text. This is an inverse function to json_format()
  • Example:
SELECT json_parse('{"email": {"abcd": "xyz@yahoo.com"}, "phone_numbers": [{"home": 5678908}, {"mob": 587578575}, {"cell": 9999999}]}')
-- {"email": {"abcd": "xyz@yahoo.com"}, "phone_numbers": [{"home": 5678908}, {"mob": 587578575}, {"cell": 9999999}]}

json_size(json, json_path) → bigint

  • Just like json_extract(), but returns the size of the value. For objects or arrays, the size is the number of members, and the size of a scalar value is zero.
  • Example:
SELECT json_size(json_parse('{"email": "xyz@yahoo.com", "phone_numbers": [{"home": 5678908}, {"mob": 587578575}, {"cell": 9999999}]}'), '$.phone_numbers')   -- 3
SELECT json_size('{"x": {"a": 1, "b": 2}}', '$.x.a')   -- 0
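As a sanity check on the JSONPath examples above, the same lookups can be reproduced with Python's standard json module, where each path segment becomes ordinary dict/list indexing:

```python
import json

# The same document used in the json_extract_scalar examples.
doc = json.loads('{"email": {"abcd": "xyz@yahoo.com"}, '
                 '"phone_numbers": [5678908, 587578575, 9999999]}')

print(doc["phone_numbers"][0])    # $.phone_numbers[0] -> 5678908
print(doc["email"]["abcd"])       # $.email.abcd       -> xyz@yahoo.com
print(len(doc["phone_numbers"]))  # json_size on $.phone_numbers -> 3
```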

Date and Time Functions and Operators

current_date → date

  • Returns the current date as of the start of the query.

current_time → time with time zone

  • Returns the current time, with time zone, as of the start of the query.

current_timestamp → timestamp with time zone

  • Returns the current timestamp as of the start of the query.

current_timezone() → varchar

  • Returns the current time zone in the format defined by IANA (e.g., America/Los_Angeles) or as a fixed offset from UTC (e.g., +08:35)

date(x) → date

  • This is an alias for CAST(x AS date).
  • Example:
SELECT DATE('2019-08-07')   -- same as SELECT CAST('2019-08-07' AS DATE)

now() → timestamp with time zone

  • This is an alias for current_timestamp.
  • SELECT now();

date_trunc(unit, x) → [same as input]

  • Returns x truncated to unit.
  • Example:
SELECT date_trunc('second', current_timestamp)   -- 2019-08-16 06:51:29.000 UTC (truncated to the given unit)
SELECT date_trunc('minute', current_timestamp)   -- 2019-08-16 06:54:00.000 UTC

Interval Functions

Unit         Description
millisecond  Milliseconds
second       Seconds
minute       Minutes
hour         Hours
day          Days
week         Weeks
month        Months
quarter      Quarters of a year
year         Years

date_add(unit, value, timestamp) → [same as input]

  • Adds an interval value of type unit to timestamp. Subtraction can be performed by using a negative value.
  • Example:
SELECT date_add('month', 1, current_timestamp)   -- 2019-09-16 06:59:55.425 UTC
SELECT date_add('day', 1, current_timestamp)     -- 2019-08-17 07:01:26.834 UTC

date_diff(unit, timestamp1, timestamp2) → bigint

  • Returns timestamp2 – timestamp1 expressed in terms of unit.
  • Example:
SELECT date_diff('day', current_timestamp, date_add('day', 10, current_timestamp))   -- 10

parse_duration(string) → interval

  • Parses string of format value unit into an interval, where value is a fractional number of unit values
  • Example:
SELECT parse_duration('42.8ms')   -- 0 00:00:00.043
SELECT parse_duration('3.81 d')   -- 3 19:26:24.000
SELECT parse_duration('5m')       -- 0 00:05:00.000

date_format(timestamp, format) → varchar

  • Formats timestamp as a string using format (converts timestamp to string)
  • Example:
SELECT date_format(current_timestamp, '%Y-%m-%d')   -- 2019-08-16

date_parse(string, format) → timestamp

  • Parses string into a timestamp using format. (converts string to timestamp)
  • Example:
SELECT date_parse('2019-08-16', '%Y-%m-%d')   -- 2019-08-16 00:00:00.000

day(x) → bigint

  • Returns the day of the month from x.
  • Example:
SELECT day(date_parse('2019-08-16', '%Y-%m-%d'))   -- 16

day_of_month(x) → bigint

  • This is an alias for day().

day_of_week(x) → bigint

  • Returns the ISO day of the week from x. The value ranges from 1 (Monday) to 7 (Sunday).
  • Example:
SELECT day_of_week(date_parse('2019-08-16', '%Y-%m-%d'))   -- 5

year(x) → bigint

  • Returns the year from x.

Aggregate Functions

array_agg(x) → array<[same as input]>

  • Returns an array created from the input x elements
  • The array_agg() function is an aggregate function that accepts a set of values and returns an array in which each value in the input set is assigned to an element of the array.
  • Syntax:
    array_agg(expression [ORDER BY [sort_expression {ASC | DESC}], […])
  • Example:
SELECT title, array_agg(first_name || ' ' || last_name) AS actors FROM film GROUP BY title
SELECT title, array_agg(first_name || ' ' || last_name ORDER BY first_name) AS actors FROM film GROUP BY title

avg(x) → double

  • Returns the average (arithmetic mean) of all input values

bool_and(boolean) → boolean

  • Returns TRUE if every input value is TRUE, otherwise FALSE.

bool_or(boolean) → boolean

  • Returns TRUE if any input value is TRUE, otherwise FALSE.

count(*) → bigint

  • Returns the number of input rows.

count(x) → bigint

  • Returns the number of non-null input values.

count_if(x) → bigint

  • Returns the number of TRUE input values. This function is equivalent to count(CASE WHEN x THEN 1 END).

arbitrary(x) → [same as input]

  • Returns an arbitrary non-null value of x, if one exists.
  • arbitrary chooses one value out of a set of values. It is useful for silencing warnings about values that are neither grouped by nor aggregated over.

max_by(x, y) → [same as x]

  • Returns the value of x associated with the maximum value of y over all input values.
  • max_by takes two arguments and returns the value of the first argument for which the value of the second argument is maximized.
  • If multiple rows maximize the second value, an arbitrary first value is chosen from among them. max_by can be used with both numeric and non-numeric data.
  • Example:
SELECT max_by(close_date, close_value) AS date_of_max_sale FROM sales_pipeline
-- returns the close_date of the row where close_value is maximum

max_by(x, y, n) → array<[same as x]>

  • Returns n values of x associated with the n largest of all input values of y in descending order of y.

min_by(x, y) → [same as x]

  • Returns the value of x associated with the minimum value of y over all input values.

min_by(x, y, n) → array<[same as x]>

  • Returns n values of x associated with the n smallest of all input values of y in ascending order of y.

max(x, n) → array<[same as x]>

  • Returns n largest values of all input values of x.

min(x) → [same as input]

  • Returns the minimum value of all input values.

min(x, n) → array<[same as x]>

  • Returns n smallest values of all input values of x.

reduce_agg(inputValue T, initialState S, inputFunction(S, T, S), combineFunction(S, S, S)) → S

  • Reduces all input values into a single value. inputFunction will be invoked for each input value. In addition to taking the input value, inputFunction takes the current state, initially initialState, and returns the new state. combineFunction will be invoked to combine two states into a new state. The final state is returned
  • Example:
SELECT id, reduce_agg(value, 0, (a, b) -> a + b, (a, b) -> a + b)
FROM (VALUES (1, 2), (1, 3), (1, 4), (2, 20), (2, 30), (2, 40)) AS t(id, value)
GROUP BY id
-- (1, 9), (2, 90)

Array Functions and Operators

Subscript Operator: [ ]

  • The [ ] operator is used to access an element of an array and is indexed starting from one:
  • If a column has values like [1, 3], [45, 46]
  • Example:
SELECT column[1]   -- 1, 45 (one value per row)

Concatenation Operator: ||

  • The || operator is used to concatenate an array with an array or an element of the same type.
  • Example:
SELECT ARRAY [1] || ARRAY [2]   -- [1, 2]
SELECT ARRAY [1] || 2           -- [1, 2]
SELECT 2 || ARRAY [1]           -- [2, 1]

array_distinct(x) → array

  • Remove duplicate values from the array x.
  • Example:
SELECT array_distinct(ARRAY[1, 1, 1, 2])   -- [1, 2]

array_intersect(x, y) → array

  • Returns an array of the elements in the intersection of x and y, without duplicates.
  • Common elements in two arrays without duplicates.
  • Example:
SELECT array_intersect(ARRAY[1, 1, 1, 2], ARRAY[10, 15, 1, 100])   -- [1]

array_union(x, y) → array

  • Returns an array of the elements in the union of x and y, without duplicates.
  • Example:
SELECT array_union(ARRAY[1, 1, 1, 2], ARRAY[10, 15, 1, 100])   -- [1, 2, 10, 15, 100]

array_except(x, y) → array

  • Returns an array of elements in x but not in y, without duplicates.
  • Example:
SELECT array_except(ARRAY[1, 1, 1, 2], ARRAY[10, 15, 1, 100])   -- [2]

array_join(x, delimiter, null_replacement) → varchar

  • Concatenates the elements of the given array using the delimiter and an optional string to replace nulls.
  • Example:
SELECT array_join(ARRAY[1, 71, 81, 92, null], '/', 'abcd')   -- 1/71/81/92/abcd

array_max(x) → x

  • Returns the maximum value of input array.
  • Example:
SELECT array_max(ARRAY[1, 71, 81, 92, 100])    -- 100
SELECT array_max(ARRAY[1, 71, 81, 92, null])   -- null

array_min(x) → x

  • Returns the minimum value of input array.
  • Example:
SELECT array_min(ARRAY[1, 71, 81, 92, 100])   -- 1

array_position(x, element) → bigint

  • Returns the position of the first occurrence of the element in array x (or 0 if not found).
  • Example:
SELECT array_position(ARRAY[1, 71, 81, 92, 100], 100)   -- 5

array_remove(x, element) → array

  • Remove all elements that equal element from array x.
  • Example:
SELECT array_remove(ARRAY[1, 71, 81, 92, 100, 1], 1)   -- [71, 81, 92, 100]

array_sort(x) → array

  • Sorts and returns the array x. The elements of x must be orderable. Null elements will be placed at the end of the returned array.
  • Example:
SELECT array_sort(ARRAY[1, null, 8, 9, 71, 81, 92, 100, 12, 51, 10, 7, 1, null])   -- [1, 1, 7, 8, 9, 10, 12, 51, 71, 81, 92, 100, null, null]

array_sort(array(T), function(T, T, int)) → array(T)

  • Sorts and returns the array based on the given comparator function. The comparator will take two nullable arguments representing two nullable elements of the array. It returns -1, 0, or 1 as the first nullable element is less than, equal to, or greater than the second nullable element. 
  • If the comparator function returns other values (including NULL), the query will fail and raise an error.
  • Example:
SELECT array_sort(ARRAY [3, 2, 5, 1, 2], (x, y) -> IF(x < y, 1, IF(x = y, 0, -1)))   -- [5, 3, 2, 2, 1]

cardinality(x) → bigint

  • Returns the cardinality (size) of the array x.
  • Example:
SELECT cardinality(ARRAY[1, 81, 92, 100, 12, 51, 10])   -- 7

arrays_overlap(x, y) → boolean

  • Tests if arrays x and y have any non-null elements in common. Returns null if there are no non-null elements in common but either array contains null.
  • Example:
SELECT arrays_overlap(ARRAY[1, 81, 92, 100, 12, 51, 10], ARRAY[101])       -- false
SELECT arrays_overlap(ARRAY[1, 81, 92, 100, 12, 51, 10], ARRAY[101, 51])   -- true

concat(array1, array2, …, arrayN) → array

  • Concatenates the arrays array1, array2, …, arrayN. This function provides the same functionality as the SQL-standard concatenation operator (||).

contains(x, element) → boolean

  • Returns true if the array x contains the element.
  • Example:
SELECT contains(ARRAY[1, 81, 92, 100, 12, 51, 10], 1)   -- true
SELECT contains(ARRAY['abcd', 'test', 'xyz'], 'xyz')    -- true

element_at(array(E), index) → E

  • SQL array indices start at 1
  • Returns element of array at given index. If index > 0, this function provides the same functionality as the SQL-standard subscript operator ([]). If index < 0, element_at accesses elements from the last to the first.
  • Example:
SELECT element_at(ARRAY[1, 81, 92, 100, 12, 51, 10], 2)   -- 81

filter(array(T), function(T, boolean)) → array(T)

  • Constructs an array from those elements of array for which function returns true.
  • Example:
SELECT filter(ARRAY [5, -6, NULL, 7], x -> x > 0)   -- [5, 7]

flatten(x) → array

  • Flattens an array(array(T)) to an array(T) by concatenating the contained arrays.

reduce(array(T), initialState S, inputFunction(S, T, S), outputFunction(S, R)) → R

  • Returns a single value reduced from array. inputFunction will be invoked for each element in array in order. 
  • In addition to taking the element, inputFunction takes the current state, initially initialState, and returns the new state. outputFunction will be invoked to turn the final state into the result value.
  • It may be the identity function (i -> i).
  • Example:
SELECT reduce(ARRAY [5, 20, NULL, 50], 0, (s, x) -> s + COALESCE(x, 0), s -> s)   -- 75
SELECT reduce(ARRAY [5, 20, NULL, 50], 1, (s, x) -> s * COALESCE(x, 1), s -> s)   -- 5000
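The inputFunction/outputFunction semantics above map directly onto Python's functools.reduce, which can serve as a quick cross-check of the two queries:

```python
from functools import reduce

values = [5, 20, None, 50]

# (s, x) -> s + COALESCE(x, 0), starting state 0
total = reduce(lambda s, x: s + (x if x is not None else 0), values, 0)

# (s, x) -> s * COALESCE(x, 1), starting state 1
product = reduce(lambda s, x: s * (x if x is not None else 1), values, 1)

print(total, product)  # 75 5000
```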

repeat(element, count) → array

  • Repeat element for count times

reverse(x) → array

  • Returns an array which has the reversed order of array x.
  • Example:
SELECT reverse(ARRAY[2, 5])   -- [5, 2]

sequence(start, stop) → array(bigint)

  • Generate a sequence of integers from start to stop, increments by 1 if start is less than or equal to stop, otherwise decrements by 1.
  • Example:
SELECT sequence(2, 7)   -- [2, 3, 4, 5, 6, 7]
SELECT sequence(7, 1)   -- [7, 6, 5, 4, 3, 2, 1]

sequence(start, stop, step) → array(bigint)

  • Generate a sequence of integers from start to stop, incrementing by step.

sequence(start, stop) → array(date)

  • Generate a sequence of dates from start date to stop date, incrementing by 1 day if start date is less than or equal to stop date, otherwise -1 day.

sequence(start, stop, step) → array(date)

  • Generate a sequence of dates from start to stop, incrementing by step. The type of step can be either INTERVAL DAY TO SECOND or INTERVAL YEAR TO MONTH.

shuffle(x) → array

  • Generate a random permutation of the given array x.
  • Example:
SELECT shuffle(ARRAY[1, 7, 2])   -- [7, 1, 2] (order is random)

slice(x, start, length) → array

  • Subsets array x starting from index start (or starting from the end if start is negative) with a length of length.
  • Example:
SELECT slice(ARRAY[1, 7, 2, 87, 12, 9], 2, 4)   -- [7, 2, 87, 12]

transform(array(T), function(T, U)) → array(U)

  • Returns an array that is the result of applying function to each element of array
  • Example:
SELECT transform(ARRAY [5, NULL, 6], x -> COALESCE(x, 0) + 1)   -- [6, 1, 7]
SELECT transform(ARRAY ['x', 'abc', 'z'], x -> x || '0')        -- ['x0', 'abc0', 'z0']

zip(array1, array2[, …]) → array(row)

  • Merges the given arrays, element-wise, into a single array of rows. The M-th element of the N-th argument will be the N-th field of the M-th output element. If the arguments have an uneven length, missing values are filled with NULL.
  • Example:
SELECT zip(ARRAY[1, 2], ARRAY['1b', null, '3b'])   -- [ROW(1, '1b'), ROW(2, null), ROW(null, '3b')]
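Presto's null-padding here behaves like Python's itertools.zip_longest, which makes for an easy cross-check of the example:

```python
from itertools import zip_longest

# Mirrors zip(ARRAY[1, 2], ARRAY['1b', null, '3b']): the shorter
# array is padded with nulls (None) to the length of the longer one.
rows = list(zip_longest([1, 2], ['1b', None, '3b'], fillvalue=None))
print(rows)  # [(1, '1b'), (2, None), (None, '3b')]
```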

zip_with(array(T), array(U), function(T, U, R)) → array(R)

  • Merges the two given arrays, element-wise, into a single array using function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying function:
  • Example:
SELECT zip_with(ARRAY[1, 3, 5], ARRAY['a', 'b', 'c'], (x, y) -> (y, x))   -- [ROW('a', 1), ROW('b', 3), ROW('c', 5)]

Window functions

Window functions perform calculations across rows of the query result. They run after the HAVING clause but before the ORDER BY clause. Invoking a window function requires special syntax using the OVER clause to specify the window. A window has three components:

  1. The partition specification, which separates the input rows into different partitions. This is analogous to how the GROUP BY clause separates rows into different groups for aggregate functions.
  2. The ordering specification, which determines the order in which input rows will be processed by the window function.
  3. The window frame, which specifies a sliding window of rows to be processed by the function for a given row. If the frame is not specified, it defaults to RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This frame contains all rows from the start of the partition up to the last peer of the current row.

rank() → bigint

  • Returns the rank of a value in a group of values. The rank is one plus the number of rows preceding the row that are not peers with the row. Thus, tie values in the ordering will produce gaps in the sequence. The ranking is performed for each window partition.
  • As shown below, the rank function produces a numerical rank within the current row’s partition for each distinct ORDER BY value, in the order defined by the ORDER BY clause. rank needs no explicit parameter, because its behavior is entirely determined by the OVER clause.
  • Query:
WITH dataset AS 
    (SELECT 'name' AS rows, ARRAY['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'e'] AS sample )
SELECT rows,
         value,
         rank()
    OVER (ORDER BY value) AS rank
FROM dataset
CROSS JOIN UNNEST(sample) AS t(value)
  • Output:
rows   value   rank
name   a       1
name   a       1
name   a       1
name   b       4
name   b       4
name   c       6
name   c       6
name   d       8
name   e       9

row_number() → bigint

  • Returns a unique, sequential number for each row, starting with one, according to the ordering of rows within the window partition.
  • Query:
WITH dataset AS 
    (SELECT 'name' AS rows, ARRAY['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'e'] AS sample )
SELECT rows,
         value,
         row_number()
    OVER (ORDER BY value) AS row_num
FROM dataset
CROSS JOIN UNNEST(sample) AS t(value)
  • Output:
rows   value   row_num
name   a       1
name   a       2
name   a       3
name   b       4
name   b       5
name   c       6
name   c       7
name   d       8
name   e       9

dense_rank() → bigint

  • Returns the rank of a value in a group of values. This is similar to rank(), except that tie values do not produce gaps in the sequence.
  • Query: 
WITH dataset AS 
    (SELECT 'name' AS rows, ARRAY['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'e'] AS sample )
SELECT rows,
         value,
         dense_rank()
    OVER (ORDER BY value) AS dense_rank
FROM dataset
CROSS JOIN UNNEST(sample) AS t(value)
  • Output: 
rows   value   dense_rank
name   a       1
name   a       1
name   a       1
name   b       2
name   b       2
name   c       3
name   c       3
name   d       4
name   e       5
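As a cross-check, the three ranking functions can be emulated in a few lines of Python on the same sample values:

```python
values = ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'e']
ordered = sorted(values)

# row_number(): a unique sequential number per row.
row_number = list(range(1, len(ordered) + 1))

# rank(): one plus the number of rows strictly before the row's peers.
rank = [1 + sum(1 for w in ordered if w < v) for v in ordered]

# dense_rank(): like rank(), but ties leave no gaps.
distinct = sorted(set(ordered))
dense_rank = [distinct.index(v) + 1 for v in ordered]

print(rank)        # [1, 1, 1, 4, 4, 6, 6, 8, 9]
print(dense_rank)  # [1, 1, 1, 2, 2, 3, 3, 4, 5]
```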

Map Functions and Operators

map(array(K), array(V)) → map(K, V)

  • Returns a map created using the given key/value arrays.
  • Example:
SELECT map(ARRAY[1,3], ARRAY[2,4])                        -- {1=2, 3=4}
SELECT map(ARRAY['k1','k2'], ARRAY['value1', 'value2'])   -- {k1=value1, k2=value2}

Subscript Operator: [ ]

  • The [ ] operator is used to retrieve the value corresponding to a given key from a map.
  • Example:
SELECT map(ARRAY[1,3], ARRAY[2,4]) AS a                      -- {1=2, 3=4}
SELECT a[1] FROM (SELECT map(ARRAY[1,3], ARRAY[2,4]) AS a)   -- 2

element_at(map(K, V), key) → V

  • Returns value for given key, or NULL if the key is not contained in the map.
  • Example:
Query:  SELECT element_at(a, 'k1') FROM (SELECT map(ARRAY['k1','k2'], ARRAY['value1', 'value2']) AS a)
Output: value1

cardinality(x) → bigint

  • Returns the cardinality (size) of the map x.
  • Example:
Query:  SELECT cardinality(map(ARRAY[1,3], ARRAY[2,4]))
Output: 2

map() → map<unknown, unknown>

  • Returns an empty map.
  • Example:
Query:  SELECT map()
Output: {}

map_from_entries(array(row(K, V))) → map(K, V)

  • Returns a map created from the given array of entries.
  • Example:
Query:  SELECT map_from_entries(ARRAY[(1, 'x'), (2, 'y')])
Output: {1 -> 'x', 2 -> 'y'}

map_agg(key, value) → map(K, V)

  • Returns a map created from the input key/value pairs. Note that map_agg is an aggregate function, so it operates over a group of rows.
  • Example:
Query:  SELECT map_agg(k, v) FROM (VALUES (1, 'a'), (2, 'b')) AS t(k, v)
Output: {1=a, 2=b}

multimap_from_entries(array(row(K, V))) → map(K, array(V))

  • Returns a multimap created from the given array of entries. Each key can be associated with multiple values.
  • Example:
Query:  SELECT multimap_from_entries(ARRAY[(1, 'x'), (2, 'y'), (1, 'z')])
Output: {1 -> ['x', 'z'], 2 -> ['y']}

map_concat(map1(K, V), map2(K, V), …, mapN(K, V)) → map(K, V)

  • Returns the union of all the given maps. If a key is found in multiple given maps, that key’s value in the resulting map comes from the last of those maps.
  • Example:
Query:  SELECT map_concat(map(ARRAY[1], ARRAY['a']), map(ARRAY[1, 2], ARRAY['b', 'c']))
Output: {1=b, 2=c}

map_filter(map(K, V), function(K, V, boolean)) → map(K, V)

  • Constructs a map from those entries of map for which function returns true.
  • Example:
Query:  SELECT map_filter(MAP(ARRAY[], ARRAY[]), (k, v) -> true)
Output: {}
Query:  SELECT map_filter(MAP(ARRAY[10, 20, 30], ARRAY['a', NULL, 'c']), (k, v) -> v IS NOT NULL)
Output: {10 -> a, 30 -> c}

map_keys(x(K, V)) → array(K)

  • Returns all the keys in the map x.
  • Example:
Query:  SELECT map_keys(map(ARRAY['k1','k2'], ARRAY['value1', 'value2']))
Output: [k1, k2]

map_values(x(K, V)) → array(V)

  • Returns all the values in the map x.
  • Example:
Query:  SELECT map_values(map(ARRAY['k1','k2'], ARRAY['value1', 'value2']))
Output: [value1, value2]

transform_keys(map(K1, V), function(K1, V, K2)) → map(K2, V)

  • Returns a map that applies function to each entry of map and transforms the keys.
  • Example:
Query:  SELECT transform_keys(MAP(ARRAY[], ARRAY[]), (k, v) -> k + 1)
Output: {}
Query:  SELECT transform_keys(MAP(ARRAY[1, 2, 3], ARRAY['a', 'b', 'c']), (k, v) -> k + 1)
Output: {2 -> a, 3 -> b, 4 -> c}
Query:  SELECT transform_keys(MAP(ARRAY['a', 'b'], ARRAY[1, 2]), (k, v) -> k || CAST(v AS VARCHAR))
Output: {a1 -> 1, b2 -> 2}

transform_values(map(K, V1), function(K, V1, V2)) → map(K, V2)

  • Returns a map that applies function to each entry of map and transforms the values.
  • Example:
Query:  SELECT transform_values(MAP(ARRAY[], ARRAY[]), (k, v) -> v + 1)
Output: {}
Query:  SELECT transform_values(MAP(ARRAY[1, 2, 3], ARRAY[10, 20, 30]), (k, v) -> v + k)
Output: {1 -> 11, 2 -> 22, 3 -> 33}

Thanks for the read. Please feel free to reach out with your comments.

This story is authored by PV Subbareddy. Subbareddy is a Big Data Engineer specializing in Cloud Big Data Services and the Apache Spark Ecosystem.

Email Deliverability Analytics using SendGrid and AWS Big Data Services

In this post, we will run through a case study to set up an email deliverability analytics pipeline using SendGrid and AWS Big Data Services such as S3, Glue and Athena. When we send mails from SendGrid to recipients, we get responses (multiple response types are possible, such as processed, delivered, blocked, deferred etc.) from email service providers such as Gmail, Yahoo etc. We can use this response data to improve our email deliverability by analyzing it. This is achieved by logging the responses (via API Gateway and a Lambda function) into Amazon S3 and then analyzing them using Athena. The chain of events is put in place by using a webhook that triggers a POST request to AWS API Gateway on an event notification (response) from SendGrid. API Gateway is further configured to trigger a Lambda function, which writes the email response data into S3. We then use a Glue crawler to update the metadata in the data catalog, thereby making the data available for Athena to perform SQL-based analysis.
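To make the webhook payload concrete, here is a hypothetical JavaScript sketch (the same language as the Lambda function used later in this post) of the kind of event batch SendGrid POSTs to the endpoint, and the kind of per-event-type count Athena will eventually compute with SQL. The field names and values are illustrative, not an exact SendGrid schema:

```javascript
// Hypothetical sample of the event batch SendGrid POSTs to the webhook:
// an array of response objects, one per recipient event.
const sampleBatch = [
  { email: 'a@example.com', event: 'processed', timestamp: 1588682400 },
  { email: 'a@example.com', event: 'delivered', timestamp: 1588682405 },
  { email: 'b@example.com', event: 'deferred',  timestamp: 1588682410 }
];

// Counts responses by event type -- the kind of aggregation Athena
// will later run with SQL over the data landed in S3.
function countByEvent(batch) {
  return batch.reduce((acc, e) => {
    acc[e.event] = (acc[e.event] || 0) + 1;
    return acc;
  }, {});
}

console.log(countByEvent(sampleBatch));
// { processed: 1, delivered: 1, deferred: 1 }
```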

Without further ado, let’s set the ball rolling. Go to SendGrid and select Settings > Mail Settings, then click on Event Notifications.

We enable it by providing an endpoint and selecting the events for which we want to receive a response.

The above endpoint points to the AWS API Gateway (shown below), which accepts the POST request and, as you can see, triggers the Lambda function.

Now our Lambda function stores the event payload data in the S3 bucket.
Lambda code:

const AWS = require('aws-sdk');
var s3Bucket = new AWS.S3( { params: {Bucket: "Your-Bucket"} } );

exports.handler = (event, context, callback) => {
    console.log(event); // the array of SendGrid event notifications
    let x = "";
    // convert the event array to newline-delimited JSON,
    // so each response becomes one line in the S3 object
    event.map((item) => {
        x = x + JSON.stringify(item) + "\n";
    });
    let uuid = create_UUID();
    var filePath = "receivelogs/" + uuid;
    console.log(filePath);
    var data = {
        Key: filePath,
        Body: x
    };
    s3Bucket.putObject(data, function(err, data) {
        if (err) {
            console.log('Error uploading data: ', err);
            callback(err, null);
        } else {
            console.log('Successfully uploaded the response');
            callback(null, data);
        }
    });
};

// this function generates a version-4 UUID, used as the file name
function create_UUID(){
   var dt = new Date().getTime();
   var uuid = 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function(c) {
       var r = (dt + Math.random()*16)%16 | 0;
       dt = Math.floor(dt/16);
       return (c=='x' ? r :(r&0x3|0x8)).toString(16);
   });
   return uuid;
}
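The Lambda’s core transformation can be pulled out and exercised locally. The following sketch isolates the newline-delimited JSON conversion as a hypothetical helper named toNdjson (not in the original Lambda), alongside the same UUID generator used above for unique S3 keys:

```javascript
// Each SendGrid event becomes one JSON line (NDJSON), so Glue and Athena
// can read the resulting S3 objects line by line.
function toNdjson(events) {
  return events.map((item) => JSON.stringify(item) + '\n').join('');
}

// Same version-4 UUID generator used in the Lambda above.
function create_UUID() {
  var dt = new Date().getTime();
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function (c) {
    var r = (dt + Math.random() * 16) % 16 | 0;
    dt = Math.floor(dt / 16);
    return (c == 'x' ? r : (r & 0x3 | 0x8)).toString(16);
  });
}

// Example: two events become two JSON lines under a unique S3 key.
const body = toNdjson([{ event: 'delivered' }, { event: 'bounce' }]);
console.log(body);
console.log('receivelogs/' + create_UUID());
```

One JSON object per line is what lets the Glue crawler infer a usable table schema from these objects later on.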

When you send a mail, SendGrid sends the response via a POST request to API Gateway, and the Lambda function then stores that response in S3.

AWS Glue is a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores. We use a crawler to populate the AWS Glue Data Catalog with tables. Below is the step-by-step process to set up the Glue crawler to read an S3-based data source and make it available as a database table for AWS Athena based analytics.

In the step above, you may need to create a new IAM role that provides access to the underlying S3 data.

So in the steps above, we have concluded the setup for the crawler to fetch the underlying data on S3.

When you run this crawler on the S3-based data source, it updates the metadata of the objects in that path in the Glue data catalog. Athena can then run SQL queries against those objects in S3 using the metadata available in the data catalog. Since many business executives aren’t comfortable with SQL queries, a useful add-on to this data pipeline could be AWS QuickSight for a more BI-driven analysis.

Thanks for the read!

This story is authored by Santosh Kumar. He is an AWS Cloud Engineer.