Glue Archives -

Understanding Partition Projections in AWS Athena

If you are somebody who uses AWS Athena to query large highly partitioned tables on a daily basis you must know how difficult it is to maintain the partitions. As your partitions grow, you also need to update the metadata in Glue Data Catalog, or else the new data isn’t scanned. Some of us even... » read more

Read

Engineering@ZenOfAI written 5 years ago

Machine Learning based Fuzzy Matching using AWS Glue ML Transforms

Machine Learning Transforms in AWS Glue AWS Glue provides machine learning capabilities to create custom transforms to do Machine Learning based fuzzy matching to deduplicate and cleanse your data. For this we are going to use a transform named FindMatches. The FindMatches transform enables you to identify duplicate or matching records in your dataset, even... » read more

Read

Engineering@ZenOfAI written 6 years ago

AWS Machine Learning Data Engineering Pipeline for Batch Data

This post walks you through all the steps required to build a data engineering pipeline for batch data using AWS Step Functions. The sequence of steps works like so : the ingested data arrives as a CSV file in a S3 based data lake in the landing zone, which automatically triggers a Lambda function to... » read more

Read

Engineering@ZenOfAI written 6 years ago

Setting Up a Data Lake on AWS Cloud Using LakeFormation

Setting up a Data Lake involves multiple steps such as collecting, cleansing, moving, and cataloging data, and then securely making that data available for downstream analytics and Machine Learning. AWS LakeFormation simplifies these processes and also automates certain processes like data ingestion. In this post, we shall be learning how to build a very simple... » read more

Read

Engineering@ZenOfAI written 6 years ago

Email Deliverability Analytics using SendGrid and AWS Big Data Services

In this post, we will run though a case study to setup an email deliverability analytics pipeline using SendGrid and AWS Big Data Services such as S3, Glue and Athena. To start off, when we send mails from SendGrid to recipients. we get responses (multiple response types are possible such as processed, delivered, blocked, deferred... » read more

Read

Engineering@ZenOfAI written 6 years ago