Object Detection in React Native App using AWS Rekognition

In this post, we are going to build a React Native app for detecting objects from an image using Amazon Rekognition.

Here we will capture an Image or Select it from file system. We will send that image to API Gateway where it triggers the Lambda Function which will store in S3 Bucket. The stored image is sent to Amazon Recognition which will detect the objects from the image.

Installing dependencies:

Let’s go to React Native Docs, select React Native CLI Quickstart and select our appropriate Development OS and the Target OS as Android, as we are going to build an android application.

Follow the docs for installing dependencies, then create a new React Native Application. Use the command line interface to generate a new React Native project called ObjectDetection.

react-native init ObjectDetection

Preparing the Android device:

We shall need an Android device to run our React Native Android app. This can be either a physical Android device, or more commonly, we can use an Android Virtual Device (AVD) which allows us to emulate an Android device on our computer (using Android Studio).

Either way, we shall need to prepare the device to run Android apps for development. If you have a physical Android device, you can use it for development in place of an AVD by connecting it to your computer using a USB cable and following the instructions here.

If you are using a virtual device follow this link. I shall be using a physical Android device.

Now go to the command line and run react-native run-android inside your React Native app directory

cd ObjectDetection && react-native run-android

If everything is set up correctly, you should see your new app running on your physical device or Android emulator.

API Creation in AWS Console: 

Before going further, create an API in your AWS console following this link.
Once your done with creating API come back to the React Native application.
Now, go to your project directory and Replace your App.js file with the following code.

import React, {Component} from 'react';
import {
 StyleSheet,
 View,
 Text,
 TextInput,
 Image,
 ScrollView,
 TouchableHighlight,
} from 'react-native';
import ImagePicker from 'react-native-image-picker';
import Amplify, {API} from 'aws-amplify';
import Video from 'react-native-video';
 
// Amplify configuration for API-Gateway
Amplify.configure({
 API: {
   endpoints: [
     {
       name: 'LabellingAPI',   //your api name
       endpoint:’<Endpoint-URL>’, //Your Endpoint URL
     },
   ],
 },
});
 
class Registration extends Component {
 constructor(props) {
   super(props);
   this.state = {
     username: 'storeImage.png',
     userId: '',
     image: '',
     capturedImage: '',
     objectName: '',
   };
 }
 
// It selects image from filesystem or capture from camera
 captureImageButtonHandler = () => {
   this.setState({
     objectName: '',
   });
 
   ImagePicker.showImagePicker(
     {title: 'Pick an Image', maxWidth: 800, maxHeight: 600},
     response => {
       console.log('Response = ', response);
       if (response.didCancel) {
         console.log('User cancelled image picker');
       } else if (response.error) {
         console.log('ImagePicker Error: ', response.error);
       } else if (response.customButton) {
         console.log('User tapped custom button: ', response.customButton);
       } else {
         // You can also display the image using data:
         const source = {uri: 'data:image/jpeg;base64,' + response.data};
         this.setState({
           capturedImage: response.uri,
           base64String: source.uri,
         });
       }
     },
   );
 };
 
// this method triggers when you click submit. If the image is valid then It will send the image to API Gateway. 
 submitButtonHandler = () => {
   if (
     this.state.capturedImage == '' ||
     this.state.capturedImage == undefined ||
     this.state.capturedImage == null
   ) {
     alert('Please Capture the Image');
   } else {
     const apiName = 'LabellingAPI';
     const path = '/storeimage';
     const init = {
       headers: {
         Accept: 'application/json',
         'Content-Type': 'application/x-amz-json-1.1',
       },
       body: JSON.stringify({
         Image: this.state.base64String,
         name: 'storeImage.png',
       }),
     };
 
     API.post(apiName, path, init).then(response => {
       if (JSON.stringify(response.Labels.length) > 0) {
         this.setState({
           objectName: response.Labels[0].Name,
         });
       } else {
         alert('Please Try Again.');
       }
     });
   }
 };
 
 render() {
   if (this.state.image !== '') {
   }
   return (
     <View style={styles.MainContainer}>
       <ScrollView>
         <Text
           style={{
             fontSize: 20,
             color: '#000',
             textAlign: 'center',
             marginBottom: 15,
             marginTop: 10,
           }}>
           Capture Image
         </Text>
         {this.state.capturedImage !== '' && (
           <View style={styles.imageholder}>
             <Image
               source={{uri: this.state.capturedImage}}
               style={styles.previewImage}
             />
           </View>
         )}
         {this.state.objectName ? (
           <TextInput
             underlineColorAndroid="transparent"
             style={styles.TextInputStyleClass}
             value={this.state.objectName}
           />
         ) : null}
         <TouchableHighlight
           style={[styles.buttonContainer, styles.captureButton]}
           onPress={this.captureImageButtonHandler}>
           <Text style={styles.buttonText}>Capture Image</Text>
         </TouchableHighlight>
 
         <TouchableHighlight
           style={[styles.buttonContainer, styles.submitButton]}
           onPress={this.submitButtonHandler}>
           <Text style={styles.buttonText}>Submit</Text>
         </TouchableHighlight>
       </ScrollView>
     </View>
   );
 }
}
 
const styles = StyleSheet.create({
 TextInputStyleClass: {
   textAlign: 'center',
   marginBottom: 7,
   height: 40,
   borderWidth: 1,
   marginLeft: 90,
   width: '50%',
   justifyContent: 'center',
   borderColor: '#D0D0D0',
   borderRadius: 5,
 },
 inputContainer: {
   borderBottomColor: '#F5FCFF',
   backgroundColor: '#FFFFFF',
   borderRadius: 30,
   borderBottomWidth: 1,
   width: 300,
   height: 45,
   marginBottom: 20,
   flexDirection: 'row',
   alignItems: 'center',
 },
 buttonContainer: {
   height: 45,
   flexDirection: 'row',
   alignItems: 'center',
   justifyContent: 'center',
   marginBottom: 20,
   width: '80%',
   borderRadius: 30,
   marginTop: 20,
   marginLeft: 5,
 },
 captureButton: {
   backgroundColor: '#337ab7',
   width: 350,
 },
 buttonText: {
   color: 'white',
   fontWeight: 'bold',
 },
 horizontal: {
   flexDirection: 'row',
   justifyContent: 'space-around',
   padding: 10,
 },
 submitButton: {
   backgroundColor: '#C0C0C0',
   width: 350,
   marginTop: 5,
 },
 imageholder: {
   borderWidth: 1,
   borderColor: 'grey',
   backgroundColor: '#eee',
   width: '50%',
   height: 150,
   marginTop: 10,
   marginLeft: 90,
   flexDirection: 'row',
   alignItems: 'center',
 },
 previewImage: {
   width: '100%',
   height: '100%',
 },
});
 
export default Registration;

In the above code, we are configuring amplify with the API name and Endpoint URL that you created as shown below.

Amplify.configure({
 API: {
   endpoints: [
     {
       name: '<Your-API-Name>, 
       endpoint:’<Endpoint-URL>’,
     },
   ],
 },
});

By clicking the capture button it will trigger the captureImageButtonHandler function. It will then ask the user to take a picture or select from file system. When user captures the image or selects from file system, we will store that image in the state as shown below.

captureImageButtonHandler = () => {
   this.setState({
     objectName: '',
   });
 
   ImagePicker.showImagePicker(
     {title: 'Pick an Image', maxWidth: 800, maxHeight: 600},
     response => {
       console.log('Response = ', response);
       if (response.didCancel) {
         console.log('User cancelled image picker');
       } else if (response.error) {
         console.log('ImagePicker Error: ', response.error);
       } else if (response.customButton) {
         console.log('User tapped custom button: ', response.customButton);
       } else {
         // You can also display the image using data:
         const source = {uri: 'data:image/jpeg;base64,' + response.data};
         this.setState({
           capturedImage: response.uri,
           base64String: source.uri,
         });
       }
     },
   );
 };

After capturing the image we will preview that image. By Clicking on submit button, submitButtonHandler function will get triggered where we will send the image to the end point as shown below.

submitButtonHandler = () => {
   if (
     this.state.capturedImage == '' ||
     this.state.capturedImage == undefined ||
     this.state.capturedImage == null
   ) {
     alert('Please Capture the Image');
   } else {
     const apiName = 'LabellingAPI';
     const path = '/storeimage';
     const init = {
       headers: {
         Accept: 'application/json',
         'Content-Type': 'application/x-amz-json-1.1',
       },
       body: JSON.stringify({
         Image: this.state.base64String,
         name: 'storeImage.png',
       }),
     };
 
     API.post(apiName, path, init).then(response => {
       if (JSON.stringify(response.Labels.length) > 0) {
         this.setState({
           objectName: response.Labels[0].Name,
         });
       } else {
         alert('Please Try Again.');
       }
     });
   }
 };

Lambda Function:

Add the following code into your lambda function that you created in your AWS Console.

const AWS = require('aws-sdk')
var rekognition = new AWS.Rekognition()
var s3Bucket = new AWS.S3( { params: {Bucket: "<Your-Bucket>"} } );
var fs = require('fs');
exports.handler = (event, context, callback) => {
   let parsedData = JSON.parse(event)
   let encodedImage = parsedData.Image;
   var filePath = parsedData.name;
   let buf = new Buffer(encodedImage.replace(/^data:image\/\w+;base64,/, ""),'base64')
   var data = {
       Key: filePath,
       Body: buf,
       ContentEncoding: 'base64',
       ContentType: 'image/jpeg'
   };
   s3Bucket.putObject(data, function(err, data){
       if (err) {
           console.log('Error uploading data: ', data);
           callback(err, null);
       } else {
           var params = {
             Image: {
              S3Object: {
               Bucket: "<Your-Bucket>",
               Name: filePath
              }
             },
             MaxLabels: 10,
             MinConfidence: 90
            };
           rekognition.detectLabels(params, function(err, data) {
               if (err){
                   console.log(err, err.stack);
                   callback(err)
               }
               else{
                   console.log(data);
                   callback(null, data);
               }
           });
       }
   });
};

In the above code, we would receive the image from React Native which we are storing in S3 Bucket. The stored image is sent to Amazon Recognition which has detectLabels method that detects the labels from the image and sends the response with the detected labels in JSON format.

capture image screen

Once you capture an image you can see a preview of that image as shown below.

Nike backpack

On submitting the captured image you can see the label of that image as shown below:

Object recognised as backpack

That’s all folks! I hope it was helpful.
For any queries drop them in the comments section.

This story is authored by Dheeraj Kumar and Venu Vaka. Dheeraj is a software engineer specializing in React Native and React based frontend development. Venu is a software engineer specializing in ReactJS and AWS Cloud.

Optimizing QuickSight using Athena Queries and SPICE: Operating cost analysis

In this post, I will be discussing as an example how an automobile manufacturing company could utilize QuickSight to analyze their sales data and make better decisions. We will also learn how to best optimize the QuickSight operational cost structure by using SPICE engine to ingest source data at certain recurring intervals from Athena queries. This has two major advantages : dashboards and analyses load quickly as the data source is within SPICE. Secondly, cost of data ingestion is also brought down as Athena is queried only to refresh the data load in SPICE.

We will look at a sales dashboard, created using data-sets prepared from data in refined zone in a DataLake created using LakeFormation. A Data engineering pipeline writes data to this refined zone with year and month partitions every hour.

In case you wish to build a similar thing and follow along, below is the link to raw datasets:

https://github.com/koushik-bitzop/data-sets/tree/master/sales2016-2018

Creating a SPICE based Athena Data-set:

Select Athena as the data set source:

Select use custom query.

Select Edit/Preview data and then choose data source as SPICE and click on Finish.

Once query successfully ran and you could see the data, click on the Save and Visualise.

In case you want to add any calculated fields or change data types you could do that in the red highlighted section shown above.

I have discussed in detail here in my previous articles Visualizing Multiple Datasets in AWS QuickSight and Adding User-Interactivity to AWS QuickSight Dashboards

Refresh Schedule for Data-sets:

Depending on how frequent new data is arrived you could schedule the refresh. For every refresh an Athena query is executed and the results are imported into SPICE.

Note: 

  1. In this example, Quicksight SPICE pull data refresh is whole data, not incremental.
  2. It is not possible to pass quicksight pass pushdown predicates (variables) from filters in dashboard to Athena. So if you want to look at a rolling window of data such as past 24 hours or past one month or past 6 months, we can use a WHERE clause in the Athena source query to fetch just those records. Also, if the data is partitioned by year and month, only required data is scanned thereby further saving on costs.

A lowdown on QuickSight Operating cost with this architecture:

We are looking at two main cost components:

  1. Athena – S3 data scan costs
  2. QuickSight Infrastructure costs

Athena – S3 data scan costs:

Athena pricing for successful queries:
1TB scan = 5$
S3 storage cost not included.

No. of queriesData scanned in S3Scheduled RefreshTotal Data scanned
(monthly)
Bill estimated
(monthly)
Bill estimated
(annual)
1150 to 210 KBHourly1*24*30*210KB = 0.0001512TB0.000756$0.009072$

Above numbers are a bit low to make an inference. Let us say, you have 4 such queries (each query is scanning around 150 to 200 MB) powering the dashboard and SPICE ingests this data once every hour.

No. of queriesData scanned in S3Scheduled RefreshTotal Data scanned
(monthly)
Bill estimated
(monthly)
Bill estimated
(annual)
4150 to 200 MBHourly4*24*30*300MB = 576GB or 0.57TB$2.88$34.56

In case, we do not use SPICE to load this data from Athena in an hourly fashion and instead use Athena query as the direct source, then cost of the dashboards would increase proportionately with each query. So as an example, if the dashboards are being viewed at a rate of 1000 views per hour (and each dashboard has 4 source queries), then the cost above would be multiplied by a staggering 1000 times! and the annual bill would be an eye popping $ 34,560.

QuickSight infrastructure cost (Standard Edition):

No charge for readers. $9 for Author with annual subscription.

User typeNo. of usersBill estimated
(monthly)
Bill estimated
(annual)
Author1$9$108
Reader3$0$0
Total
$9 pm$108 pa

Note: For Enterprise edition, Readers are billed $0.30 for a 30-minute session up to a maximum charge of $5/reader/month for unlimited use. Authors are billed $18 with annual subscription.
For SPICE additional capacity $0.25/GB/standard and $0.38/GB/enterprise. 

So overall we can see that using SPICE with a periodic data refresh causes the costs to be optimized in a smart way. That’s it folks. I hope it was helpful. For any queries, drop them in the comments section.

This story is authored by Koushik. Koushik is a software engineer and a keen data science and machine learning enthusiast.

AWS QuickSight Auto-Narratives to Highlight Insights using Natural Language Processing

Most often analyzing data sets to summarize their main characteristics, is done with visuals. Yet still one has to sift through the visuals, drilling down, comparing values, and rechecking ideas to extract a conclusion. But with QuickSight that is not the case, using its auto-narratives feature, one could extrapolate conclusion from the data analysis or highlight insights and state them plainly in a natural language as part of the analysis or report. However in day to day analysis, a balanced mix of plain statements and visuals is appreciated. One could use this feature to add a brief summary of the analysis or highlight important points.

In this blog post, I will be using Discovering Barcelona dashboard, created earlier for my previous articles Visualizing Multiple Datasets in AWS QuickSight and Adding User-Interactivity to AWS QuickSight Dashboards. We will look at how to add insights to QuickSight Dashboards, and use auto-narratives to give a brief about Accidents in Barcelona.

Let us have a look at what we are gonna build.

In the below picture, the green highlighted section is the Insights auto generated from the dataset by QuickSight. If you like these insights and want them as part of analysis, you could add them. This is shown in the red highlight.

Once an Insight is added to an analysis the content it holds is called a Narrative.

Adding a Custom Insight:

Let’s learn more about computations later, for now closing this Computation modal will add an empty insight to our analysis.

I also deleted the previous insight, so I could start from scratch with this new one. To customize the insight either click on Customize insight or from the drop down menu on the top right and choose Customize narrative. Make sure to add the fields required for the insight from the fields list. Select the insight visual and select the fields from Fields list. Once added you could see them in the Field wells bar highlighted in red at the top.

Computations are more like ready made templates, values coming from calculations done on the dataset, here’s a list of computations for you to explore. Parameters could also be used in the narrative logic. I have discussed what parameters are and how to use them here. Functions are the same as those we use to add calculated fields while editing data sets. Add computation.

The type of computation needed is chosen.

Once you apply the configuration, changes will be reflected in the analysis. Let us also add the Top ranked computations for month and district.

Once you add, they will be listed in the Computations section.

In the Computations section, the blue objects are variables that can be used in the narrative.

Now applying configurations would reflect in the analysis.

Similarly let’s add for the District also. First add the computation for the district and then configure the narrative.

Brief summary of the analysis using QuickSight Autonarratives

We successfully configured a custom narrative. 

One more cool thing about it is, filters linked with a control for a specific field can be added. I have a filter created earlier that applies to all visuals. Let us remove one district from using the control and see if it affects the Insight.

It affected, now you don’t see the Example district from both Visual, Insight, and also the stats have also changed!

That’s broadly about AWS QuickSight auto-narratives, I hope this was helpful. Please experiment, and do let me know if I missed something in the comment section.

This story is authored by Koushik. He is a software engineer and a keen data science and machine learning enthusiast.

AWS Machine Learning Data Engineering Pipeline for Batch Data

This post walks you through all the steps required to build a data engineering pipeline for batch data using AWS Step Functions. The sequence of steps works like so : the ingested data arrives as a CSV file in a S3 based data lake in the landing zone, which automatically triggers a Lambda function to invoke the Step Function. I have assumed that data is being ingested daily in a .csv file with a filename_date.csv naming convention like so customers_20190821.csv. The step function, as the first step, starts a landing to raw zone file transfer operation via a Lambda Function. Then we have an AWS Glue crawler crawl the raw data into an Athena table, which is used as a source for AWS Glue based PySpark transformation script. The transformed data is written in the refined zone in the parquet format. Again an AWS Glue crawler runs to “reflect” this refined data into another Athena table. Finally, the data science team can consume this refined data available in the Athena table, using an AWS Sagemaker based Jupyter notebook instance. It is to be noted that the data science does not need to do any data pull manually, as the data engineering pipeline automatically pulls in the delta data, as per the data refresh schedule that writes new data in the landing zone.

Let’s go through the steps

How to make daily data available to Amazon SageMaker?

What is Amazon SageMaker?

Amazon SageMaker is an end-to-end machine learning (ML) platform that can be leveraged to build, train, and deploy machine learning models in AWS. Using the Amazon SageMaker Notebook module, improves the efficiency of interacting with the data without the latency of bringing it locally.
For deep dive into Amazon SageMaker, please go through the official docs.

In this blog post, I will be using a dummy customers data. The customers data consists of retailer information and units purchased.

Updating Table Definitions with AWS Glue

The data catalog feature of AWS Glue and the inbuilt integration to Amazon S3 simplifies the process of identifying data and deriving the schema definition out of the source data. Glue crawlers within Data catalog, are used to build out the metadata tables of data stored in Amazon S3.

I created a crawler named raw for the data in raw zone (s3://bucketname/data/raw/customers/). In case you are just starting out on AWS Glue crawler, I have explained how to create one from scratch in one of my earlier article. If you run this crawler, it creates customers table in specified database (raw).

Create an invocation Lambda Function

In case you are just starting out on Lambda functions, I have explained how to create one from scratch with an IAM role to access the StepFunctions, Amazon S3, Lambda and CloudWatch in my earlier article.

Add trigger to the created Lambda function named invoke-step-functions. Configure Bucket, Prefix and  Suffix accordingly.

Once file is arrived at landing zone, it triggers the invoke Lambda function which extracts year, month, day from file name that comes from event. It passes year, month, day with two characters from uuid as input to the AWS StepFunctions.Please replace the following code in invoke-step-function Lambda.

import json
import uuid
import boto3
from datetime import datetime

sfn_client = boto3.client('stepfunctions')

stm_arn = 'arn:aws:states:us-west-2:XXXXXXXXXXXX:stateMachine:Datapipeline-for-SageMaker'

def lambda_handler(event, context):
    
    # Extract bucket name and file path from event
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    path = event['Records'][0]['s3']['object']['key']
    
    file_name_date = path.split('/')[2]
    processing_date_str = file_name_date.split('_')[1].replace('.csv', '')
    processing_date = datetime.strptime(processing_date_str, '%Y%m%d')
    
    # Extract year, month, day from date
    year = processing_date.strftime('%Y')
    month = processing_date.strftime('%m')
    day = processing_date.strftime('%d')
    
    uuid_temp = uuid.uuid4().hex[:2]
    execution_name = '{processing_date_str}-{uuid_temp}'.format(processing_date_str=processing_date_str, uuid_temp=uuid_temp)
    
    # Starts the execution of AWS StepFunctions
    response = sfn_client.start_execution(
          stateMachineArn = stm_arn,
          name= str(execution_name),
          input= json.dumps({"year": year, "month": month, "day": day})
      )
    
    return {"year": year, "month": month, "day": day}

Create a Generic FileTransfer Lambda

Create a Lambda function named generic-file-transfer as we created earlier in this article. In the file transfer Lambda function, it transfers files from landing zone to raw zone and landing zone to archive zone based on event coming from the StepFunction.

  1. If step is landing-to-raw-file-transfer, the Lambda function copies files from landing to raw zone.
  2. If step is landing-to-archive-file-transfer, the Lambda function copies files from landing to archive zone and deletes files from landing zone.

Please replace the following code in generic-file-transfer Lambda.

import json
import boto3

s3 = boto3.resource('s3')

def lambda_handler(event, context):
    
    # Extract Parameters from Event (invoked by StepFunctions)
    step = event['step']
    year = event['year']
    month = event['month']
    day = event['day']
    
    bucket_name = event['bucket_name']
    source_prefix = event['source_prefix']
    destination_prefix = event['destination_prefix']
    
    bucket = s3.Bucket(bucket_name)
    
    for objects in bucket.objects.filter(Prefix = source_prefix):
        file_path = objects.key
        
        if ('.csv' in file_path) and (step == 'landing-to-raw-file-transfer'):
            
            # Extract filename from file_path
            file_name_date = file_path.split('/')[2]
            file_name = file_name_date.split('_')[0]
            
            # Add filename to the destination prefix
            destination_prefix = '{destination_prefix}{file_name}/year={year}/month={month}/day={day}/'.format(destination_prefix=destination_prefix, file_name=file_name, year=year, month=month, day=day)
            print(destination_prefix)
            
            source_object = {'Bucket': bucket_name, "Key": file_path}
            
            # Replace source prefix with destination prefix
            new_path = file_path.replace(source_prefix, destination_prefix)
            
            # Copies file
            new_object = bucket.Object(new_path)
            new_object.copy(source_object)
         
        if ('.csv' in file_path) and (step == 'landing-to-archive-file-transfer'):
            
            # Add filename to the destination prefix
            destination_prefix = '{destination_prefix}{year}-{month}-{day}/'.format(destination_prefix=destination_prefix, year=year, month=month, day=day)
            print(destination_prefix)
            
            source_object = {'Bucket': bucket_name, "Key": file_path}
            
            # Replace source prefix with destination prefix
            new_path = file_path.replace(source_prefix, destination_prefix)
            
            # Copies file
            new_object = bucket.Object(new_path)
            new_object.copy(source_object)
            
            # Deletes copied file
            bucket.objects.filter(Prefix = file_path).delete()
            
    return {"year": year, "month": month, "day": day}

Generic FileTransfer Lambda function setup is now complete. We need to check all files are copied successfully from one zone to another zone. If you have large files that needs to be copied, you could check out our Lightening fast distributed file transfer architecture.

Create Generic FileTransfer Status Check Lambda Function

Create a Lambda function named generic-file-transfer-status. If the step is landing to raw file transfer, the Lambda function checks if all files are copied from landing to raw zone by comparing the number of objects in landing and raw zones. If count doesn’t match it will raise an exception, and that exception is handled in AWS StepFunctions and retries after some backoff rate. If the count matches, all files are copied successfully. If the step is landing to archive file transfer, the Lambda function checks that any files are left in landing zone. Please replace the following code in generic-file-transfer-status Lambda function.

import json
import boto3

s3 = boto3.resource('s3')

def lambda_handler(event, context):
    
    # Extract Parameters from Event (invoked by StepFunctions)
    step = event['step']
    year = event['year']
    month = event['month']
    day = event['day']
    
    bucket_name = event['bucket_name']
    source_prefix = event['source_prefix']
    destination_prefix = event['destination_prefix']
    
    bucket = s3.Bucket(bucket_name)
    
    class LandingToRawFileTransferIncompleteException(Exception):
        pass

    class LandingToArchiveFileTransferIncompleteException(Exception):
        pass
    
    if (step == 'landing-to-raw-file-transfer'):
        if file_transfer_status(bucket, source_prefix, destination_prefix):
            print('File Transfer from Landing to Raw Completed Successfully')
        else:
            raise LandingToRawFileTransferIncompleteException('File Transfer from Landing to Raw not completed')
    
    if (step == 'landing-to-archive-file-transfer'):
        if is_empty(bucket, source_prefix):
            print('File Transfer from Landing to Archive Completed Successfully')
        else:
            raise LandingToArchiveFileTransferIncompleteException('File Transfer from Landing to Archive not completed.')
    
    return {"year": year, "month": month, "day": day}

def file_transfer_status(bucket, source_prefix, destination_prefix):
    
    try:
        
        # Checks number of objects at the source prefix (count of objects at source i.e., landing zone)
        source_object_count = 0
        for obj in bucket.objects.filter(Prefix = source_prefix):
            path = obj.key
            if (".csv" in path):
                source_object_count = source_object_count + 1
        print(source_object_count)
        
        # Checks number of objects at the destination prefix (count of objects at destination i.e., raw zone)
        destination_object_count = 0
        for obj in bucket.objects.filter(Prefix = destination_prefix):
            path = obj.key
            
            if (".csv" in path):
                destination_object_count = destination_object_count + 1
        
        print(destination_object_count)
        return (source_object_count == destination_object_count)

    except Exception as e:
        print(e)
        raise e

def is_empty(bucket, prefix):
    
    try:
        # Checks if any files left in the prefix (i.e., files in landing zone)
        object_count = 0
        for obj in bucket.objects.filter(Prefix = prefix):
            path = obj.key

            if ('.csv' in path):
                object_count = object_count + 1
                    
        print(object_count)
        return (object_count == 0)
        
    except Exception as e:
        print(e)
        raise e

Create a Generic Crawler invocation Lamda

Create a Lambda function named generic-crawler-invoke. The Lambda function invokes a crawler. The crawler name is passed as argument from AWS StepFunctions through event object. Please replace the following code in generic-crawler-invoke Lambda function.

import json
import boto3

glue_client = boto3.client('glue')

def lambda_handler(event, context):
    
    # Extract Parameters from Event (invoked by StepFunctions)
    year = event['year']
    month = event['month']
    day = event['day']
    
    crawler_name = event['crawler_name']
    
    try:
        response = glue_client.start_crawler(Name = crawler_name)
    except Exception as e:
        print('Crawler in progress', e)
        raise e
    
    return {"year": year, "month": month, "day": day}

Create a Generic Crawler Status Lambda

Create a Lambda function named generic-crawler-status. The Lambda function checks whether the crawler ran successfully or not. If crawler is in running state, the Lambda function raises an exception and the exception will be handled in the Step Function and retries after a certain backoff rate. Please replace the following code in generic-crawler-status Lambda.

import json
import boto3

glue_client = boto3.client('glue')

def lambda_handler(event, context):
    
    class CrawlerInProgressException(Exception):
        pass
    
    # Extract Parametres from Event (invoked by StepFunctions)
    year = event['year']
    month = event['month']
    day = event['day']
    
    crawler_name = event['crawler_name']
    
    response = glue_client.get_crawler_metrics(CrawlerNameList =[crawler_name])
    print(response['CrawlerMetricsList'][0]['CrawlerName']) 
    print(response['CrawlerMetricsList'][0]['TimeLeftSeconds']) 
    print(response['CrawlerMetricsList'][0]['StillEstimating']) 
    
    if (response['CrawlerMetricsList'][0]['StillEstimating']):
        raise CrawlerInProgressException('Crawler In Progress!')
    elif (response['CrawlerMetricsList'][0]['TimeLeftSeconds'] > 0):
        raise CrawlerInProgressException('Crawler In Progress!')
    
    return {"year": year, "month": month, "day": day}

Create an AWS Glue Job

AWS Glue is a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores. For deep dive into AWS Glue, please go through the official docs.

Create an AWS Glue Job named raw-refined. In case you are just starting out on AWS Glue Jobs, I have explained how to create one from scratch in my earlier article. This Glue job converts file format from csv to parquet and stores in refined zone. The push down predicate is used as filter condition for reading data of only the processing date using the partitions.

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

## @params: [JOB_NAME]
# args = getResolvedOptions(sys.argv, ['JOB_NAME'])

args = getResolvedOptions(sys.argv, ['JOB_NAME', 'year', 'month', 'day'])

year = args['year']
month = args['month']
day = args['day']

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "raw", table_name = "customers", push_down_predicate ="((year == " + year + ") and (month == " + month + ") and (day == " + day + "))", transformation_ctx = "datasource0")

applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("email_id", "string", "email_id", "string"), ("retailer_name", "string", "retailer_name", "string"), ("units_purchased", "long", "units_purchased", "long"), ("purchase_date", "string", "purchase_date", "string"), ("sale_id", "string", "sale_id", "string"), ("year", "string", "year", "string"), ("month", "string", "month", "string"), ("day", "string", "day", "string")], transformation_ctx = "applymapping1")

resolvechoice2 = ResolveChoice.apply(frame = applymapping1, choice = "make_struct", transformation_ctx = "resolvechoice2")

dropnullfields3 = DropNullFields.apply(frame = resolvechoice2, transformation_ctx = "dropnullfields3")

datasink4 = glueContext.write_dynamic_frame.from_options(frame = dropnullfields3, connection_type = "s3", connection_options = {"path": "s3://bucketname/data/refined/customers/", "partitionKeys": ["year","month","day"]}, format = "parquet", transformation_ctx = "datasink4")

job.commit()

Create a Refined Crawler as we created Raw Crawler earlier in this article. Please point the crawler path to refined zone(s3://bucketname/data/refined/customers/) and database as refined. No need to create a Lambda function for refined crawler invocation and status, as we will pass crawler names from the StepFunction.

Resources required to create an the StepFunction have been created.

Creating the AWS StepFunction

StepFunction is where we create and orchestrate steps to process data according to our workflow. Create an AWS StepFunctions named Datapipeline-for-SageMaker.  In case you are just starting out on AWS StepFunctions, I have explained how to create one from scratch here.

Data is being ingested into landing zone. It triggers a Lambda function which in turn invokes the execution of the StepFunction. The steps in the StepFunction are as follows:

  1. Transfers files from landing zone to raw zone.
  2. Checks all files are copied to raw zone successfully or not.
  3. Invokes raw Crawler which crawls data in raw zone and updates/creates definition of table in the specified database.
  4. Checks if the Crawler is completed successfully or not.
  5. Invokes Glue Job and waits for it to complete.
  6. Invokes refined Crawler which crawls data from refined zone in and updates/creates definition of table in the specified database.
  7. Checks if the Crawler is completed successfully or not.
  8. Transfers files from landing zone to archive zone and deletes files from landing zone.
  9. Checks all files are copied and deleted from landing zone successfully.

Please update the StepFunctions definition with the following code.

{
  "Comment": "Datapipeline For MachineLearning in AWS Sagemaker",
  "StartAt": "LandingToRawFileTransfer",
  "States": {
    "LandingToRawFileTransfer": {
      "Comment": "Transfers files from landing zone to Raw zone.",
      "Type": "Task",
      "Parameters": {
        "step": "landing-to-raw-file-transfer",
        "bucket_name": "bucketname",
        "source_prefix": "data/landing/",
        "destination_prefix": "data/raw/",
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:generic-file-transfer",
      "TimeoutSeconds": 4500,
      "Catch": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "Next": "LandingToRawFileTransferFailed"
        },
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "LandingToRawFileTransferFailed"
        }
      ],
      "Next": "LandingToRawFileTransferPassed"
    },
    "LandingToRawFileTransferFailed": {
      "Type": "Fail",
      "Cause": "Landing To Raw File Transfer failed"
    },
    "LandingToRawFileTransferPassed": {
      "Type": "Pass",
      "ResultPath": "$",
      "Parameters": {
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Next": "LandingToRawFileTransferStatus"
    },
    "LandingToRawFileTransferStatus": {
      "Comment": "Checks whether all files are copied from landing to raw zone successfully.",
      "Type": "Task",
      "Parameters": {
        "step": "landing-to-raw-file-transfer",
        "bucket_name": "bucketname",
        "source_prefix": "data/landing/",
        "destination_prefix": "data/raw/",
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:generic-file-transfer-status",
      "Retry": [
        {
          "ErrorEquals": [
            "LandingToRawFileTransferInCompleteException"
          ],
          "IntervalSeconds": 30,
          "BackoffRate": 2,
          "MaxAttempts": 5
        },
        {
          "ErrorEquals": [
            "States.All"
          ],
          "IntervalSeconds": 30,
          "BackoffRate": 2,
          "MaxAttempts": 5
        }
      ],
      "Catch": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "Next": "LandingToRawFileTransferStatusFailed"
        },
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "LandingToRawFileTransferStatusFailed"
        }
      ],
      "Next": "LandingToRawFileTransferStatusPassed"
    },
    "LandingToRawFileTransferStatusFailed": {
      "Type": "Fail",
      "Cause": "Landing To Raw File Transfer failed"
    },
    "LandingToRawFileTransferStatusPassed": {
      "Type": "Pass",
      "ResultPath": "$",
      "Parameters": {
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Next": "StartRawCrawler"
    },
    "StartRawCrawler": {
      "Comment": "Crawls data from raw zone and adds table definition to the specified Database. IF table definition exists updates the definition.",
      "Type": "Task",
      "Parameters": {
        "crawler_name": "raw",
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:generic-crawler-invoke",
      "TimeoutSeconds": 4500,
      "Catch": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "Next": "StartRawCrawlerFailed"
        },
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "StartRawCrawlerFailed"
        }
      ],
      "Next": "StartRawCrawlerPassed"
    },
    "StartRawCrawlerFailed": {
      "Type": "Fail",
      "Cause": "Crawler invocation failed"
    },
    "StartRawCrawlerPassed": {
      "Type": "Pass",
      "ResultPath": "$",
      "Parameters": {
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Next": "RawCrawlerStatus"
    },
    "RawCrawlerStatus": {
      "Comment": "Checks whether crawler is successfully completed.",
      "Type": "Task",
      "Parameters": {
        "crawler_name": "raw",
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:generic-crawler-status",
      "Retry": [
        {
          "ErrorEquals": [
            "CrawlerInProgressException"
          ],
          "IntervalSeconds": 30,
          "BackoffRate": 2,
          "MaxAttempts": 5
        },
        {
          "ErrorEquals": [
            "States.All"
          ],
          "IntervalSeconds": 30,
          "BackoffRate": 2,
          "MaxAttempts": 5
        }
      ],
      "Catch": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "Next": "RawCrawlerStatusFailed"
        },
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "RawCrawlerStatusFailed"
        }
      ],
      "Next": "RawCrawlerStatusPassed"
    },
    "RawCrawlerStatusFailed": {
      "Type": "Fail",
      "Cause": "Crawler invocation failed"
    },
    "RawCrawlerStatusPassed": {
      "Type": "Pass",
      "ResultPath": "$",
      "Parameters": {
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Next": "GlueJob"
    },
    "GlueJob": {
      "Comment": "Invokes Glue job and waits for Glue job to complete.",
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": {
        "JobName": "retail-raw-refined",
        "Arguments": {
          "--refined_prefix": "data/refined",
          "--year.$": "$.year",
          "--month.$": "$.month",
          "--day.$": "$.day"
        }
      },
      "Catch": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "Next": "GlueJobFailed"
        },
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "GlueJobFailed"
        }
      ],
      "Next": "GlueJobPassed"
    },
    "GlueJobFailed": {
      "Type": "Fail",
      "Cause": "Crawler invocation failed"
    },
    "GlueJobPassed": {
      "Type": "Pass",
      "ResultPath": "$",
      "Parameters": {
        "year.$": "$.Arguments.--year",
        "month.$": "$.Arguments.--month",
        "day.$": "$.Arguments.--day"
      },
      "Next": "StartRefinedCrawler"
    },
    "StartRefinedCrawler": {
      "Comment": "Crawls data from refined zone and adds table definition to the specified Database.",
      "Type": "Task",
      "Parameters": {
        "crawler_name": "refined",
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:generic-crawler-invoke",
      "TimeoutSeconds": 4500,
      "Catch": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "Next": "StartRefinedCrawlerFailed"
        },
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "StartRefinedCrawlerFailed"
        }
      ],
      "Next": "StartRefinedCrawlerPassed"
    },
    "StartRefinedCrawlerFailed": {
      "Type": "Fail",
      "Cause": "Crawler invocation failed"
    },
    "StartRefinedCrawlerPassed": {
      "Type": "Pass",
      "ResultPath": "$",
      "Parameters": {
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Next": "RefinedCrawlerStatus"
    },
    "RefinedCrawlerStatus": {
      "Comment": "Checks whether crawler is successfully completed.",
      "Type": "Task",
      "Parameters": {
        "crawler_name": "refined",
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:generic-crawler-status",
      "Retry": [
        {
          "ErrorEquals": [
            "CrawlerInProgressException"
          ],
          "IntervalSeconds": 30,
          "BackoffRate": 2,
          "MaxAttempts": 5
        },
        {
          "ErrorEquals": [
            "States.All"
          ],
          "IntervalSeconds": 30,
          "BackoffRate": 2,
          "MaxAttempts": 5
        }
      ],
      "Catch": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "Next": "RefinedCrawlerStatusFailed"
        },
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "RefinedCrawlerStatusFailed"
        }
      ],
      "Next": "RefinedCrawlerStatusPassed"
    },
    "RefinedCrawlerStatusFailed": {
      "Type": "Fail",
      "Cause": "Crawler invocation failed"
    },
    "RefinedCrawlerStatusPassed": {
      "Type": "Pass",
      "ResultPath": "$",
      "Parameters": {
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Next": "LandingToArchiveFileTransfer"
    },
    "LandingToArchiveFileTransfer": {
      "Comment": "Transfers files from landing zone to archived zone",
      "Type": "Task",
      "Parameters": {
        "step": "landing-to-archive-file-transfer",
        "bucket_name": "bucketname",
        "source_prefix": "data/landing/",
        "destination_prefix": "data/raw/",
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:generic-file-transfer",
      "TimeoutSeconds": 4500,
      "Catch": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "Next": "LandingToArchiveFileTransferFailed"
        },
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "LandingToArchiveFileTransferFailed"
        }
      ],
      "Next": "LandingToArchiveFileTransferPassed"
    },
    "LandingToArchiveFileTransferFailed": {
      "Type": "Fail",
      "Cause": "Crawler invocation failed"
    },
    "LandingToArchiveFileTransferPassed": {
      "Type": "Pass",
      "ResultPath": "$",
      "Parameters": {
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Next": "LandingToArchiveFileTransferStatus"
    },
    "LandingToArchiveFileTransferStatus": {
      "Comment": "Checks whether all files are copied from landing to archived successfully.",
      "Type": "Task",
      "Parameters": {
        "step": "landing-to-archive-file-transfer",
        "bucket_name": "bucketname",
        "source_prefix": "data/landing/",
        "destination_prefix": "data/raw/",
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:generic-file-transfer-status",
      "Retry": [
        {
          "ErrorEquals": [
            "LandingToArchiveFileTransferInCompleteException"
          ],
          "IntervalSeconds": 30,
          "BackoffRate": 2,
          "MaxAttempts": 5
        },
        {
          "ErrorEquals": [
            "States.All"
          ],
          "IntervalSeconds": 30,
          "BackoffRate": 2,
          "MaxAttempts": 5
        }
      ],
      "Catch": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "Next": "LandingToArchiveFileTransferStatusFailed"
        },
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "LandingToArchiveFileTransferStatusFailed"
        }
      ],
      "Next": "LandingToArchiveFileTransferStatusPassed"
    },
    "LandingToArchiveFileTransferStatusFailed": {
      "Type": "Fail",
      "Cause": "LandingToArchiveFileTransfer invocation failed"
    },
    "LandingToArchiveFileTransferStatusPassed": {
      "Type": "Pass",
      "ResultPath": "$",
      "Parameters": {
        "year.$": "$.year",
        "month.$": "$.month",
        "day.$": "$.day"
      },
      "End": true
    }
  }
}

After updating the AWS StepFunctions definition, the visual workflow looks like the following.

Now upload file in data/landing/ zone in the bucket  where the trigger has been configured with the Lambda. The execution of StepFunction has started and the visual workflow looks like the following.

In RawCrawlerStatus step, if the Lambda is failing we retry till sometime and then mark the StepFunction as failed. If the StepFunction ran successfully. The visual workflow of the StepFunction looks like following.

Machine Learning workflow using Amazon SageMaker

The final step in this data pipeline is to make the processed data available in a Jupyter notebook instance of the Amazon SageMaker. Jupyter notebooks are popularly used among data scientists to do exploratory data analysis, build and train machine learning models.

Create Notebook Instance in Amazon SageMaker

Step1: In the Amazon SageMaker console choose Create notebook instance.

Step2: In the Notebook Instance settings populate the Notebook instance name, choose an instance type depends on data size, and a role for the notebook instances in Amazon SageMaker to interact with Amazon S3. The SageMaker execution role needs to have the required permission to Athena, the S3 buckets where the data resides, and KMS if encrypted.

Step3: Wait for the Notebook instances to be created and the Status to change to InService.

Step4: Choose the Open Jupyter, which will open the notebook interface in a new browser tab.

Click new to create a new notebook in Jupyter. Amazon SageMaker provides several kernels for Jupyter including support for Python 2 and 3, MXNet, TensorFlow, and PySpark. Choose Python as the kernel for this exercise as it comes with the Pandas library built in.

Step5: Within the notebook, execute the following commands to install the Athena JDBC driver. PyAthena is a Python DB API 2.0 (PEP 249) compliant client for the Amazon Athena JDBC driver.

import sys
!{sys.executable} -m pip install PyAthena

Step6: After the Athena driver is installed, you can use the JDBC connection to connect to Athena and populate the Pandas data frames. For data scientists, working with data is typically divided into multiple stages: munging and cleaning data, analyzing/ modeling it, then organizing the results of the analysis into a form suitable for plotting or tabular display. Pandas is the ideal tool for all of these tasks.

from pyathena import connect
import pandas as pd
conn = connect(s3_staging_dir='<ATHENA QUERY RESULTS LOCATION>',
               region_name='REGION, for example, us-east-1')

df = pd.read_sql("SELECT * FROM <DATABASE>.<TABLENAME> limit 10;", conn)
df

As shown above, the dataframe always stays consistent with the latest incoming data because of the data engineering pipeline setup earlier in the ML workflow. This dataframe can be used for downstream ad-hoc model building purposes or for exploratory data analysis.

That’s it folks. Thanks for the read.

This story is authored by PV Subbareddy. Subbareddy is a Big Data Engineer specializing on Cloud Big Data Services and Apache Spark Ecosystem.

Real Time Face Identification on Live Camera Feed using Amazon Rekognition Video and Kinesis Video Streams

Note: Some images in this blog post are not showing up, please read the same post on medium here.

In this post, we are going to learn how to perform facial analysis on a live feed by setting up a serverless video analytics architecture using Amazon Rekognition Video and Amazon Kinesis Video Streams.

Use cases:

  1. Intruder notification system
  2. Employee sign-in system

Architecture overview

Real time face recognition using AWS on a live video stream

We shall learn how to use the webcam of a laptop (we can, of course, use professional grade cameras and hook it up with Kinesis Video streams for a production ready system) to send a live video feed to the  Amazon Kinesis Video Stream. The stream processor in Amazon Rekognition Video picks up this webcam feed and analyses by comparing this feed with the faces in the face collection created beforehand. This analysis is written to the Amazon Kinesis Data Stream. For each record written to the Kinesis data stream, the lambda function is invoked. This lambda reads the record from kinesis stream data. If there are any facial matches or mismatches, depending upon how the lambda is configured an email notification is sent via Amazon SNS (Simple Notification Service) to the registered email addresses.

Note: Make sure all the above AWS resources are created in the same region.

Before going further, make sure you already have the AWS CLI configured on your machine. If not, follow this link –  https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html

Step-1: Create Kinesis Video Stream

We need to create the Kinesis Video Stream for our webcam to connect and send the feed to Kinesis Video Streams from AWS console. Create a new one by clicking the Create button. Give the stream a name in stream configuration.

We have successfully created a Kinesis video stream.

Step-2: Send Live Feed to the Kinesis Video Stream

We need a video producer, anything that sends media data to the Kinesis video stream is called a producer. AWS currently provides producer library primarily supported in these four languages(JAVA, Android, C++, C) only. We use the C++ Producer Library as a GStreamer plugin.

To easily send media from a variety of devices on a variety of operating systems, this tutorial uses GStreamer, an open-source media framework that standardizes access to cameras and other media sources.

Download the Amazon Kinesis Video Streams Producer SDK from Github using the following Git command:

git clone https://github.com/awslabs/amazon-kinesis-video-streams-producer-sdk-cpp 

After downloading successfully you can compile and install the GStreamer sample in the kinesis-video-native-build directory using the following commands:

For Ubuntu – run the following commands

sudo apt-get update

sudo apt-get install libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev gstreamer1.0-plugins-base-apps

sudo apt-get install gstreamer1.0-plugins-bad gstreamer1.0-plugins-good gstreamer1.0-plugins-ugly gstreamer1.0-tools

For Windows – run the following commands

Inside mingw32 or mingw64 shell, go to kinesis-video-native-build directory and run ./min-install-script 

For macOS – run the following commands

Install homebrew

Run brew install pkg-config openssl cmake gstreamer gst-plugins-base gst-plugins-good gst-plugins-bad gst-plugins-ugly log4cplus

Go to kinesis-video-native-build directory and run ./min-install-script

Running the GStreamer webcam sample application:
The sample application kinesis_video_gstreamer_sample_app in the kinesis-video-native-build directory uses GStreamer pipeline to get video data from the camera. Launch it with the kinesis video stream name created in Step-1 and it will start streaming from the camera.

AWS_ACCESS_KEY_ID=<YourAccessKeyId> AWS_SECRET_ACCESS_KEY=<YourSecretAccessKey> ./kinesis_video_gstreamer_sample_app <stream_name>

After running the above command, the streaming would start and you can see the live feed in Media preview of Kinesis Video Stream created in the Step-1.

Step-3: Creating Resources using AWS CloudFormation Stack

The CloudFormation stack will create the resources that are highlighted in the following image.

Highlighted resources to be created using CloudFormation
  1. The following link will automatically open a new CloudFormation Stack in us-west-2: CloudFormation Stack.
  2. Go to View in Designer and edit the template code and click on create stack.
    1. Change the name of the application from Default.
    2. Change the Nodejs version of the RekognitionVideoLambda to 10.x (Version 6.10 isn’t supported anymore) and then click on create stack.
editing CloudFormation stack template

Enter your Email Address to receive notifications and then choose Next as shown below:

Skip the Configure stack options step by choosing Next. In the final step you must check the checkbox and then choose Create stack as in the below image:

Once the stack created successfully you can see it in your stacks as below with status CREATE_COMPLETE. This creates the required resources.

Once completed, you will receive a confirmation email to subscribe/receive notifications. Make sure you subscribed to that.

snapshot from subscribe confirmation email

Step-4: Add face to a Collection

As learned earlier the Stream Processor in Amazon Rekognition Video picks up and analyzes the feed coming from Kinesis Video Stream by comparing it with the faces in the face collection. Let’s create this collection.

For more information follow the link – https://docs.aws.amazon.com/rekognition/latest/dg/create-collection-procedure.html

Create Collection: Create a face collection using the AWS CLI on the command line with below command:

aws rekognition create-collection --collection-id <Collection_Name> --region us-west-2

Add face(s) to the collection: You can use your picture for testing.
First we need to upload the image(s) to an Amazon S3 bucket in the us-west-2 region. Run the below command by replacing BUCKET_NAME, FILE_NAME with your details and you can give any name for FACE-TAG ( can be name of the person).

aws rekognition index-faces --image '{"S3Object":{"Bucket":"<BUCKET_NAME>","Name":"<FILE_NAME>.jpeg"}}' --collection-id "rekVideoBlog" --detection-attributes "ALL" --external-image-id "<FACE-TAG>" --region us-west-2

After that you will receive a response as output as shown below

Now we all ready to go further. If you have multiple photos run the above command again with the new file name.

Step-5: Creating the Stream Processor

We need to create a stream processor which does all the work of reading video stream, facial analysis against face collection and finally writing the analysis data to Kinesis data stream. It contains information about the Kinesis data stream, Kinesis video stream, Face collection ID and the role that is used by Rekognition to access Kinesis Video Stream.

Copy meta info required:
Go to your Kinesis video stream created in step-1, note down the Stream ARN (KINESIS_VIDEO_STREAM_ARN) from the stream info as shown below:

Next from the CloudFormation stack output note down the values of KinesisDataStreamArn (KINESIS_DATA_STREAM_ARN) and RecognitionVideoIAM (IAM_ROLE_ARN) as shown below:

Now create a JSON file in your system that contains the following information:

{
       "Name": "streamProcessorForRekognitionVideoBlog",
       "Input": {
              "KinesisVideoStream": {
                     "Arn": "<KINESIS_VIDEO_STREAM_ARN>"
              }
       },
       "Output": {
              "KinesisDataStream": {
                     "Arn": "<KINESIS_DATA_STREAM_ARN>"
              }
       },
       "RoleArn": "<IAM_ROLE_ARN>",
       "Settings": {
              "FaceSearch": {
                     "CollectionId": "COLLECTION_NAME",
                     "FaceMatchThreshold": 85.5
              }
       }
}

Create the stream processor with AWS CLI from command line with following command:

aws rekognition create-stream-processor --region us-west-2 --cli-input-json file://<PATH_TO_JSON_FILE_ABOVE>

Now start the stream processor with following command:

aws rekognition start-stream-processor --name streamProcessorForRekognitionVideoBlog --region us-west-2

You can see if the stream processor is in a running state with following command:

aws rekognition list-stream-processors --region us-west-2

If it’s running, you will see below response:

And also make sure your camera is streaming in command line and feed is being received by the Kinesis video stream.

Now you would get notified whenever known or an unknown person shows up in your webcam.

Example notification:

Note: 

  1. Lambda can be configured to send email notification only if unknown faces are detected or vice versa.
  2. For cost optimization, one could use a python script to call Amazon Rekognition service with a snapshot only when a person is detected instead of wasting resources in uninhabited/unpeopled area.

Step-6: Cleaning Up Once Done

Stop the stream processor.

aws rekognition stop-stream-processor --name streamProcessorForRekognitionVideoBlog --region us-west-2

Delete the stream processor.

aws rekognition delete-stream-processor --name streamProcessorForRekognitionVideoBlog --region us-west-2

Delete the Kinesis Video Stream. Go to the Kinesis video streams from AWS console and select your stream and Delete.

Delete the CloudFormation stack. Go to the CloudFormation, then select stacks from AWS console and select your stack and Delete.

Thanks for the read! I hope it was both fun and useful.

This story is co-authored by Venu and Koushik. Venu is a software engineer and machine learning enthusiast. Koushik is a software engineer and a keen data science and machine learning enthusiast.

Understanding React Native Navigation

This post focuses on building a react-native app which supports navigation using react-native-navigation.

Why React Native Navigation?

There are many ways in which we can implement navigation functionality in mobile apps developed using React Native. The two most popular among them are
‘React Navigation’ and ‘React Native Navigation’.

As React Native Navigation uses the native modules with a JS bridge, the performance will be better when compared to React Navigation.

In the newer versions of react-native it is not necessary to link the libraries manually. The auto linking feature was introduced in the react-native version 0.6. The newer versions of react-native ( above 0.60 ) use CocoaPods by default.

Hence it is not necessary to link the libraries in Xcode. The libraries will be linked automatically when we install the libraries using the ‘pod install’ command. Over the course of this post, we will be building a simple app for iOS in react-native (version above 0.60) which supports navigation using ‘react-native-navigation’.

Create a react-native app:

To set up react-native environment in your machine refer to the link below.

https://facebook.github.io/react-native/docs/getting-started.html

Create a react-native app using the following command.

react-native init sample-app

It will create a project folder with the name sample-app. Then run the following commands to install the dependencies.

cd sample-app
npm install

After that run the following command inside the project directory to open the app in an iPhone Simulator.

react-native run-ios

You should see the app running in an iOS simulator in a while. And it will display a welcome screen if you haven’t made any code changes in App.js file.

Steps to install React-Native-Navigation:

To install react-native-navigation, run the following command inside the project directory.

npm install react-native-navigation --save

It will install the react-native-navigation package into the project. Add the following line to the Podfile.

pod 'ReactNativeNavigation', :podspec => '../node_modules/react-native-navigation/ReactNativeNavigation.podspec'

Then move to the ios folder in the project and install the dependencies using the following commands.

cd ios
pod install

Pod install command will install the SDKs specified in the Podspec, along with any dependencies they may have.

After that we need to edit AppDelegate.m file in Xcode.
To do so, open the navigation.xcworkspace file located in sample-app/ios folder. Remove the content inside the AppDelegate.m file and add the following code into the file.

File : AppDelegate.m

#import "AppDelegate.h"

#import <React/RCTBundleURLProvider.h>
#import <React/RCTRootView.h>
#import <ReactNativeNavigation/ReactNativeNavigation.h>

@implementation AppDelegate

- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions
{
  NSURL *jsCodeLocation = [[RCTBundleURLProvider sharedSettings] jsBundleURLForBundleRoot:@"index" fallbackResource:nil];
  [ReactNativeNavigation bootstrap:jsCodeLocation launchOptions:launchOptions];
  
  return YES;
}

@end

Navigation Workflow:

First we will create a landing/home screen for the app. Create a Landing Screen inside the project folder by following the below steps.

  1. Create a folder named src inside the project.
  2. Add another folder inside src and name it screens.
  3. Create Welcome.js file in screens folder and add the following code into it.

File : Welcome.js

import React, {Component} from 'react';
import {View, Text, StyleSheet, Button} from 'react-native';

class WelcomeScreen extends Component {
  render() {
    return (
      <View style={styles.container}>
        <Text style={styles.textStyles}>React Native Navigation</Text>
        <View style={styles.loginButtonContainer}>
          <Button
            title="Login"
            color="#FFFFFF"
          />
        </View>
        <View style={styles.signUpButtonContainer}>
          <Button
            title="Signup"
            color="#FFFFFF"
          />
        </View>
      </View>
    );
  }
}

export default WelcomeScreen;

const styles = StyleSheet.create({
  container: {
    flex: 1,
    alignItems: 'center',
    justifyContent: 'flex-start',
    backgroundColor: '#CCCCCC',
  },
  textStyles: {
    alignItems: 'flex-start',
    fontSize: 25,
    fontWeight: 'bold',
    marginTop: 200,
  },
  loginButtonContainer: {
    backgroundColor: '#2494fb',
    borderRadius: 10,
    marginTop: 40,
    width: '50%',
    shadowColor: '#000000',
  },
  signUpButtonContainer: {
    backgroundColor: '#2494fb',
    borderRadius: 10,
    marginTop: 10,
    width: '50%',
    shadowColor: '#000000',
  },
});

We need to register the screens for the navigator to work. For that we need to modify the contents of index.js file. Replace the contents of index.js file with the following code.

File : index.js

import { Navigation } from 'react-native-navigation';
import Welcome from './src/screens/Welcome'; 

Navigation.registerComponent('Welcome', ()=>Welcome)
Navigation.events().registerAppLaunchedListener(()=>{
    Navigation.setRoot({
        root:{
            component:{
                name:'Welcome'
            }
        }
    })
})

Explanation:

Navigation.registerComponent() is used to register the screen. As we have already created the Welcome screen component, we would use that to register.

Navigation.registerComponent('Welcome', () => Welcome).

The above step only registers our screen but not the application.So to register our application we would use Navigation.events().registerAppLaunchedListener().

Inside the callback of the registerAppLaunchedListener we will set the root of the application using Navigation.setRoot(). The code would be like the following.

Navigation.events().registerAppLaunchedListener(()=>{
    Navigation.setRoot({
        root:{
            component:{
                name:'Welcome'
            }
        }
    })
})

Here the component name is Welcome which is the screen we created and registered.

Note: The name of the component and the name of the registerComponent screen must be same.

Run the file from Xcode which you opened earlier using navigation.xcworkspace. And you will be seeing the landing screen which we created in an iPhone Simulator. It would look like below.

Now we have the landing page of the app, we shall now try to navigate to other screens when we click either on Login or SignUp button. To obtain that mechanism we will create two more screens inside the screens folder as Login.js and SignUp.js.

Place the below code snippet in Login.js file.

File : Login.js

import React, { Component } from "react";
import { View, Text, StyleSheet } from "react-native";

class Login extends Component {
    render() {
        return(
            <View style={styles.container}>
                <Text style={styles.textStyles}>
                    Welcome to Login Page.
                </Text>
            </View> 
        );
    }
}

export default Login;

const styles = StyleSheet.create({
    container: {
        flex:1,
        alignItems:'center',
        justifyContent: 'center',
        backgroundColor: '#CCCCCC',
    },
    textStyles: {
        alignItems:'center',
        fontSize: 25
    }
});

After that place the below code in SignUp.js

File : SignUp.js

import React, { Component } from "react";
import { View, Text, StyleSheet } from "react-native";

class SignUp extends Component {
    render() {
        return(
            <View style={styles.container}>
                <Text style={styles.textStyles}>
                    Welcome to SignUp Page.
                </Text>
            </View> 
        );
    }
}

export default SignUp;

const styles = StyleSheet.create({
    container: {
        flex:1,
        alignItems:'center',
        justifyContent: 'center',
        backgroundColor: '#CCCCCC',
    },
    textStyles: {
        alignItems:'center',
        fontSize: 25
    }
});

As we have to navigate between screens, we cannot set the root screen as only component. So to navigate we must use Stack (array of screens). The Stack consists of children array where we set the component name of the landing page of the app. Hence we also need to replace the contents of index.js file as below.

File : index.js

import { Navigation } from 'react-native-navigation';
import Welcome from './src/screens/Welcome';
import Login from './src/screens/Login';
import SignUp from './src/screens/SignUp';

Navigation.registerComponent('Welcome', ()=>Welcome)
Navigation.registerComponent('Login', () => Login)
Navigation.registerComponent('SignUp', () => SignUp)

Navigation.events().registerAppLaunchedListener(()=>{
    Navigation.setRoot({
        root:{
            stack:{
                id:'navigationStack',
                children: [
                        {
                            component :{
                                name: 'Welcome'
                            },
                        },
                ]
            }
        }
    })
})

To add the title to the landing page, replace the Navigator.setRoot({}) function in the index.js file as below.

Navigation.setRoot({
        root:{
            stack:{
                id:'navigationStack',
                children: [
                        {
                            component :{
                                name: 'Welcome',
                                options: {
                                    topBar: {
                                        title: {
                                            text: 'Home'
                                        }
                                    }
                                }
                            },
                        },
                ]
            }
        }
    })

The landing screen will now look like below which has HOME as its title.

Now we will write the event handler function in Welcome.js file so that we can navigate between the screens when we click on either the Login or SignUp button. For that replace the content of Welcome.js file as below.

File : Welcome.js

import React, {Component} from 'react';
import {View, Text, StyleSheet, Button} from 'react-native';
import {Navigation} from 'react-native-navigation';

class WelcomeScreen extends Component {
  goToScreen = screenName => {
    Navigation.push(this.props.componentId, {
      component: {
        name: screenName,
      },
    });
  };
  render() {
    return (
      <View style={styles.container}>
        <Text style={styles.textStyles}>React Native Navigation</Text>
        <View style={styles.loginButtonContainer}>
          <Button
            title="Login"
            color="#FFFFFF"
            onPress={() => this.goToScreen('Login')}
          />
        </View>
        <View style={styles.signUpButtonContainer}>
          <Button
            title="Signup"
            color="#FFFFFF"
            onPress={() => this.goToScreen('SignUp')}
          />
        </View>
      </View>
    );
  }
}

export default WelcomeScreen;

const styles = StyleSheet.create({
  container: {
    flex: 1,
    alignItems: 'center',
    justifyContent: 'flex-start',
    backgroundColor: '#CCCCCC',
  },
  textStyles: {
    alignItems: 'flex-start',
    fontSize: 25,
    fontWeight: 'bold',
    marginTop: 200,
  },
  loginButtonContainer: {
    backgroundColor: '#2494fb',
    borderRadius: 10,
    marginTop: 40,
    width: '50%',
    shadowColor: '#000000',
  },
  signUpButtonContainer: {
    backgroundColor: '#2494fb',
    borderRadius: 10,
    marginTop: 10,
    width: '50%',
    shadowColor: '#000000',
  },
});

To navigate to a particular screen we need to import Navigation component from react-native-navigation.When we click either on the Login button or SignUp button, the event is handled in goToScreen method.

We are passing the name of the screen as an argument to the method and it must be same as the name which is used to register the screen in Navigator.registerComponent() in index.js file.

In the goToScreen method we are using this.props.componentId, which is the Id of the current screen and is made automatically available by react-native-navigation.Inside Navigation.push function we assign the name of the screen to be navigated to ‘component’ object and it is the argument which we are passing to the method when we press either Login or SignUp button.

Then run the project using Xcode. The Welcome.js screen appears in a while. Then if we click on the Login button, the app would be navigated to the Login.js Page and it would be navigated to the SignUp.js page if we click on the SignUp button. React Native Navigation also adds the back button at the top using which we can go back to the screen we navigated from.

The Login and SignUp screens would look like below.

I hope, the article was helpful in understanding React Native Navigation. Thanks for the read.

This story is authored by Dheeraj Kumar. Dheeraj is a software engineer specializing in React Native and React based frontend development.

Setting Up a Data Lake on AWS Cloud Using LakeFormation

Setting up a Data Lake involves multiple steps such as collecting, cleansing, moving, and cataloging data, and then securely making that data available for downstream analytics and Machine Learning. AWS LakeFormation simplifies these processes and also automates certain processes like data ingestion. In this post, we shall be learning how to build a very simple data lake using LakeFormation with hypothetical retail sales data.

AWS Lake Formation provides its own permissions model that augments the AWS IAM permissions model. This centrally defined permissions model enables fine-grained access to data stored in data lake through a simple grant/revoke mechanism. These permissions are enforced at the table and column level on the data catalogue and are mapped to the underlying objects in S3. LakeFormation permissions are applicable across the full portfolio of AWS analytics and Machine Learning services, including Amazon Athena and Amazon Redshift.

So, let’s get on with the setup.

Adding an administrator

First and foremost step in using LakeFormation is to create an administrator. An administrator has full access to LakeFormation system and initial access to data configuration and access permissions. 

After adding an administrator, navigate to the Dashboard using the sidebar. This illustrates the typical process of Data lake setup.

Register location

From Register and Ingest sub menu in the sidebar, If you wish to setup data ingestion, that is, import unprocessed/landing data, AWS LakeFormation comes with in-house Blueprints that one could use to build Workflows. These workflows could be scheduled as per the needs of the end-user. Sources of data for these workflows can be a JDBC source, log files and many more. Learn more about importing data using workflows here.

If your ingestion process doesn’t involve any of the above mentioned ways and writes directly to S3, it’s alright. Either way we end up registering that S3 location as one of the Data Lake locations.

Once created you shall see its listing in the Data Lake locations.

You could not only access this location from here but also set permission to objects stored in that path. If preferred, one could register lake locations precisely for each processing zone and set permissions accordingly. I registered it to the whole bucket.

I created 2 retail datasets (.csv), one with 20 records and the other with 5 records. I have uploaded one of the datasets (20 records) to S3 with raw/retail_sales prefix.

Creating a Database

Lake Formation internally uses the Glue Data Catalog, so it shows all the databases available. From the Data Catalog sub menu in the sidebar, navigate to Databases to create and manage all the databases. I created a database called merchandise with default permissions.

Once created, you shall see its listing, and also manage, grant/revoke permissions and view tables in that DB.

Creating Crawlers and ETL jobs

From the Register and Ingest sub menu in the sidebar, navigate to Crawlers, Jobs to create and manage all Glue related services. Lake Formation redirects to AWS Glue and internally uses it. I created a crawler to get the metadata for objects residing in raw zone.

After running this crawler manually, now raw data can be queried from Athena.

I created an ETL job to run a transformation on this raw table data. 

All it does is change the class type of purchase date, which is from string class to date class. Creates partitions while writing to refined zone in parquet format. These partitions are created from the processing date but not the purchase date.

retail-raw-refined ETL job python script:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import datetime
from pyspark.sql.functions import *
from pyspark.sql.types import *
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql import *

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
## @type: DataSource
## @args: [database = "merchandise", table_name = "raw_retail_sales", transformation_ctx = "datasource0"]
## @return: datasource0
## @inputs: []
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "merchandise", table_name = "raw_retail_sales", transformation_ctx = "datasource0")
## @type: ApplyMapping
## @args: [mapping = [("email_id", "string", "email_id", "string"), ("retailer_name", "string", "retailer_name", "string"), ("units_purchased", "long", "units_purchased", "long"), ("purchase_date", "string", "purchase_date", "date"), ("sale_id", "string", "sale_id", "string")], transformation_ctx = "applymapping1"]
## @return: applymapping1
## @inputs: [frame = datasource0]

#convert glue object to sparkDF
sparkDF = datasource0.toDF()
sparkDF = sparkDF.withColumn('purchase_date', unix_timestamp(sparkDF.purchase_date, 'dd/MM/yyyy').cast(TimestampType()))

applymapping1 = DynamicFrame.fromDF(sparkDF, glueContext,"datafields")
# applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("email_id", "string", "email_id", "string"), ("retailer_name", "string", "retailer_name", "string"), ("units_purchased", "long", "units_purchased", "long"), ("purchase_date", "string", "purchase_date", "date"), ("sale_id", "string", "sale_id", "string")], transformation_ctx = "applymapping1")
## @type: ResolveChoice
## @args: [choice = "make_struct", transformation_ctx = "resolvechoice2"]
## @return: resolvechoice2
## @inputs: [frame = applymapping1]
resolvechoice2 = ResolveChoice.apply(frame = applymapping1, choice = "make_struct", transformation_ctx = "resolvechoice2")
## @type: DropNullFields
## @args: [transformation_ctx = "dropnullfields3"]
## @return: dropnullfields3
## @inputs: [frame = resolvechoice2]
dropnullfields3 = DropNullFields.apply(frame = resolvechoice2, transformation_ctx = "dropnullfields3")
## @type: DataSink
## @args: [connection_type = "s3", connection_options = {"path": "s3://test-787/refined/retail_sales"}, format = "parquet", transformation_ctx = "datasink4"]
## @return: datasink4
## @inputs: [frame = dropnullfields3]
now = datetime.datetime.now()
path = "s3://test-787/refined/retail_sales/"+'year='+str(now.year)+'/month='+str(now.month)+'/day='+str(now.day)+'/'
datasink4 = glueContext.write_dynamic_frame.from_options(frame = dropnullfields3, connection_type = "s3", connection_options = {"path": path}, format = "parquet", transformation_ctx = "datasink4")
job.commit()

The lakeformation:GetDataAccess permission is needed for this job to work. I created a new policy named LakeFormationGetDataAccess and attached it to AWSGlueServiceRoleDefault role.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "lakeformation:GetDataAccess",
            "Resource": "*"
        }
    ]
}

After running the job manually, it will load new transformed data with partitions in the refined zone as specified in the job.

I created another crawler to get the metadata for these objects residing in refined zone.

After running this crawler manually, now refined data can be queried from Athena.

You could now see the newly added partition columns (year, month, day).

Let us add some new raw data and see how our ETL job process that delta difference.

We only want to process new data and old data is either moved to archive location or deleted from raw zone, whatever is preferred.

Run the ETL job again. See new files being added into refined zone.

Load new partitions using msck repair table query.

Note: Try creating another IAM user and as an administrator in the LakeFormation, give this user limited access to the tables, try querying using Athena. See if the permissions are working.

Pros and cons of LakeFormation

The UI is made simple, all under one roof. Most of the times, one needs to keep multiple tabs open and opening S3 locations is troublesome. This is made easy by register data lake locations feature, one not only can access these locations directly but also revoke/grant permissions of the objects residing there. 

Managing permissions on an Object level in S3 is a hectic process. But with LakeFormation permissions can be managed at the data catalog level. This enables one to grant/revoke permissions to users or roles on a table/column level. These permissions are internally mapped to underlying objects sitting in S3.

Though managing permissions, data ingestion workflow are made easy, but still most of the Glue processes like ETL, Crawler, ML specific transformations have to be setup manually.

This story is authored by Koushik. He is a software engineer and a keen data science and machine learning enthusiast.

Serverless Architecture for Lightening Fast Distributed File Transfer on AWS Data Lake

Today, we are very excited to share our insights on setting up a serverless architecture for setting up a lightening fast way* to copy large number of objects across multiple folders or partitions in an AWS data lake on S3. Typically in a data lake, data is kept across various zones depending on data lifecycle. For example, as the data arrives from source, it can be kept in the raw zone and then post processing moved to a processed zone, so that the lake is ready for the next influx of data. The rate of object transfer is a crucial factor, as it affects the overall efficiency of the data processing lifecycle in the data lake.

*In our tests, we copied more than 300K objects ranging from 1KB to 10GB in size from the raw zone into the processed zone. Compared to the best known tool for hyper fast file transfer on AWS called s3s3mirror, we were able to finish this transfer of about 24GB of data in about 50% less time. More details have been provided at the end of the post.

We created a lambda invoke architecture that copies files/objects concurrently. The below picture accurately depicts it.


OMS (Orchestrator-Master-Slave) Lambda Architecture

For example, If we have an S3 bucket with the following folder structure with the actual objects further contained within this hierarchy of folders, sub-folders and partitions.

S3 file structure

Let us look at how we can use OMS Architecture (Orchestrator-Master-Slave) to achieve hyper-fast distributed/concurrent file transfer. The above architecture can be divided into two halves, Orchestrator-Master, Master-Slave.

Orchestrator-Master

The Orchestrator simply invokes a Master Lambda for each folder. Each Master then iterates the objects in that folder (including all sub-folders and partitions) and invokes a Slave Lambda for each object to copy it to the destination.

Orchestrator-Master Lambda invoke

Let us look at the Orchestrator Lambda code.
Source-to-Destination-File-Transfer-Orchestrator:

import os
import boto3
import json
from datetime import datetime

client_lambda = boto3.client('lambda')
master_lambda = "Source-to-Destination-File-Transfer-Master"

folder_names = ["folder1", "folder2", "folder3", "folder4", "folder5", "folder6", "folder7", "folder8", "folder9"]

def lambda_handler(event, context):
    
    t = datetime.now()
    print("start-time",t)
    
    try:            
        for folder_name in folder_names:
            
            payload_data = {
              'folder_name': folder_name
            }                
        
            payload = json.dumps(payload_data)
            client_lambda.invoke(
                FunctionName = master_lambda,
                InvocationType = 'Event',
                LogType = 'None',
                Payload = payload
            )
            print(payload)
            
    except Exception as e:
        print(e)
        raise e

Master-Slave

Master-Slave Lambda invoke

Let us look at the Master Lambda code.
Source-to-Destination-File-Transfer-Master:

import os
import boto3
import json
from botocore.exceptions import ClientError

s3 = boto3.resource('s3')
client_lambda = boto3.client('lambda')

source_bucket_name = 'source bucket name'
source_bucket = s3.Bucket(source_bucket_name)

slave_lambda = "Source-to-Destination-File-Transfer-Slave"

def lambda_handler(event, context):

    try:
        source_prefix = "" #add if any
        source_prefix = source_prefix + "/" + event['table_name'] + "/"

        for obj in source_bucket.objects.filter(Prefix = source_prefix):
            path = obj.key
            payload_data = {
               'file_path': path
            }
            payload = json.dumps(payload_data)
            client_lambda.invoke(
                FunctionName = slave_lambda,
                InvocationType = 'Event',
                LogType = 'None',
                Payload = payload
            )

    except Exception as e:
        print(e)
        raise e

Slave

Let us look at the Slave Lambda code.
Source-to-Destination-File-Transfer-Slave:

import os
import boto3
import json
import re
from botocore.exceptions import ClientError

s3 = boto3.resource('s3')

source_prefix = "" #add if any
source_bucket_name = "source bucket name"
source_bucket = s3.Bucket(source_bucket_name )

destination_bucket_name = "destination bucket name"
destination_bucket = s3.Bucket(destination_bucket_name )

def lambda_handler(event, context):
    try:
        destination_prefix = "" #add if any
        
        source_obj = { 'Bucket': source_bucket_name, 'Key': event['file_path']}
        file_path = event['file_path']
        
        #copying file
        new_key = file_path.replace(source_prefix, destination_prefix)
        new_obj = source_bucket.Object(new_key)
        new_obj.copy(source_obj)
        
    except Exception as e:
        raise e

You must ensure that these Lambda functions have been configured to meet the maximum execution time and memory limit constraints as per your case. We tested by setting the upper limit of execution time as 5 minutes and 1GB of available memory.

Calculating the Rate of File Transfer

To calculate the rate of file transfer we are printing start time at the beginning of Orchestrator Lambda execution. Once the file transfer is complete, we use another lambda to extract the last modified date attribute of the last copied object.

Extract-Last-Modified:

import json
import boto3
from datetime import datetime
from dateutil import tz

s3 = boto3.resource('s3')

destination_bucket_name = "destination bucket name"
destination_bucket = s3.Bucket(destination_bucket_name)
destination_prefix = "" #add if any

def lambda_handler(event, context):
    
    #initializing with some old date
    last_modified_date = datetime(1940, 7, 4).replace(tzinfo = tz.tzlocal()) 

    for obj in my_bucket.objects.filter(Prefix = destination_prefix):
        
        obj_date = obj.last_modified.replace(tzinfo = tz.tzlocal())
        
        if last_modified_date < obj_date:
            last_modified_date = obj_date
    
    print("end-time: ", last_modified_date)

Now we have both start-time from Orchestrator Lambda and end-time from Extract-last-modified Lambda, their difference is the time taken for file transfer.

Before writing this post, we copied 24.1GB of objects using the above architecture, results are shown in the following screenshots:

duration	=	end-time - start-time
		=	10:04:49 - 10:03:28
		=	00:01:21 (hh-mm-ss)

To check the efficiency of our OMS Architecture, we compared the results of OMS with s3s3mirror, a utility for mirroring content from one S3 bucket to another or to/from the local filesystem. Below screenshot has the file transfer stats of s3s3 for the same set of files:

As we see the difference was 1 minutes and 8 seconds for total data transfer of about 24GB, it can be much higher for large data sets if we add more optimizations. I have only shared a generalized view of the OMS Architecture, it can be further fine-tuned to specific needs and get a highly optimized performance. For instance, if you have partitions in each folder and the OMS Architecture could yield much better results if you invoke Master Lambda for each partition inside the folder instead of invoking the master just at the folder level.

Thanks for the read. Looking forward to your thoughts.

This story is co-authored by Koushik and Subbareddy. Koushik is a software engineer and a keen data science and machine learning enthusiast. Subbareddy is a Big Data Engineer specializing on Cloud Big Data Services and Apache Spark Ecosystem.

Serverless Web Application Architecture using React with Amplify: Part2

In Part1, we have learned and built the AWS services (API Gateway, Lambda, DynamoDB) specific components of the serverless architecture as shown below. We shall build the rest of the serverless architecture in this article. To sum it up, we will build Jotter (a note taking app) in React and use Amplify to talk to these AWS services. Finally we shall deploy the React app on S3 cloud storage.

Serverless Web Application Architecture using React with Amplify

After deployment:

Serverless React Web Application Hosted on S3

Building the rest of the architecture

  1. Setting up a Cognito User Pool and Identity pool
  2. Building React application and configure Amplify.
  3. Create Sign-in, Sign-up using Cognito with Amplify.
  4. API Gateway integration.
  5. Secure the API Gateway with Cognito and test it.
  6. Deployment on S3 cloud storage.

Setting up a Cognito User Pool and Identity pool

Go to AWS console > Cognito > Create user pool.
Give the pool a name, and click on Review defaults.

This will show all the default settings for a cognito user pool, let’s go to each section on the left nav bar and modify accordingly such that it looks like below and create the cognito user pool.

Creating an App client. While creating an app client do not enable generate client secret.

After all modifications go to review and create pool. Now note down Pool Id, Pool ARN and App client Id

More details on setting up an authentication service using Cognito can be found here.

Let us build an Identity pool, which we could use to access AWS services (API Gateway in our case). But before that let us understand the difference between user pool and identity pool. Below image accurately depicts how both the services are different and how they complement each other.

Now we need to specify what AWS resources are accessible for users with temporary credentials obtained from the Cognito Identity Pool. In our case we want to invoke API gateway. Below policy exactly does that.

{
 "Version": "2012-10-17",
 "Statement": [
    {
     "Effect": "Allow",
     "Action": [
       "mobileanalytics:PutEvents",
       "cognito-sync:*",
       "cognito-identity:*"
      ],
     "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "execute-api:Invoke"
      ],
      "Resource": [
        "arn:aws:executeapi:YOUR_API_GATEWAY_REGION:*:YOUR_API_GATEWAY_ID/*/*/*"
      ]
    }
  ]
}

Add your API Gateway region and gateway id in the resource ARN.

Copy paste the above policy, and click Allow. Click on edit identity pool and note down the Identity pool ID.

Building React application and configuring Amplify

Create a new react project: 

create-react-app jotter

Now go to project directory and run the app.

cd jotter
npm start

You should see your app running on local host port 3000. To start editing, open project directory in your favourite code editor.

Open up public/index.html and edit the title tag to:

<title>Jotter - A note taking app</title>

Installing react-bootstrap:
We shall be using bootstrap for styling

npm install react-bootstrap --save

We also need bootstrap CDN for the above library to work. Open up public/index.html file and add the below CDN link above title tag.

<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T"
crossorigin="anonymous" />

Handle Routes with React Router:

We are building a single page app, we are going to use React Router to handle routes/navigation.

Go to src/index.js replace 

ReactDOM.render(<App />, document.getElementById('root'));

with

ReactDOM.render(<Router><App /></Router>, document.getElementById('root'));

Let us create a routes component.
routes.js:

import React from "react";
import CreateJot from './CreateJot';
import AllJotsList from './AllJotsList';
import Login from './Login';
import Signup from './Signup';
import NotFound from './NotFound';
import Viewjot from './Viewjot';
import { Route, Switch } from "react-router-dom";

export default () => 
<Switch>
   <Route exact path="/" component={Login}/>
   <Route exact path="/newjot" component={CreateJot} />
   <Route exact path="/alljots" component={AllJotsList} />
   <Route exact path="/login" component={Login} />
   <Route exact path="/signup" component={Signup} />
   <Route exact path="/view" component={Viewjot}/>
   <Route component={NotFound} />
</Switch>;

If you observed we used a switch component here, which is more like an if else ladder statement, first match is rendered. If no path matches, then the last NotFound component is rendered. For now all the above components simply return <h1> tag with the name of the component. Create all the components like the following.

Signup.js

import React from 'react';

export default class Signup extends React.Component {
  constructor(props) {
    super(props);
    this.state = {}
  }
  render(){
    return (
       <h1>Signup</h1>
    );
  }
}

Lets us setup navigation and test our routes.
App.js

import React, {Fragment} from 'react';
import { Link, NavLink } from "react-router-dom";
import Routes from './components/Routes';
import { withRouter } from 'react-router';

class App extends React.Component {
    constructor(props) {
        super(props);
        this.state = {
            isAuthenticated: false
        }
    }
    render(){
        return (
            <Fragment>
                <div className="navbar navbar-expand-lg navbar-light bg-light">
                    <Link to="/" className="navbar-brand" href="#"><h1>Jotter.io</h1></Link>
                    <div className="collapse navbar-collapse" id="navbarNav">
                        <ul className="navbar-nav">
                            {this.state.isAuthenticated ? 
                                <Fragment>
                                    <li className="nav-item">
                                        <NavLink onClick={this.handleLogout}>Logout</NavLink>
                                    </li>
                                    <li className="nav-item">
                                        <NavLink to="/newjot" className="nav-link">New Jot</NavLink>
                                    </li>
                                    <li className="nav-item">
                                        <NavLink to="/alljots" className="nav-link">All Jots</NavLink>
                                    </li>
                                    <li className="nav-item">
                                        <NavLink to="/viewjot" className="nav-link">View Jot</NavLink>
                                    </li>
                                </Fragment> : 
                                <Fragment>
                                    <li className="nav-item">
                                        <NavLink to="/login" className="nav-link">Login</NavLink>
                                    </li>
                                    <li className="nav-item">
                                        <NavLink to="/signup" className="nav-link">Signup</NavLink>
                                    </li>
                                </Fragment>
                            } 
                        </ul>
                    </div>                   
                </div>
                <Routes/>
            </Fragment>
        );
    }
}

export default withRouter(App);

What we have done is initially set the isAuthenticated state to false. And conditionally render the navigation bar to show only those routes which are accessible when the user isn’t logged in and vice versa.

Let us test our routes now.

Now our routes are working fine. Though we conditionally rendered the navigation bar, all routes are still accessible via URL.

We shall work on securing our routes, once we configure Amplify. Let us build the Login and Signup components and then configure Amplify to secure our routes.
Login.js

import React, { Component } from "react";
import { FormGroup, FormControl, FormLabel, Button } from "react-bootstrap";

export default class Login extends Component {
    constructor(props) {
        super(props);
        this.state = {
            email: "",
            password: ""
        };
    }

    validateForm() {
        return this.state.email.length > 0 && this.state.password.length>0;
    }

    handleChange = event => {
        this.setState({
                [event.target.id]: event.target.value
            });
    }

    handleSubmit = event => {
        event.preventDefault();
        console.log("Submitted",this.state.email,this.state.password);
    }

    render() {
        return (
            <div className="Home">
                <div className="col-md-4"> 
                    <form >
                        <FormGroup controlId="email">
                            <FormLabel>Email</FormLabel>
                            <FormControl
                                autoFocus
                                type="email"
                                value={this.state.email}
                                onChange={this.handleChange}
                            />
                        </FormGroup>
                        <FormGroup controlId="password" >
                            <FormLabel>Password</FormLabel>
                            <FormControl
                                value={this.state.password}
                                onChange={this.handleChange}
                                type="password"
                            />
                        </FormGroup>
                        <Button onClick={this.handleSubmit}>
                        Login
                        </Button>
                            
                    </form>
                </div>
            </div>
        );
    }
}

Simple login page that logs user credentials on submit.

Signup.js

import React, { Component } from "react";
import { FormText, FormGroup, FormControl, FormLabel } from "react-bootstrap";
import React, { Component } from "react";
import { FormText, FormGroup, FormControl, FormLabel, Button } from "react-bootstrap";

export default class sample extends Component {
   constructor(props) {
   super(props);
   this.state = {
       email: "",
       password: "",
       confirmPassword: "",
       confirmationCode: "",
       newUser: null
       };
   }
   validateForm() {
       return (
           this.state.email.length > 0 && this.state.password.length > 0 && this.state.password === this.state.confirmPassword);
   }
      
   validateConfirmationForm() {
       return this.state.confirmationCode.length > 0;
   }
      
   handleChange = event => {
       this.setState({
           [event.target.id]: event.target.value
       });
   }

   handleSubmit = async event => {
       event.preventDefault();
   }
  
   handleConfirmationSubmit = async event => {
       event.preventDefault();
   }

   render() {
       return (
           <div className="Signup">
               {this.state.newUser === null ? this.renderForm() : this.renderConfirmationForm()}
           </div>
       );
   }

   renderConfirmationForm() {
       return (
           <div className="Home">
               <div className="col-md-4">
                   <form onSubmit={this.handleConfirmationSubmit}>
                       <FormGroup controlId="confirmationCode" >
                           <FormLabel>Confirmation Code</FormLabel>
                           <FormControl
                               autoFocus
                               type="tel"
                               value={this.state.confirmationCode}
                               onChange={this.handleChange}
                           />
                           <FormText>Please check your email for the code.</FormText>
                       </FormGroup>
                       <Button type="submit">
                           Verify
                       </Button>   
                   </form>
               </div>
           </div>
       );
   }

   renderForm() {
       return (
           <div className="Home">
               <div className="col-md-4">
                   <form onSubmit={this.handleSubmit}>
                       <FormGroup controlId="email" >
                           <FormLabel>Email</FormLabel>
                           <FormControl   
                               autoFocus
                               type="email"
                               value={this.state.email}
                               onChange={this.handleChange}
                           />
                       </FormGroup>
                       <FormGroup controlId="password" >
                           <FormLabel>Password</FormLabel>
                           <FormControl
                               value={this.state.password}
                               onChange={this.handleChange}
                               type="password"
                           />
                       </FormGroup>
                       <FormGroup controlId="confirmPassword" >
                           <FormLabel>Confirm Password</FormLabel>
                           <FormControl
                               value={this.state.confirmPassword}
                               onChange={this.handleChange}
                               type="password"
                           />
                       </FormGroup>
                       <Button type="submit">
                           Signup
                       </Button>   
                   </form>
               </div>
           </div>
       );
   }
}

Configuring Amplify:

Install aws-amplify.

npm install --save aws-amplify

Create a config file with all the required details to connect to aws services we created earlier in part1.

config.js

export default {
   apiGateway: {
       REGION: "xx-xxxx-x",
       URL: " https://xxxxxxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/"
   },
   cognito: {
       REGION: "xx-xxxx-x",
       USER_POOL_ID: "xx-xxxx-x_yyyyyxxxx",
       APP_CLIENT_ID: "xxxyyyxxyxyxyxxyxyxxxyxxyx",
       IDENTITY_POOL_ID: "xx-xxxx-x:aaaabbbb-aaaa-cccc-dddd-aaaabbbbcccc"
   }
  };

Import the config.js file and aws-amplify.
Open up index.js and add the configuration like this.

import React from 'react';
import ReactDOM from 'react-dom';
import './index.css';
import App from './App';
import * as serviceWorker from './serviceWorker';
import { BrowserRouter as Router} from "react-router-dom";
import config from "./config";
import Amplify from "aws-amplify";

Amplify.configure({
   Auth: {
       mandatorySignIn: true,
       region: config.cognito.REGION,
       userPoolId: config.cognito.USER_POOL_ID,
       identityPoolId: config.cognito.IDENTITY_POOL_ID,
       userPoolWebClientId: config.cognito.APP_CLIENT_ID
   },
   API: {
       endpoints: [
           {
               name: "jotter",
               endpoint: config.apiGateway.URL,
               region: config.apiGateway.REGION
           },
       ]
   }
});

ReactDOM.render(<Router><App /></Router>, document.getElementById('root'));

serviceWorker.unregister();

Now we have successfully configured amplify, we could connect to and use aws services with ease. If you would like to use any other aws services, you know where to configure them.

App.js

import React, {Fragment} from 'react';
import { Link, NavLink } from "react-router-dom";
import Routes from './components/Routes';
import { withRouter } from 'react-router';
import { Auth } from "aws-amplify";

class App extends React.Component {
    constructor(props) {
        super(props);
        this.state = {
            isAuthenticated: false
        }
    }

    userHasAuthenticated = (value) => {
        this.setState({ isAuthenticated: value });
    }

    handleLogout = async event => {
        await Auth.signOut();
        this.userHasAuthenticated(false);
        this.props.history.push("/login");
    }

    async componentDidMount() {
        try {
          await Auth.currentSession();
          this.userHasAuthenticated(true);
          this.props.history.push("/alljots");
        }
        catch(e) {
          if (e !== 'No current user') {
            alert(e);
          }
        }
    }

    render(){
        return (
            <Fragment>
                <div className="navbar navbar-expand-lg navbar-light bg-light">
                    <Link to="/" className="navbar-brand" href="#"><h1>Jotter.io</h1></Link>
                    <div className="collapse navbar-collapse" id="navbarNav">
                        <ul className="navbar-nav">
                            {this.state.isAuthenticated ? 
                                <Fragment>
                                    <li className="nav-item">
                                        <NavLink onClick={this.handleLogout}>Logout</NavLink>
                                    </li>
                                    <li className="nav-item">
                                        <NavLink to="/newjot" className="nav-link">New Jot</NavLink>
                                    </li>
                                    <li className="nav-item">
                                        <NavLink to="/alljots" className="nav-link">All Jots</NavLink>
                                    </li>
                                    <li className="nav-item">
                                        <NavLink to="/viewjot" className="nav-link">View Jot</NavLink>
                                    </li>
                                </Fragment> : 
                                <Fragment>
                                    <li className="nav-item">
                                        <NavLink to="/login" className="nav-link">Login</NavLink>
                                    </li>
                                    <li className="nav-item">
                                        <NavLink to="/signup" className="nav-link">Signup</NavLink>
                                    </li>
                                </Fragment>
                            } 
                        </ul>
                    </div>                   
                </div>
                <Routes userHasAuthenticated={this.userHasAuthenticated} isAuthenticated = {this.state.isAuthenticated}/>
            </Fragment>
        );
    }
}

export default withRouter(App);

We did 3 things here:

  1. Checking if there is any session after the component is mounted, so the isAuthenticated state is set to true. 
  2. Handling the logout click by clearing the session using Auth.signOut method. 
  3. Passing the isAuthenticated state and userHasAuthenticated function as props to all the routes. By doing so Login and Signup component can update the isAuthenticated state and all the other components receive this state as props. 

Routes.js

import React from "react";
import CreateJot from './CreateJot';
import AllJotsList from './AllJotsList';
import Login from './Login';
import Signup from './Signup';
import NotFound from './NotFound';
import Viewjot from './Viewjot';
import { Route, Switch } from "react-router-dom";

export default ( {childProps }) =>
<Switch>
   <Route path="/" exact component={Login} props={childProps}/>
   <Route exact path="/newjot" component={CreateJot} props={childProps}/>
   <Route exact path="/alljots" component={AllJotsList} props={childProps}/>
   <Route path="/login" exact component={Login} props={childProps}/>
   <Route path="/signup" exact component={Signup} props={childProps} />
   <Route path="/view" exact component={Viewjot}></Route>
   <Route component={NotFound} />
</Switch>;

Signup.js

import React, { Component } from "react";
import { FormText, FormGroup, FormControl, FormLabel, Button} from "react-bootstrap";
import { Auth } from "aws-amplify";

export default class Signup extends Component {
    constructor(props) {
    super(props);
    this.state = {
        email: "",
        password: "",
        confirmPassword: "",
        confirmationCode: "",
        newUser: null
        };
    }
    validateForm() {
        return (
            this.state.email.length > 0 && this.state.password.length > 0 && this.state.password === this.state.confirmPassword);
    }
        
    validateConfirmationForm() {
        return this.state.confirmationCode.length > 0;
    }
        
    handleChange = event => {
        this.setState({
            [event.target.id]: event.target.value
        });
    }

    handleSubmit = async event => {
        event.preventDefault();
        try {
            const newUser = await Auth.signUp({
                username: this.state.email,
                password: this.state.password
            });
            this.setState({newUser});
        } catch (e) {
            alert(e.message);
        }
    }
    
    handleConfirmationSubmit = async event => {
        event.preventDefault();
        try {
            await Auth.confirmSignUp(this.state.email, this.state.confirmationCode);
                await Auth.signIn(this.state.email, this.state.password);
                this.props.userHasAuthenticated(true);
                this.props.history.push("/");
        } catch (e) {
            alert(e.message);
        }
    }

    render() {
        return (
            <div className="Signup">
                {this.state.newUser === null ? this.renderForm() : this.renderConfirmationForm()}
            </div>
        );
    }

    renderConfirmationForm() {
        return (
            <div className="Home">
                <div className="col-md-4">
                    <form onSubmit={this.handleConfirmationSubmit}>
                        <FormGroup controlId="confirmationCode" >
                            <FormLabel>Confirmation Code</FormLabel>
                            <FormControl 
                                autoFocus
                                type="tel"
                                value={this.state.confirmationCode}
                                onChange={this.handleChange}
                            />
                            <FormText>Please check your email for the code.</FormText>
                        </FormGroup>
                        <Button type="submit">
                            Verify
                        </Button>
                    </form>
                </div>
            </div>
        );
    }

    renderForm() {
        return (
            <div className="Home">
                <div className="col-md-4">
                    <form onSubmit={this.handleSubmit}>
                        <FormGroup controlId="email" >
                            <FormLabel>Email</FormLabel>
                            <FormControl    
                                autoFocus
                                type="email"
                                value={this.state.email}
                                onChange={this.handleChange}
                            />
                        </FormGroup>
                        <FormGroup controlId="password" >
                            <FormLabel>Password</FormLabel>
                            <FormControl
                                value={this.state.password}
                                onChange={this.handleChange}
                                type="password"
                            />
                        </FormGroup>
                        <FormGroup controlId="confirmPassword" >
                            <FormLabel>Confirm Password</FormLabel>
                            <FormControl
                                value={this.state.confirmPassword}
                                onChange={this.handleChange}
                                type="password"
                            />
                        </FormGroup>
                        <Button type="submit">
                            Signup
                        </Button>
                    </form>
                </div>
            </div>
        );
    }
}

Let us test it.

Once user is successfully created, you shall see user appear in the User Pool. 

You will be redirected to Login page. As we haven’t secured our routes and you should be seeing this.

Let’s create those routes now.
Routes.js

import React from "react";
import CreateJot from './CreateJot';
import AllJotsList from './AllJotsList';
import Login from './Login';
import Signup from './Signup';
import NotFound from './NotFound';
import Viewjot from './Viewjot';
import { Route, Switch } from "react-router-dom";
import AuthorizedRoute from "./AuthorizedRoute";
import UnAuthorizedRoute from "./UnAuthorizedRoute";

export default (cprops) =>
<Switch>

  <UnAuthorizedRoute path="/login" exact component={Login} cprops={cprops}/>
  <UnAuthorizedRoute path="/signup" exact component={Signup} cprops={cprops}/>
 
  <AuthorizedRoute exact path="/newjot" component={CreateJot} cprops={cprops}/>
  <AuthorizedRoute exact path="/alljots" component={AllJotsList} cprops={cprops}/>
  <AuthorizedRoute exact path="/viewjot" component={Viewjot} cprops={cprops}/>
  <AuthorizedRoute exact path="/" component={AllJotsList} cprops={cprops}/>
 
  <Route component={NotFound} />
</Switch>;

AuthorizedRoute.js

import React from 'react'
import { Redirect, Route } from 'react-router-dom'

const AuthorizedRoute = ({ component: Component, cprops, ...rest }) => {
   // Add your own authentication on the below line.
 return (
   <Route {...rest} render={props =>
       cprops.isAuthenticated ? <Component {...props} {...cprops} /> : <Redirect to="/login" />}
   />
 )
}

export default AuthorizedRoute

What we did here is if logged in then we simply redirect to the component or else to login.

UnAuthorizedRoute.js

import React from 'react'
import { Redirect, Route } from 'react-router-dom'

const UnAuthorizedRoute = ({ component: Component, cprops, ...rest }) => {
   // Add your own authentication on the below line.
 return (
   <Route {...rest} render={props =>
       !cprops.isAuthenticated ? <Component {...props} {...cprops} /> : <Redirect to="/" />}
   />
 )
}

export default UnAuthorizedRoute

What we did here is if not logged in then we simply redirect to login or else to the component. Let us add the login functionality to our login page using Auth.signIn method.
Login.js

import React, { Component } from "react";
import { FormGroup, FormControl, FormLabel, Button } from "react-bootstrap";
import { Auth } from "aws-amplify";

export default class Login extends Component {
   constructor(props) {
       super(props);
       this.state = {
           email: "",
           password: ""
       };
   }

   validateForm() {
       return this.state.email.length > 0 && this.state.password.length>0;
   }

   handleChange = event => {
       this.setState({
               [event.target.id]: event.target.value
           });
   }

   handleSubmit = async event => {
       event.preventDefault();
       try {
           await Auth.signIn(this.state.email, this.state.password);
           this.props.userHasAuthenticated(true);
           this.props.history.push("/alljots");
       } catch (e) {
           alert(e.message);
       }
      
   }

   render() {
       return (
           <div className="Home">
               <div className="col-md-4">
                   <form onSubmit={this.handleSubmit}>
                       <FormGroup controlId="email">
                           <FormLabel>Email</FormLabel>
                           <FormControl
                               autoFocus
                               type="email"
                               value={this.state.email}
                               onChange={this.handleChange}
                           />
                       </FormGroup>
                       <FormGroup controlId="password" >
                           <FormLabel>Password</FormLabel>
                           <FormControl
                               value={this.state.password}
                               onChange={this.handleChange}
                               type="password"
                           />
                       </FormGroup>
                       <Button type="submit">
                           Login
                       </Button>
                   </form>
               </div>
           </div>
       );
   }
}

Securing API with Cognito:

Earlier in part1, we have created an API called jotter, now go to Authorizers section and create new Authorizer.

Once created, let us go to each resource and add authorization.

Do this for all the resources in the API.

API Gateway integration:

AllJotsList.js

import React from 'react';
import {Button} from 'react-bootstrap';
import {API, Auth} from 'aws-amplify';
import { Link } from "react-router-dom";

export default class AllJotsList extends React.Component{
   constructor(props) {
       super(props);
       this.state = {
           jotlist: []
       }
   };
   handleDelete = async (index) => {
       let djot = this.state.jotlist[index];
       let sessionObject = await Auth.currentSession();
       let idToken = sessionObject.idToken.jwtToken;
       let userid = sessionObject.idToken.payload.sub;
       try {
           const status = await this.DeleteNote(djot.jotid,userid,idToken);
           const jotlist = await this.jots(userid,idToken);
           this.setState({ jotlist });
       } catch (e) {
           alert(e);
       }
   }

   DeleteNote(jotid,userid,idToken) {
       let path = "jot?jotid="+jotid+"&userid="+userid;
       let myInit = {
           headers: { Authorization: idToken }
       }
       return API.del("jotter", path,myInit);
   }

   handleEdit = async (index) => {
       let jotlist = this.state.jotlist;
       let sessionObject = await Auth.currentSession();
       let idToken = sessionObject.idToken.jwtToken;
       let userid = sessionObject.idToken.payload.sub;
       let path = "/view?jotid="+jotlist[index].jotid+"&userid="+userid;
       this.props.history.push(path);
   }

   async componentDidMount() {
       if (!this.props.isAuthenticated) {
           return;
       }
       try {
           let sessionObject = await Auth.currentSession();
           let idToken = sessionObject.idToken.jwtToken;
           let userid = sessionObject.idToken.payload.sub;
           const jotlist = await this.jots(userid,idToken);
           this.setState({ jotlist });
       } catch (e) {
           alert(e);
       }
   }

   jots(userid,idToken) {
       let path = "/alljots?userid="+userid;
       let myInit = {
           headers: { Authorization: idToken }
       }
       return API.get("jotter", path, myInit);
   }

   render(){
       if(this.state.jotlist == 0)
           return(<h2>Please <Link to="/newjot">add jots</Link> to view.</h2>)
       else
           return(
               <div className="col-md-6">
               <table style={{"marginTop":"2%"}} className="table table-hover">
                   <thead>
                       <tr>
                           <th scope="col">#</th>
                           {/* <th scope="col">Last modified</th> */}
                           <th scope="col">Title</th>
                           <th scope="col">Options</th>
                       </tr>
                   </thead>
                   <tbody>
                           {
                           this.state.jotlist.map((jot, index) => {
                           return(
                               <tr key={index}>
                                       <td>{index+1}</td>
                                       {/* <td>{jot.lastmodified}</td> */}
                                       <td>{jot.title}</td>
                                       <td>
                                               <Button variant="outline-info" size="sm"
                                                   type="button"
                                                   onClick={()=>this.handleEdit(index)}
                                                   >Edit
                                               </Button>
                                           {" | "}
                                           <Button variant="outline-danger" size="sm"
                                               type="button"
                                               onClick={()=>this.handleDelete(index)}
                                               >Delete
                                           </Button>
                                       </td>
                               </tr>
                           )})
                           }}
                       </tbody>

               </table>
               </div>   
           );
   }
}

What we are doing here:

  1. An async call to our alljots api to get list of jots available for a given user, using API.get method.
  2. Since we have secured our API gateway with cognito, we need to send the id token in the Authorization header. Once we logged in Amplify automatically sets the session. We simply have to call the Auth.currentSession() method to get the session object. This session object contains all the data including tokens we need.
  3. Once we receive a successful response, if the list is empty we display a message “Please add jots to view.” Otherwise we iterate the list and display them as table.
  4. Each record in the table has an edit, and delete options. Edit option redirects to Viewjot component while delete is handled here itself using API.del method. After a successful response, 1-3 steps are repeated.

In the process of testing, I have deleted and created new jots. Now, you should be seeing a list of all jots of a logged in user.

For Viewjot and CreateJot we are using CKEditor.
Viewjot.js

import React, {Fragment} from 'react';
import {Button} from 'react-bootstrap';
import CKEditor from '@ckeditor/ckeditor5-react';
import BalloonEditor from '@ckeditor/ckeditor5-build-balloon';
import { API, Auth } from "aws-amplify";
import queryString from "query-string";


export default class ViewJot extends React.Component {
   constructor(props, match) {
       super(props,match);
       this.state = {
           titledata: "",
           bodydata: "",
           userid: "",
           jotid: "",
           edit: false
       }
   };
   onTitleChange = (event, editor) => {
       const data = editor.getData();
       this.setState({
           titledata: data
       });
   }
   onBodyChange = (event, editor) => {
       const data = editor.getData();
       this.setState({
           bodydata: data
       });
   }
   validateJot = () =>  {
       return this.state.titledata.length > 0 && this.state.bodydata > 0;
   }
   editJot = () => {
       this.setState({edit:true})
   }
   saveJot = async event => {
       event.preventDefault();
       let note = {
           title: this.state.titledata,
           body: this.state.bodydata,
           userid: this.state.userid,
           jotid: this.state.jotid
       };
       try {
           let sessionObject = await Auth.currentSession();
           let idToken = sessionObject.idToken.jwtToken;
           await this.saveNote(note,idToken);
           this.props.history.push("/alljots");
       }catch (e) {
           alert(e);
       }
   }
   saveNote(note, idToken) {
       let myInit = {
           headers: { Authorization: idToken},
           body: note
       }
       return API.post("jotter", "/newjot", myInit)
   }

   async componentDidMount(){
       try{
           let sessionObject = await Auth.currentSession();
           let idToken = sessionObject.idToken.jwtToken;
           let jot = await this.getNote(idToken);
           this.setState({
               titledata: jot.title,
               bodydata: jot.body,
               userid: jot.userid,
               jotid: jot.jotid
           });
       }catch(e){
           alert(e);
       }
   }

   getNote(idToken) {
       let params = queryString.parse(this.props.location.search);
       let path = "jot?jotid="+params.jotid+"&userid="+params.userid;
       let myInit = {
           headers: { Authorization: idToken }
       }
       return API.get("jotter", path,myInit);
   }
   render() {
       return (
           <Fragment>
               {!this.state.edit &&
               <div style={{marginTop: "1%", marginLeft: "5%"}}>
                   <Button
                       onClick={this.editJot}
                   >
                       Edit
                   </Button>
               </div>}
               {this.state.edit &&
               <div style={{marginTop: "1%", marginLeft: "5%"}}>
                   <Button
                       onClick={this.saveJot}
                   >
                       Save
                   </Button>
               </div>}
               <div style={{marginTop: "1%", marginLeft: "5%", width: "60%",border: "1px solid #cccc", marginBottom: "1%"}}>
                   <CKEditor
                       data={this.state.titledata}
                       editor={BalloonEditor}
                       disabled={!this.state.edit}
                       onChange={(event, editor) => {
                           this.onTitleChange(event, editor);
                   }}/>
               </div>
               <div style={{marginLeft: "5%", width: "60%",border: "1px solid #cccc"}}>
                   <CKEditor
                       data={this.state.bodydata}
                       editor={BalloonEditor}
                       disabled={!this.state.edit}
                       onChange={(event, editor) => {
                           this.onBodyChange(event, editor);
                   }}/>
               </div>          
           </Fragment>
       );
   }
}

What we are doing here is:

  1. As we redirect from all jots table on edit, we send the jot id as path parameter and use this to get data from API using API.get method.
  2. We have 2 modes built using state. We then use CKEditor in disable mode to only view the jot.
  3. Edit mode to edit jot data and a post request to update the data using API.post method.
  4. Same thing with CreateJot but only edit mode.

CreateJot.js

import React, {Fragment} from 'react';
import {Button} from 'react-bootstrap';
import CKEditor from '@ckeditor/ckeditor5-react';
import BalloonEditor from '@ckeditor/ckeditor5-build-balloon';
import { API,Auth } from "aws-amplify";
const titlecontent = "Title..";
const bodycontent = "Start writing notes..";

export default class CreateJot extends React.Component {
   constructor(props) {
       super(props);
       this.state = {
           titledata: titlecontent,
           bodydata: bodycontent,
       }
   };
   onTitleChange = (event, editor) => {
       const data = editor.getData();
       this.setState({
           titledata: data
       });
   }
   onBodyChange = (event, editor) => {
       const data = editor.getData();
       this.setState({
           bodydata: data
       });
   }
   validateJot = () =>  {
       return this.state.titledata.length > 0 && this.state.bodydata > 0;
   }
   createNew = async event => {
       event.preventDefault();
       let sessionObject = await Auth.currentSession();
       let idToken = sessionObject.idToken.jwtToken;
       let userid = sessionObject.idToken.payload.sub;
       let note = {
           title: this.state.titledata,
           body: this.state.bodydata,
           userid: userid,
           jotid: this.createJotid()
       };
       try {
           await this.createNote(note,idToken);
           this.props.history.push("/");
       }catch (e) {
           alert(e);
       }
   }
   createNote(note,idToken) {
       let myInit = {
           headers: { Authorization: idToken },
           body: note
       }
       return API.post("jotter", "/newjot", myInit);
   }
   createJotid(){
       let dt = new Date().getTime();
       let uuid = 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function(c) {
           let r = (dt + Math.random()*16)%16 | 0;
           dt = Math.floor(dt/16);
           return (c=='x' ? r :(r&0x3|0x8)).toString(16);
       });
       return uuid;
   }
   render() {
       return (
           <Fragment>
               <div style={{marginTop: "1%", marginLeft: "5%"}}>
                   <Button
                       onClick={this.createNew}
                   >
                       Create
                   </Button>
               </div>
               <div style={{marginTop: "1%", marginLeft: "5%", width: "60%",border: "1px solid #cccc", marginBottom: "1%"}}>
                   <CKEditor
                       data={titlecontent}
                       editor={BalloonEditor}
                       onChange={(event, editor) => {
                           this.onTitleChange(event, editor);
                   }}/>
               </div>
               <div style={{marginLeft: "5%", width: "60%",border: "1px solid #cccc"}}>
                   <CKEditor
                       data={bodycontent}
                       editor={BalloonEditor}
                       onChange={(event, editor) => {
                           this.onBodyChange(event, editor);
                   }}/>
               </div>          
           </Fragment>
       );
   }
}

Deployment on S3 cloud storage

Manual Deployment:
Create a public S3 bucket. Then build react application using the command:

npm run build

This creates a folder named ‘build’ in the project directory with all the static files. Upload these files into the S3 bucket you created earlier.

After the upload is complete. Goto S3 bucket properties and enable Static website hosting. This will give you a URL endpoint to access the static files from browser. Now your web app is hosted on S3, and can be visited using the URL endpoint. You could also map a custom domain of yours to this endpoint. This way when users visit this custom domain, internally they are redirected to this endpoint.

Open Chrome or Firefox, and visit the endpoint. You should see your app running. Now your application is open to the world.

Auto Deployment:
What about code updates? Should i keep uploading static files to S3 manually again and again? No, you don’t have to. You could either build a sophisticated CI/CD pipeline which uses webhooks like Jenkins that pull code from Git events or automate the manual build and deploy to S3 process using react scripts with aws cli.

Thanks for the read, I hope it was helpful.

This story is authored by Koushik. He is a software engineer and a keen data science and machine learning enthusiast.

Serverless Web Application Architecture using React with Amplify: Part1

In this series we shall be building Jotter (a note taking app). Jotter is a serverless web application built with React and Amplify using AWS cloud services. The illustration below depicts the serverless architecture we are going to build.

Serverless Web Application Architecture using React with Amplify

After deployment:

Serverless React Web Application Hosted on S3

Understanding Serverless:

Just put your code on cloud and run it. Without worrying about infrastructure.

Servers are still there in serverless, but managed by service provider. In our case, the service provider is AWS. Let us learn a little about AWS & its services. Feel free to skip, if you are acquainted with AWS know-how.

Understanding AWS:

Amazon Web Services is a cloud services platform that offers various cloud services. You could build reliable, scalable applications without worrying about managing infrastructure. Amazon Web Services offers a broad set of global cloud-based products including compute, storage, databases, analytics, networking, mobile, developer tools, management tools, IoT, security and enterprise applications. These services help organizations move faster, lower IT costs, and scale.

In a nutshell, AWS is like playing Lego or Minecraft. In lego, you use blocks to create different structures right? We neither create nor maintain those blocks. All we do is join & disjoin blocks logically to construct structures. Similarly, in AWS each service is like a block, we compose these services to develop our serverless application. You could also build a server based web application using AWS EC2 instance configured with AWS Route53, but that’s not the scope of this series.

Advantages:

  1. Granular control over each block. Testing becomes easy, as if a block is malfunctioning, we only have to replace/fix that one block.
  2. Reusability: Certain individual blocks or groups of blocks can be reused for other structures.
  3. We only pay for the usage.
  4. Scalable, high availability and fault tolerance.

That’s broadly about AWS as a platform. We shall first understand these services and then build the serverless web application architecture we viewed earlier.

Services used in the architecture:

  1. DynamoDB (NoSQL Database)
  2. Lambda (Server side computing)
  3. API Gateway (HTTP calls interface)
  4. Cognito (Authentication)
  5. S3 (Cloud storage)
  6. Amplify and React (Client side)

Understanding DynamoDB:

DynamoDB is a distributed NoSQL, schemaless, key-value storage system. Extremely scalable as the amount of data stored mainly depends on the physical memory of the system. While in DynamoDB you don’t have any such limits as you can scale the system horizontally. You will pay only for the resources you provision.

Though it is schemaless it is still represented as a table. Each table is a collection of items. Value(Attributes) of each item can be a scalar, JSON, set. Item size should be less than 400KB (binary, UTF-8). Each item in the table is uniquely identified with a Primary key and is mandatory while creating the table. Primary key can be the same as Partition key or a combination of Partition key and Sort key. If it is a combination of both it is also called Composite primary key.

Note: Primary key cannot be modified once created. Partition key and Sort key values are internally used as input for a hash function to determine storage.

More on indexes and DynamoDB, Parallel scan here.

Understanding Lambda:

Lambda is a serverless compute service. It lets you run code without provisioning or managing servers. You pay only for the compute time you consume. It supports NodeJS, Python, Java, GO etc. Lambda can be triggered from a variety of AWS services. Learn more about Lambda here.

To Every Lambda function handler, 3 objects can passed as argument. 

  1. The first argument is the event object, which contains information from the invoker. The invoker passes this information as a JSON-formatted string when it invokes Lambda. When an AWS service invokes your function, the event object structure varies by service.
  2. The second argument is the context object, which contains information about the invocation, function, and execution environment. In the preceding example, the function gets the name of the log stream from the context object and returns it to the invoker.
  3. The third argument, callback, is a function that you can call in non-async functions to send a response to invoker. The callback function takes two arguments: an Error and a response. The response object must be compatible with JSON.stringify. Error should be null for successful response.

In our app, Lambda is used as a mediator for incoming HTTP requests & DynamoDB. Lambda writes, reads and processes data to/from DynamoDB accordingly.

Understanding API Gateway:

Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. 

With a few clicks in the AWS Management Console, you can create REST and WebSocket APIs that act as a front door for applications to access data, business logic, or functionality from your backend services, such as workloads running on EC2, code running on Lambda, any web application, or real-time communication applications.

API Gateway handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, authorization and access control, monitoring, and API version management.

In our app, we use API Gateway to invoke different Lambda functions for different API calls.

Understanding Cognito:

Amazon Cognito User Pool makes it easy for developers to add sign-up and sign-in functionality to web and mobile applications. It serves as your own identity provider to maintain a user directory. It supports user registration and sign-in, as well as provisioning identity tokens for signed-in users.

Our Jotter app needs to handle user accounts and authentication in a secure and reliable way. We are going to use Cognito User Pool for it.

Understanding Amplify:

AWS Amplify is a framework provided by AWS to develop applications, with AWS cloud services. Amplify makes the process of stitching cloud services with our application hassle free. Amplify provides different libraries for different apps(iOS, Android, Web, React Native). Amplify javascript library is available as an npm package(aws-amplify). The aws-amplify client library uses a config file to connect AWS services. The services which amplify provides include Database, API, Lambda/serverless, Authentication, Hosting, Storage, Analytics. 

Note: One could also use AWS Amplify CLI to provision AWS services. The aws-amplify client library and Amplify CLI are two different things.

Amplify CLI internally uses cloudformation to provision/create, while aws-amplify client library is used to connect to AWS services. Using Amplify CLI is inconvenient as you are not creating services directly from AWS console but by using a CloudFormation stack internally. If successful, it returns a config file with all the metadata of different services provisioned. Instead a simple approach is to create required services from AWS console and update the config file manually and use it with aws-amplify client library.

In our Jotter app, we will use aws-amplify client javascript library to interact with AWS services.

Building the serverless architecture

API Gateway configured with Lamba to read and write data from DynamoDB.

Let us first build this setup and then add the remaining.

Working with DynamoDB:

Create table:

Go to AWS console > DynamoDB > Tables, choose Create table. As we learned earlier each table is a collection of items and each item is identified with a primary key.

So, give the table a name, set the schema for primary key. Each jot is uniquely identified with the combination of userid (partition key) and jotid (sort key).

Click on Create.

This will create an empty table with 2 columns namely userid, jotid.

Adding sample data:

Each jot item is a json object with the following structure: 

{  
   "userid":"<32 digit uuid code>",
   "jotid":"<32 digit uuid code>",
   "title":"title content",
   "body":"body content"
}

Initially you will be shown only 2 keys userid, jotid as they are part of the primary key. We shall add the remaining 2 attributes (title, body). I have generated 5 random uuid codes for populating the table. We shall be adding 4 items, all of them belong to the same user which means they will have the same userid but different jotid. Click on Create item.

Similarly, add remaining 3 to the table.

We have successfully created a table and added 4 items to the table, now let us create Lambda functions to process data from DynamoDB.

Working with Lambda:

As we have learned earlier, Lambda lets us run code without provisioning or managing servers. We shall use Lambda for our server side computing or business logic in simple terms.

Creating a Lambda Function:
Let us first understand our server side operations that Jotter app needs.

  • Read a list of all items of a user from DynamoDB table.
  • Write a new item to DynamoDB table by a user.
  • Read the full content of an item of a user.
  • Delete an item of a user.

We shall create 4 Lambda functions that will do each of these jobs/operations.

Get all items Lambda:

Every Lambda has to be configured with an IAM role. This IAM role defines access control to other AWS services. We shall create an IAM role for Lambda to access DynamoDB.

Go to IAM > Roles. Choose create role. After role is created, we could attach policies or rules in simple terms. We could either attach default policies provided by AWS or create custom policies tailored to our specific needs.

Custom policy:
Choose service, and then operations. Create policy.

Create IAM policy for DynamoDB write access

This way you could add custom policies to the role or use default AWS policies. I am giving AmazonDynamoDBFullAccess policy as i will be using this role for all my Lambda functions. A better approach would be creating different roles with custom policies for specific operations. Please do exercise all options. Do not give permissions more than needed.

Once role is created, go back to Lambda console, create the Lambda function with that role.

  • Choose Author from scratch
  • Give function a name(get-all-jots)
  • Choose runtime environment(Nodejs 10x)
  • Permissions, choose existing role and add the role created earlier.
  • Create function.

After creation, replace index.js code with below:
In the Lambda code, I used the javascript aws-sdk to connect to DynamoDB.

get-all-jots Lambda:

var AWS = require('aws-sdk');
var ddb = new AWS.DynamoDB({region: 'your table region', apiVersion: '2012-08-10'});
exports.handler = (event,context,callback) => {
    console.log(event.userid);
    // TODO implement
   var params = {
  ExpressionAttributeNames: {
   "#T": "title",
   "#JI": "jotid"
  }, 
  ExpressionAttributeValues: {
   ":id": {
     S: ""+event.userid
    }
  }, 
  FilterExpression: "userid = :id", 
  ProjectionExpression: "#JI,#T", 
  TableName: "jotter"
 };
 ddb.scan(params, function(err, data) {
   if (err){
        console.log(err, err.stack); // an error occurred
        callback(err)
   } 
   else{
        console.log(data);
        let jots = [];
        data.Items.map((item) => {
         let jot = {
          title: item.title.S,
          jotid: item.jotid.S
         }
        jots.push(jot)
        })
        console.log("helloworld",jots);
        callback(null,jots);
   }    
 });
};

Testing get-all-jots Lambda function:
Choose to configure test events beside Test button. Here we configure the event object that is passed as parameter to Lambda function.

Save it, and test the Lambda function. You should see DynamoDB data in the response.

If you don’t see the results, enjoy debugging the logs in CloudWatch. You could navigate to logs from monitoring tab in Lambda console.

Similarly create the remaining 3 Lambda functions and test them by configuring event objects.

create-new-jot Lambda:

var AWS = require('aws-sdk');
var ddb = new AWS.DynamoDB({region: 'your table region', apiVersion: '2012-08-10'});

exports.handler = (event,context, callback) => {
    var params = {
      TableName: 'jotter',
      Item: {
        'userid' : {S: ''+event.userid},
        'title' : {S: ''+event.title},
        'body':{S: ''+event.body},
        'jotid':{S: ''+event.jotid}
      }
    };

ddb.putItem(params, function(err, data) {
  if (err) {
    console.log("Error", err);
    callback(err);
  } 
  else {
    console.log("Success", data);
    callback(null, data);
  }
});
};

Event object for test configuration: (create-new-jot)

{  
   "userid":"<32 digit uuid code>",
   "jotid":"<32 digit uuid code>",
   "title":"title content",
   "body":"body content"
}

get-jot Lambda:

var AWS = require('aws-sdk');
var ddb = new AWS.DynamoDB({region: 'your table region', apiVersion: '2012-08-10'});

exports.handler = (event, context,callback) => {
 let jotid = event.jotid;
 let userid = event.userid;
 
 var params = {
  Key: {
   "jotid": {S: ""+jotid},
   "userid": {S: ""+userid}
  }, 
  TableName: "jotter"
 };
 ddb.getItem(params, function(err, data) {
   if (err){
    console.log(err);
    callback(err)   
   }
   else{
    console.log(data);
     let jot = { 
      "userid": data.Item.userid.S,
      "title": data.Item.title.S,
      "body": data.Item.body.S,
      "jotid": data.Item.jotid.S
    }
    console.log(data);
    callback(null,jot)
   }
 });
};

Event object for test configuration: (get-jot)

{  
   "userid":"<32 digit uuid code>",
   "jotid":"<32 digit uuid code>",
}

delete-jot Lambda:

var AWS = require('aws-sdk');
var ddb = new AWS.DynamoDB({region: 'your table region', apiVersion: '2012-08-10'});

exports.handler = (event, context,callback) => {
 let jotid = event.jotid;
 let userid = event.userid;
 
 var params = {
  Key: {
   "jotid": {S: ""+jotid},
   "userid": {S: ""+userid}
  }, 
  TableName: "jotter"
 };
 
 ddb.deleteItem(params, function(err, data) {
  if (err) {
    console.log(err, err.stack);
    callback(err)
  }
  else {
    console.log(data);
    callback(null,data);
  }
});
};

Event object for test configuration: (delete-jot)

{  
   "userid":"<32 digit uuid code>",
   "jotid":"<32 digit uuid code>",
}

Test all the above Lambda functions, and see if the data you get in the response from callback is correct. For new jot check if the data is updated in the DynamoDB table. If not enjoy debugging the logs. We shall now see how we can invoke these functions from API gateway for each API/http-method call.

Working with API Gateway:

As we learned earlier, it makes it easy for developers to create, publish, manage, and secure APIs at any scale. It acts as a front door to backend services.

Before we start building let us understand our requirements and design our API accordingly.

  • Get a list of all jots of a user.
  • Post a new jot by a user.
  • Get the full content of a jot of a user.
  • Delete a jot of a user.

Accordingly our API request URIs & http methods could be something like this.

  • GET <URL endpoint>/alljots?userid=<id>
  • POST <URL endpoint>/newjot with jot content in body
  • GET <URL endpoint>/jot?userid=<id>&jotid=<id>
  • DELETE <URL endpoint>/jot?userid=<id>&jotid=<id>

Go to console > API Gateway > APIs > Create API

After creating the API, we could add any no. of resources(paths) to it. Let us add first resource.

Enable API Gateway CORS as it allows requests from different origins. Let us add the get method to it.

After adding the method, select the integration type to be Lambda function, fill in Lambda function details like region and name. When this URI is called this Lambda is invoked.

After saving, select GET, it opens method execution.

Understanding Method Execution:

  1. Client requests are passed to Method Request.
  2. Method Request acts like a gatekeeper it authenticates request, if not it rejects the request with 401 status code. Then it passes to Integration Request.
  3. Integration Request controls what goes into action(Lambda in our case). Data mapping & shaping can be done here. 
  4. Call the action with request data as params. 
  5. When action calls back with response, Integration Response handles it.
  6. Integration Response controls what comes out of action. It can also be used to map data to a required structure.
  7. Method Response is used to define Integration Response. 
  8. Method Response responds to the Client.

Adding a Data mapping template to Integration Request:

What mapping templates does is it maps data from request body or params into an object that is passed as event object to action (Lambda function). We do this only when we have to extract and get data into some specific structure our action needs.

Select Integration Request, open Mapping Templates.

After adding Data mapping template to Integration Request, let us test the API.

Testing the API:

Click on TEST in the Method Execution section. Enter the QueryStrings params as:

userid=”uuid”

Click on Test. You should be seeing all the jots of that userid in the Response Body.

Deploy API to a stage:

You could create different deployment stages, like dev, prod, testing etc. Everything we create or modify, for them to take effect, API has to be deployed to a stage. This gives us a URL endpoint, we could use to make API calls. For every change made, we have to deploy or the changes will be local to API Gateway and don’t reflect on the URL endpoint. Each stage has different URL endpoint with different resources.

Now we have successfully created an API with alljots path/resource.

Testing API using postman:

Launch postman, enter the URL, choose HTTP method, add appropriate headers and body if needed. Send the request.

Tada! API is working. Similarly add the remaining 3 URI’s in API Gateway and test them using postman.

Body mapping template for Delete jot and Get jot URI resources is the same, but we pass 2 arguments(userid, jotid) in query string.

Note: The reason why we passed data in query strings for get & delete methods is that these methods do not contain request body, the only means of passing data is by using path parameters or query strings. 

Data mapping template for Delete and Get jot URI resources:

Example for Get: (similarly for Delete)

For newjot post request, data has to be passed in the body in this format:

{  
   "userid":"<32 digit uuid code>",
   "jotid":"<32 digit uuid code>",
   "title":"title content",
   "body":"body content"
}

Example for Post:

Note: And again if things didn’t work properly, There can be many possibilities of why something isn’t working as we are dealing with 3 services here, my answer is start debugging the CloudWatch logs. 

We have successfully built and tested the AWS services part in our serverless architecture. We will look at how to integrate it with a react application, create Sign-in, Sign-up using Cognito and how to secure our API’s in API Gateway with Cognito and finally deploy the app on S3 cloud storage in Part2.

This story is authored by Koushik. He is a software engineer and a keen data science and machine learning enthusiast.