Real Time Face Identification on Live Camera Feed using Amazon Rekognition Video and Kinesis Video Streams

In this post, we are going to learn how to perform facial analysis on a live video feed by setting up a serverless video analytics architecture using Amazon Rekognition Video and Amazon Kinesis Video Streams.

Use cases:

  1. Intruder notification system
  2. Employee sign-in system

Architecture overview

Real time face recognition using AWS on a live video stream

We shall learn how to use the webcam of a laptop (we can, of course, use professional-grade cameras hooked up to Kinesis Video Streams for a production-ready system) to send a live video feed to Amazon Kinesis Video Streams. The stream processor in Amazon Rekognition Video picks up this webcam feed and analyzes it by comparing it against the faces in a face collection created beforehand. The analysis results are written to an Amazon Kinesis data stream. For each record written to the Kinesis data stream, a Lambda function is invoked. The Lambda reads the record from the data stream and, if there are any facial matches or mismatches (depending on how the Lambda is configured), an email notification is sent via Amazon SNS (Simple Notification Service) to the registered email addresses.

Note: Make sure all the above AWS resources are created in the same region.

Before going further, make sure you already have the AWS CLI configured on your machine. If not, follow this link –  https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html

Step-1: Create Kinesis Video Stream

We need to create a Kinesis video stream for our webcam to connect to and send its feed to. From the AWS console, open Kinesis Video Streams and create a new stream by clicking the Create button. Give the stream a name in the stream configuration.

We have successfully created a Kinesis video stream.
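The same stream can also be created programmatically instead of from the console. A minimal boto3 sketch; the stream name and retention period below are illustrative assumptions, not values from this post:

```python
def create_stream_params(name, retention_hours=24):
    """Request parameters for kinesisvideo.create_stream.

    A retention of 0 hours would keep no data in the stream.
    """
    return {"StreamName": name, "DataRetentionInHours": retention_hours}

if __name__ == "__main__":
    import boto3  # requires AWS credentials configured locally
    kv = boto3.client("kinesisvideo", region_name="us-west-2")
    resp = kv.create_stream(**create_stream_params("rekognitionVideoBlogStream"))
    print(resp["StreamARN"])
```

Whichever way you create it, note the stream name — the GStreamer producer in Step-2 needs it.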

Step-2: Send Live Feed to the Kinesis Video Stream

We need a video producer; anything that sends media data to a Kinesis video stream is called a producer. AWS currently provides producer SDKs for four platforms (Java, Android, C++, C) only. We use the C++ Producer Library as a GStreamer plugin.

To easily send media from a variety of devices on a variety of operating systems, this tutorial uses GStreamer, an open-source media framework that standardizes access to cameras and other media sources.

Download the Amazon Kinesis Video Streams Producer SDK from Github using the following Git command:

git clone https://github.com/awslabs/amazon-kinesis-video-streams-producer-sdk-cpp 

After downloading it successfully, you can compile and install the GStreamer sample in the kinesis-video-native-build directory using the following commands:

For Ubuntu – run the following commands

sudo apt-get update

sudo apt-get install libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev gstreamer1.0-plugins-base-apps

sudo apt-get install gstreamer1.0-plugins-bad gstreamer1.0-plugins-good gstreamer1.0-plugins-ugly gstreamer1.0-tools

For Windows – run the following commands

Inside mingw32 or mingw64 shell, go to kinesis-video-native-build directory and run ./min-install-script 

For macOS – run the following commands

Install homebrew

Run brew install pkg-config openssl cmake gstreamer gst-plugins-base gst-plugins-good gst-plugins-bad gst-plugins-ugly log4cplus

Go to kinesis-video-native-build directory and run ./min-install-script

Running the GStreamer webcam sample application:
The sample application kinesis_video_gstreamer_sample_app in the kinesis-video-native-build directory uses a GStreamer pipeline to get video data from the camera. Launch it with the Kinesis video stream name created in Step-1 and it will start streaming from the camera.

AWS_ACCESS_KEY_ID=<YourAccessKeyId> AWS_SECRET_ACCESS_KEY=<YourSecretAccessKey> ./kinesis_video_gstreamer_sample_app <stream_name>

After running the above command, streaming starts and you can see the live feed in the Media preview of the Kinesis video stream created in Step-1.

Step-3: Creating Resources using AWS CloudFormation Stack

The CloudFormation stack will create the resources that are highlighted in the following image.

Highlighted resources to be created using CloudFormation
  1. The following link will automatically open a new CloudFormation Stack in us-west-2: CloudFormation Stack.
  2. Go to View in Designer and edit the template code and click on create stack.
    1. Change the name of the application from Default.
    2. Change the Node.js version of the RekognitionVideoLambda to 10.x (version 6.10 isn’t supported anymore) and then click on Create stack.
editing CloudFormation stack template

Enter your Email Address to receive notifications and then choose Next as shown below:

Skip the Configure stack options step by choosing Next. In the final step you must check the checkbox and then choose Create stack as in the below image:

Once the stack is created successfully, you can see it in your stacks with status CREATE_COMPLETE, as below. This creates the required resources.

Once completed, you will receive a confirmation email for the SNS subscription. Make sure you confirm it to receive notifications.

snapshot from subscribe confirmation email

Step-4: Add face to a Collection

As learned earlier, the stream processor in Amazon Rekognition Video picks up and analyzes the feed coming from the Kinesis video stream by comparing it with the faces in the face collection. Let’s create this collection.

For more information follow the link – https://docs.aws.amazon.com/rekognition/latest/dg/create-collection-procedure.html

Create Collection: Create a face collection using the AWS CLI with the below command:

aws rekognition create-collection --collection-id <Collection_Name> --region us-west-2

Add face(s) to the collection: You can use your own picture for testing.
First we need to upload the image(s) to an Amazon S3 bucket in the us-west-2 region. Run the below command, replacing BUCKET_NAME and FILE_NAME with your details. FACE-TAG can be any label (for example, the name of the person). Note that the --collection-id must match the collection created above.

aws rekognition index-faces --image '{"S3Object":{"Bucket":"<BUCKET_NAME>","Name":"<FILE_NAME>.jpeg"}}' --collection-id "rekVideoBlog" --detection-attributes "ALL" --external-image-id "<FACE-TAG>" --region us-west-2

After that you will receive a JSON response as output, as shown below.

Now we are all ready to go further. If you have multiple photos, run the above command again with the new file name.
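With several photos, the same CLI call can be scripted using boto3. A sketch, assuming hypothetical bucket, file names and face tags; the collection id mirrors the CLI example above:

```python
def index_faces_params(bucket, key, face_tag, collection_id="rekVideoBlog"):
    """Request parameters mirroring the index-faces CLI call above."""
    return {
        "Image": {"S3Object": {"Bucket": bucket, "Name": key}},
        "CollectionId": collection_id,
        "DetectionAttributes": ["ALL"],
        "ExternalImageId": face_tag,
    }

if __name__ == "__main__":
    import boto3  # requires AWS credentials configured locally
    rek = boto3.client("rekognition", region_name="us-west-2")
    photos = {"alice": "alice.jpeg", "bob": "bob.jpeg"}  # face-tag -> S3 key
    for tag, key in photos.items():
        resp = rek.index_faces(**index_faces_params("my-bucket", key, tag))
        print(tag, "->", len(resp["FaceRecords"]), "face(s) indexed")
```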

Step-5: Creating the Stream Processor

We need to create a stream processor, which does all the work of reading the video stream, running facial analysis against the face collection, and finally writing the analysis data to the Kinesis data stream. It contains information about the Kinesis data stream, the Kinesis video stream, the face collection ID, and the role used by Rekognition to access the Kinesis video stream.

Copy meta info required:
Go to your Kinesis video stream created in step-1, note down the Stream ARN (KINESIS_VIDEO_STREAM_ARN) from the stream info as shown below:

Next from the CloudFormation stack output note down the values of KinesisDataStreamArn (KINESIS_DATA_STREAM_ARN) and RecognitionVideoIAM (IAM_ROLE_ARN) as shown below:

Now create a JSON file in your system that contains the following information:

{
       "Name": "streamProcessorForRekognitionVideoBlog",
       "Input": {
              "KinesisVideoStream": {
                     "Arn": "<KINESIS_VIDEO_STREAM_ARN>"
              }
       },
       "Output": {
              "KinesisDataStream": {
                     "Arn": "<KINESIS_DATA_STREAM_ARN>"
              }
       },
       "RoleArn": "<IAM_ROLE_ARN>",
       "Settings": {
              "FaceSearch": {
                     "CollectionId": "<COLLECTION_NAME>",
                     "FaceMatchThreshold": 85.5
              }
       }
}

Create the stream processor with AWS CLI from command line with following command:

aws rekognition create-stream-processor --region us-west-2 --cli-input-json file://<PATH_TO_JSON_FILE_ABOVE>

Now start the stream processor with following command:

aws rekognition start-stream-processor --name streamProcessorForRekognitionVideoBlog --region us-west-2

You can check whether the stream processor is in a running state with the following command:

aws rekognition list-stream-processors --region us-west-2

If it’s running, you will see a response like the one below:

Also make sure your camera is still streaming from the command line and the feed is being received by the Kinesis video stream.

Now you will get notified whenever a known or unknown person shows up in front of your webcam.

Example notification:

Note: 

  1. The Lambda can be configured to send email notifications only if unknown faces are detected, or vice versa.
  2. For cost optimization, one could use a Python script that calls the Amazon Rekognition service with a snapshot only when a person is detected, instead of wasting resources on an unoccupied area.
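As a rough illustration of note 1, the Lambda could filter for unmatched faces before publishing to SNS. This is only a sketch of the idea, not the function the CloudFormation stack actually deploys; reading the topic ARN from an SNS_TOPIC_ARN environment variable is an assumption:

```python
import base64
import json
import os

def unknown_faces(record):
    """Decode one Kinesis record written by the Rekognition stream processor
    and return the detected faces that matched nothing in the collection."""
    payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
    return [face["DetectedFace"]
            for face in payload.get("FaceSearchResponse", [])
            if not face.get("MatchedFaces")]

def lambda_handler(event, context):
    import boto3  # available in the Lambda runtime
    sns = boto3.client("sns")
    for record in event["Records"]:
        if unknown_faces(record):
            sns.publish(TopicArn=os.environ["SNS_TOPIC_ARN"],
                        Message="Unknown person detected on the live feed")
```

Inverting the `if not face.get("MatchedFaces")` condition would notify on known faces instead, as in an employee sign-in system.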

Step-6: Cleaning Up Once Done

Stop the stream processor.

aws rekognition stop-stream-processor --name streamProcessorForRekognitionVideoBlog --region us-west-2

Delete the stream processor.

aws rekognition delete-stream-processor --name streamProcessorForRekognitionVideoBlog --region us-west-2

Delete the Kinesis Video Stream. Go to the Kinesis video streams from AWS console and select your stream and Delete.

Delete the CloudFormation stack. Go to the CloudFormation, then select stacks from AWS console and select your stack and Delete.

Thanks for the read! I hope it was both fun and useful.

This story is co-authored by Venu Vaka and Koushik Busim. Venu is a software engineer and machine learning enthusiast. Koushik is a software engineer and a keen data science and machine learning enthusiast.

Understanding React Native Navigation

This post focuses on building a react-native app which supports navigation using react-native-navigation.

Why React Native Navigation?

There are many ways in which we can implement navigation functionality in mobile apps developed using React Native. The two most popular among them are
‘React Navigation’ and ‘React Native Navigation’.

As React Native Navigation uses the native modules with a JS bridge, the performance will be better when compared to React Navigation.

In the newer versions of react-native it is not necessary to link the libraries manually. The auto-linking feature was introduced in react-native version 0.60, and versions 0.60 and above use CocoaPods by default.

Hence it is not necessary to link the libraries in Xcode. The libraries will be linked automatically when we install the libraries using the ‘pod install’ command. Over the course of this post, we will be building a simple app for iOS in react-native (version above 0.60) which supports navigation using ‘react-native-navigation’.

Create a react-native app:

To set up react-native environment in your machine refer to the link below.

https://facebook.github.io/react-native/docs/getting-started.html

Create a react-native app using the following command.

react-native init sample-app

It will create a project folder with the name sample-app. Then run the following commands to install the dependencies.

cd sample-app
npm install

After that run the following command inside the project directory to open the app in an iPhone Simulator.

react-native run-ios

You should see the app running in an iOS simulator in a while. And it will display a welcome screen if you haven’t made any code changes in App.js file.

Steps to install React-Native-Navigation:

To install react-native-navigation, run the following command inside the project directory.

npm install react-native-navigation --save

It will install the react-native-navigation package into the project. Add the following line to the Podfile.

pod 'ReactNativeNavigation', :podspec => '../node_modules/react-native-navigation/ReactNativeNavigation.podspec'

Then move to the ios folder in the project and install the dependencies using the following commands.

cd ios
pod install

The pod install command installs the pods specified in the Podfile, along with any dependencies they may have.

After that we need to edit the AppDelegate.m file in Xcode.
To do so, open the .xcworkspace file located in the sample-app/ios folder. Replace the contents of the AppDelegate.m file with the following code.

File : AppDelegate.m

#import "AppDelegate.h"

#import <React/RCTBundleURLProvider.h>
#import <React/RCTRootView.h>
#import <ReactNativeNavigation/ReactNativeNavigation.h>

@implementation AppDelegate

- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions
{
  NSURL *jsCodeLocation = [[RCTBundleURLProvider sharedSettings] jsBundleURLForBundleRoot:@"index" fallbackResource:nil];
  [ReactNativeNavigation bootstrap:jsCodeLocation launchOptions:launchOptions];
  
  return YES;
}

@end

Navigation Workflow:

First we will create a landing/home screen for the app. Create a Landing Screen inside the project folder by following the below steps.

  1. Create a folder named src inside the project.
  2. Add another folder inside src and name it screens.
  3. Create Welcome.js file in screens folder and add the following code into it.

File : Welcome.js

import React, {Component} from 'react';
import {View, Text, StyleSheet, Button} from 'react-native';

class WelcomeScreen extends Component {
  render() {
    return (
      <View style={styles.container}>
        <Text style={styles.textStyles}>React Native Navigation</Text>
        <View style={styles.loginButtonContainer}>
          <Button
            title="Login"
            color="#FFFFFF"
          />
        </View>
        <View style={styles.signUpButtonContainer}>
          <Button
            title="Signup"
            color="#FFFFFF"
          />
        </View>
      </View>
    );
  }
}

export default WelcomeScreen;

const styles = StyleSheet.create({
  container: {
    flex: 1,
    alignItems: 'center',
    justifyContent: 'flex-start',
    backgroundColor: '#CCCCCC',
  },
  textStyles: {
    alignItems: 'flex-start',
    fontSize: 25,
    fontWeight: 'bold',
    marginTop: 200,
  },
  loginButtonContainer: {
    backgroundColor: '#2494fb',
    borderRadius: 10,
    marginTop: 40,
    width: '50%',
    shadowColor: '#000000',
  },
  signUpButtonContainer: {
    backgroundColor: '#2494fb',
    borderRadius: 10,
    marginTop: 10,
    width: '50%',
    shadowColor: '#000000',
  },
});

We need to register the screens for the navigator to work. For that we need to modify the contents of index.js file. Replace the contents of index.js file with the following code.

File : index.js

import { Navigation } from 'react-native-navigation';
import Welcome from './src/screens/Welcome'; 

Navigation.registerComponent('Welcome', ()=>Welcome)
Navigation.events().registerAppLaunchedListener(()=>{
    Navigation.setRoot({
        root:{
            component:{
                name:'Welcome'
            }
        }
    })
})

Explanation:

Navigation.registerComponent() is used to register the screen. As we have already created the Welcome screen component, we would use that to register.

Navigation.registerComponent('Welcome', () => Welcome).

The above step only registers our screen, but not the application. So to register our application we use Navigation.events().registerAppLaunchedListener().

Inside the callback of the registerAppLaunchedListener we will set the root of the application using Navigation.setRoot(). The code would be like the following.

Navigation.events().registerAppLaunchedListener(()=>{
    Navigation.setRoot({
        root:{
            component:{
                name:'Welcome'
            }
        }
    })
})

Here the component name is Welcome which is the screen we created and registered.

Note: The name of the component and the name used in registerComponent must be the same.

Run the project from Xcode using the .xcworkspace file you opened earlier. You will see the landing screen we created in the iPhone Simulator. It would look like below.

Now that we have the landing page of the app, we shall navigate to other screens when either the Login or SignUp button is clicked. To achieve that, we will create two more screens inside the screens folder: Login.js and SignUp.js.

Place the below code snippet in Login.js file.

File : Login.js

import React, { Component } from "react";
import { View, Text, StyleSheet } from "react-native";

class Login extends Component {
    render() {
        return(
            <View style={styles.container}>
                <Text style={styles.textStyles}>
                    Welcome to Login Page.
                </Text>
            </View> 
        );
    }
}

export default Login;

const styles = StyleSheet.create({
    container: {
        flex:1,
        alignItems:'center',
        justifyContent: 'center',
        backgroundColor: '#CCCCCC',
    },
    textStyles: {
        alignItems:'center',
        fontSize: 25
    }
});

After that place the below code in SignUp.js

File : SignUp.js

import React, { Component } from "react";
import { View, Text, StyleSheet } from "react-native";

class SignUp extends Component {
    render() {
        return(
            <View style={styles.container}>
                <Text style={styles.textStyles}>
                    Welcome to SignUp Page.
                </Text>
            </View> 
        );
    }
}

export default SignUp;

const styles = StyleSheet.create({
    container: {
        flex:1,
        alignItems:'center',
        justifyContent: 'center',
        backgroundColor: '#CCCCCC',
    },
    textStyles: {
        alignItems:'center',
        fontSize: 25
    }
});

As we have to navigate between screens, we cannot set the root as a single component. To navigate we must use a stack (an array of screens). The stack consists of a children array in which we set the component name of the landing page of the app. Hence we also need to replace the contents of the index.js file as below.

File : index.js

import { Navigation } from 'react-native-navigation';
import Welcome from './src/screens/Welcome';
import Login from './src/screens/Login';
import SignUp from './src/screens/SignUp';

Navigation.registerComponent('Welcome', ()=>Welcome)
Navigation.registerComponent('Login', () => Login)
Navigation.registerComponent('SignUp', () => SignUp)

Navigation.events().registerAppLaunchedListener(()=>{
    Navigation.setRoot({
        root:{
            stack:{
                id:'navigationStack',
                children: [
                        {
                            component :{
                                name: 'Welcome'
                            },
                        },
                ]
            }
        }
    })
})

To add a title to the landing page, replace the Navigation.setRoot({}) call in the index.js file as below.

Navigation.setRoot({
        root:{
            stack:{
                id:'navigationStack',
                children: [
                        {
                            component :{
                                name: 'Welcome',
                                options: {
                                    topBar: {
                                        title: {
                                            text: 'Home'
                                        }
                                    }
                                }
                            },
                        },
                ]
            }
        }
    })

The landing screen will now look like below, with Home as its title.

Now we will write the event handler function in Welcome.js file so that we can navigate between the screens when we click on either the Login or SignUp button. For that replace the content of Welcome.js file as below.

File : Welcome.js

import React, {Component} from 'react';
import {View, Text, StyleSheet, Button} from 'react-native';
import {Navigation} from 'react-native-navigation';

class WelcomeScreen extends Component {
  goToScreen = screenName => {
    Navigation.push(this.props.componentId, {
      component: {
        name: screenName,
      },
    });
  };
  render() {
    return (
      <View style={styles.container}>
        <Text style={styles.textStyles}>React Native Navigation</Text>
        <View style={styles.loginButtonContainer}>
          <Button
            title="Login"
            color="#FFFFFF"
            onPress={() => this.goToScreen('Login')}
          />
        </View>
        <View style={styles.signUpButtonContainer}>
          <Button
            title="Signup"
            color="#FFFFFF"
            onPress={() => this.goToScreen('SignUp')}
          />
        </View>
      </View>
    );
  }
}

export default WelcomeScreen;

const styles = StyleSheet.create({
  container: {
    flex: 1,
    alignItems: 'center',
    justifyContent: 'flex-start',
    backgroundColor: '#CCCCCC',
  },
  textStyles: {
    alignItems: 'flex-start',
    fontSize: 25,
    fontWeight: 'bold',
    marginTop: 200,
  },
  loginButtonContainer: {
    backgroundColor: '#2494fb',
    borderRadius: 10,
    marginTop: 40,
    width: '50%',
    shadowColor: '#000000',
  },
  signUpButtonContainer: {
    backgroundColor: '#2494fb',
    borderRadius: 10,
    marginTop: 10,
    width: '50%',
    shadowColor: '#000000',
  },
});

To navigate to a particular screen we need to import the Navigation component from react-native-navigation. When we click either the Login or the SignUp button, the event is handled in the goToScreen method.

We pass the name of the screen as an argument to the method, and it must be the same as the name used to register the screen in Navigation.registerComponent() in the index.js file.

In the goToScreen method we use this.props.componentId, which is the ID of the current screen and is made available automatically by react-native-navigation. Inside the Navigation.push function we set the name of the screen to navigate to on the component object; this is the argument we pass to the method when pressing either the Login or SignUp button.

Then run the project using Xcode. The Welcome screen appears in a while. If we click the Login button, the app navigates to the Login page, and to the SignUp page if we click the SignUp button. React Native Navigation also adds a back button at the top, using which we can go back to the screen we navigated from.

The Login and SignUp screens would look like below.

I hope the article was helpful in understanding React Native Navigation. Thanks for the read.

This story is authored by Dheeraj Kumar. Dheeraj is a software engineer specializing in React Native and React based frontend development.

Setting Up a Data Lake on AWS Cloud Using LakeFormation

Setting up a Data Lake involves multiple steps such as collecting, cleansing, moving, and cataloging data, and then securely making that data available for downstream analytics and Machine Learning. AWS LakeFormation simplifies these processes and also automates certain processes like data ingestion. In this post, we shall be learning how to build a very simple data lake using LakeFormation with hypothetical retail sales data.

AWS Lake Formation provides its own permissions model that augments the AWS IAM permissions model. This centrally defined permissions model enables fine-grained access to data stored in data lake through a simple grant/revoke mechanism. These permissions are enforced at the table and column level on the data catalogue and are mapped to the underlying objects in S3. LakeFormation permissions are applicable across the full portfolio of AWS analytics and Machine Learning services, including Amazon Athena and Amazon Redshift.

So, let’s get on with the setup.

Adding an administrator

The first and foremost step in using LakeFormation is to create an administrator. An administrator has full access to the LakeFormation system and initial access to data configuration and access permissions.

After adding an administrator, navigate to the Dashboard using the sidebar. It illustrates the typical process of setting up a data lake.

Register location

If you wish to set up data ingestion, that is, import unprocessed/landing data, AWS LakeFormation comes with in-house Blueprints (under the Register and Ingest sub-menu in the sidebar) that one can use to build workflows. These workflows can be scheduled as per the needs of the end user. Sources of data for these workflows can be a JDBC source, log files and many more. Learn more about importing data using workflows here.

If your ingestion process doesn’t involve any of the above mentioned ways and writes directly to S3, that’s alright. Either way, we end up registering that S3 location as one of the data lake locations.

Once created you shall see its listing in the Data Lake locations.

You can not only access this location from here but also set permissions on objects stored under that path. If preferred, one could register lake locations precisely for each processing zone and set permissions accordingly. I registered the whole bucket.

I created 2 retail datasets (.csv), one with 20 records and the other with 5 records. I have uploaded one of the datasets (20 records) to S3 with raw/retail_sales prefix.

Creating a Database

Lake Formation internally uses the Glue Data Catalog, so it shows all the databases available. From the Data Catalog sub menu in the sidebar, navigate to Databases to create and manage all the databases. I created a database called merchandise with default permissions.

Once created, you shall see its listing, and also manage, grant/revoke permissions and view tables in that DB.

Creating Crawlers and ETL jobs

From the Register and Ingest sub-menu in the sidebar, navigate to Crawlers, Jobs to create and manage all Glue related services. Lake Formation redirects to AWS Glue and uses it internally. I created a crawler to get the metadata for objects residing in the raw zone.

After running this crawler manually, now raw data can be queried from Athena.

I created an ETL job to run a transformation on this raw table data. 

All it does is change the type of the purchase date column, from string to timestamp, and create partitions while writing to the refined zone in Parquet format. These partitions are created from the processing date, not the purchase date.

retail-raw-refined ETL job python script:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import datetime
from pyspark.sql.functions import *
from pyspark.sql.types import *
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql import *

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
## @type: DataSource
## @args: [database = "merchandise", table_name = "raw_retail_sales", transformation_ctx = "datasource0"]
## @return: datasource0
## @inputs: []
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "merchandise", table_name = "raw_retail_sales", transformation_ctx = "datasource0")
## @type: ApplyMapping
## @args: [mapping = [("email_id", "string", "email_id", "string"), ("retailer_name", "string", "retailer_name", "string"), ("units_purchased", "long", "units_purchased", "long"), ("purchase_date", "string", "purchase_date", "date"), ("sale_id", "string", "sale_id", "string")], transformation_ctx = "applymapping1"]
## @return: applymapping1
## @inputs: [frame = datasource0]

#convert glue object to sparkDF
sparkDF = datasource0.toDF()
sparkDF = sparkDF.withColumn('purchase_date', unix_timestamp(sparkDF.purchase_date, 'dd/MM/yyyy').cast(TimestampType()))

applymapping1 = DynamicFrame.fromDF(sparkDF, glueContext,"datafields")
# applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("email_id", "string", "email_id", "string"), ("retailer_name", "string", "retailer_name", "string"), ("units_purchased", "long", "units_purchased", "long"), ("purchase_date", "string", "purchase_date", "date"), ("sale_id", "string", "sale_id", "string")], transformation_ctx = "applymapping1")
## @type: ResolveChoice
## @args: [choice = "make_struct", transformation_ctx = "resolvechoice2"]
## @return: resolvechoice2
## @inputs: [frame = applymapping1]
resolvechoice2 = ResolveChoice.apply(frame = applymapping1, choice = "make_struct", transformation_ctx = "resolvechoice2")
## @type: DropNullFields
## @args: [transformation_ctx = "dropnullfields3"]
## @return: dropnullfields3
## @inputs: [frame = resolvechoice2]
dropnullfields3 = DropNullFields.apply(frame = resolvechoice2, transformation_ctx = "dropnullfields3")
## @type: DataSink
## @args: [connection_type = "s3", connection_options = {"path": "s3://test-787/refined/retail_sales"}, format = "parquet", transformation_ctx = "datasink4"]
## @return: datasink4
## @inputs: [frame = dropnullfields3]
now = datetime.datetime.now()
path = "s3://test-787/refined/retail_sales/"+'year='+str(now.year)+'/month='+str(now.month)+'/day='+str(now.day)+'/'
datasink4 = glueContext.write_dynamic_frame.from_options(frame = dropnullfields3, connection_type = "s3", connection_options = {"path": path}, format = "parquet", transformation_ctx = "datasink4")
job.commit()

The lakeformation:GetDataAccess permission is needed for this job to work. I created a new policy named LakeFormationGetDataAccess and attached it to AWSGlueServiceRoleDefault role.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "lakeformation:GetDataAccess",
            "Resource": "*"
        }
    ]
}

After running the job manually, it will load new transformed data with partitions in the refined zone as specified in the job.

I created another crawler to get the metadata for these objects residing in refined zone.

After running this crawler manually, now refined data can be queried from Athena.

You could now see the newly added partition columns (year, month, day).

Let us add some new raw data and see how our ETL job processes that delta difference.

We only want to process new data; old data is either moved to an archive location or deleted from the raw zone, whichever is preferred.

Run the ETL job again. See new files being added into refined zone.

Load the new partitions using an MSCK REPAIR TABLE query.
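The repair can be run from the Athena console, or scripted with boto3. A sketch; the refined table name and query-result location here are assumptions:

```python
def repair_table_params(database, table, output_location):
    """Parameters for athena.start_query_execution to load new partitions."""
    return {
        "QueryString": "MSCK REPAIR TABLE " + table,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }

if __name__ == "__main__":
    import boto3  # requires AWS credentials configured locally
    athena = boto3.client("athena", region_name="us-west-2")
    resp = athena.start_query_execution(**repair_table_params(
        "merchandise", "refined_retail_sales", "s3://test-787/athena-results/"))
    print(resp["QueryExecutionId"])
```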

Note: Try creating another IAM user and, as an administrator in LakeFormation, give this user limited access to the tables. Then try querying using Athena to see if the permissions are working.
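Such a grant can be made from the console or through the Lake Formation grant_permissions API. A boto3 sketch granting column-level SELECT; the user ARN, table and column names below are placeholders, not values from this post:

```python
def grant_select_params(principal_arn, database, table, columns):
    """Parameters for lakeformation.grant_permissions: column-level SELECT."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"TableWithColumns": {
            "DatabaseName": database, "Name": table, "ColumnNames": columns}},
        "Permissions": ["SELECT"],
    }

if __name__ == "__main__":
    import boto3  # run as a Lake Formation administrator
    lf = boto3.client("lakeformation", region_name="us-west-2")
    lf.grant_permissions(**grant_select_params(
        "arn:aws:iam::<ACCOUNT_ID>:user/<USER_NAME>",
        "merchandise", "refined_retail_sales",
        ["retailer_name", "units_purchased"]))
```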

Pros and cons of LakeFormation

The UI is simple, all under one roof. Most of the time one needs to keep multiple tabs open, and opening S3 locations is troublesome. This is made easy by the register data lake locations feature: one can not only access these locations directly but also grant/revoke permissions on the objects residing there.

Managing permissions at the object level in S3 is a hectic process. But with LakeFormation, permissions can be managed at the data catalog level. This enables one to grant/revoke permissions to users or roles at the table/column level. These permissions are internally mapped to the underlying objects sitting in S3.

Though managing permissions and data ingestion workflows are made easy, most of the Glue processes like ETL jobs, crawlers and ML-specific transformations still have to be set up manually.

This story is authored by Koushik Busim. Koushik is a software engineer and a keen data science and machine learning enthusiast.

Serverless Architecture for Lightning Fast Distributed File Transfer on AWS Data Lake

Today, we are very excited to share our insights on setting up a serverless architecture for a lightning fast way* to copy a large number of objects across multiple folders or partitions in an AWS data lake on S3. Typically in a data lake, data is kept across various zones depending on the data lifecycle. For example, as data arrives from the source, it can be kept in the raw zone and then, post processing, moved to a processed zone, so that the lake is ready for the next influx of data. The rate of object transfer is a crucial factor, as it affects the overall efficiency of the data processing lifecycle in the data lake.

*In our tests, we copied more than 300K objects ranging from 1KB to 10GB in size from the raw zone into the processed zone. Compared to s3s3mirror, the best known tool for hyper fast file transfer on AWS, we were able to finish this transfer of about 24GB of data in about 50% less time. More details are provided at the end of the post.

We created a Lambda invoke architecture that copies files/objects concurrently. The picture below depicts it.


OMS (Orchestrator-Master-Slave) Lambda Architecture

For example, suppose we have an S3 bucket with the following folder structure, with the actual objects contained within this hierarchy of folders, sub-folders and partitions.

S3 file structure

Let us look at how we can use OMS Architecture (Orchestrator-Master-Slave) to achieve hyper-fast distributed/concurrent file transfer. The above architecture can be divided into two halves, Orchestrator-Master, Master-Slave.

Orchestrator-Master

The Orchestrator simply invokes a Master Lambda for each folder. Each Master then iterates the objects in that folder (including all sub-folders and partitions) and invokes a Slave Lambda for each object to copy it to the destination.

Orchestrator-Master Lambda invoke

Let us look at the Orchestrator Lambda code.
Source-to-Destination-File-Transfer-Orchestrator:

import os
import boto3
import json
from datetime import datetime

client_lambda = boto3.client('lambda')
master_lambda = "Source-to-Destination-File-Transfer-Master"

folder_names = ["folder1", "folder2", "folder3", "folder4", "folder5", "folder6", "folder7", "folder8", "folder9"]

def lambda_handler(event, context):
    
    t = datetime.now()
    print("start-time",t)
    
    try:            
        for folder_name in folder_names:
            
            payload_data = {
              'folder_name': folder_name
            }                
        
            payload = json.dumps(payload_data)
            client_lambda.invoke(
                FunctionName = master_lambda,
                InvocationType = 'Event',
                LogType = 'None',
                Payload = payload
            )
            print(payload)
            
    except Exception as e:
        print(e)
        raise e

Master-Slave

Master-Slave Lambda invoke

Let us look at the Master Lambda code.
Source-to-Destination-File-Transfer-Master:

import os
import boto3
import json
from botocore.exceptions import ClientError

s3 = boto3.resource('s3')
client_lambda = boto3.client('lambda')

source_bucket_name = 'source bucket name'
source_bucket = s3.Bucket(source_bucket_name)

slave_lambda = "Source-to-Destination-File-Transfer-Slave"

def lambda_handler(event, context):

    try:
        source_prefix = "" #add if any
        # the Orchestrator sends the folder name in the payload
        source_prefix = source_prefix + event['folder_name'] + "/"

        for obj in source_bucket.objects.filter(Prefix = source_prefix):
            path = obj.key
            payload_data = {
               'file_path': path
            }
            payload = json.dumps(payload_data)
            client_lambda.invoke(
                FunctionName = slave_lambda,
                InvocationType = 'Event',
                LogType = 'None',
                Payload = payload
            )

    except Exception as e:
        print(e)
        raise e

Slave

Let us look at the Slave Lambda code.
Source-to-Destination-File-Transfer-Slave:

import os
import boto3
import json
import re
from botocore.exceptions import ClientError

s3 = boto3.resource('s3')

source_prefix = "" #add if any
source_bucket_name = "source bucket name"
source_bucket = s3.Bucket(source_bucket_name )

destination_bucket_name = "destination bucket name"
destination_bucket = s3.Bucket(destination_bucket_name )

def lambda_handler(event, context):
    try:
        destination_prefix = "" #add if any
        
        source_obj = { 'Bucket': source_bucket_name, 'Key': event['file_path']}
        file_path = event['file_path']
        
        #copying the object into the destination bucket
        new_key = file_path.replace(source_prefix, destination_prefix)
        new_obj = destination_bucket.Object(new_key)
        new_obj.copy(source_obj)
        
    except Exception as e:
        raise e
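One caveat in the Slave code above: str.replace swaps every occurrence of the source prefix anywhere in the key, and an empty prefix makes the mapping fragile. A safer sketch strips only the leading source prefix before prepending the destination prefix:

```python
def map_key(key, source_prefix, destination_prefix):
    # Swap only the leading prefix; leave the rest of the key untouched.
    if source_prefix and key.startswith(source_prefix):
        key = key[len(source_prefix):]
    return destination_prefix + key

print(map_key("raw/folder1/year=2019/part-0000.parquet", "raw/", "processed/"))
# processed/folder1/year=2019/part-0000.parquet
```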

You must ensure that these Lambda functions are configured with maximum execution time and memory limits that suit your case. We tested with an upper limit of 5 minutes of execution time and 1GB of available memory.
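These limits can also be applied programmatically. A sketch, assuming the Lambda names used above:

```python
def limit_update_args(function_name, timeout_seconds=300, memory_mb=1024):
    # Arguments for lambda.update_function_configuration: a 5 minute
    # timeout and 1GB of memory, the upper limits used in our tests.
    return {
        "FunctionName": function_name,
        "Timeout": timeout_seconds,
        "MemorySize": memory_mb,
    }

# With AWS credentials configured:
# import boto3
# client = boto3.client("lambda")
# for fn in ["Source-to-Destination-File-Transfer-Master",
#            "Source-to-Destination-File-Transfer-Slave"]:
#     client.update_function_configuration(**limit_update_args(fn))
```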

Calculating the Rate of File Transfer

To calculate the rate of file transfer, we print the start time at the beginning of the Orchestrator Lambda execution. Once the file transfer is complete, we use another Lambda to extract the last-modified date attribute of the last copied object.

Extract-Last-Modified:

import json
import boto3
from datetime import datetime
from dateutil import tz

s3 = boto3.resource('s3')

destination_bucket_name = "destination bucket name"
destination_bucket = s3.Bucket(destination_bucket_name)
destination_prefix = "" #add if any

def lambda_handler(event, context):
    
    #initializing with some old date
    last_modified_date = datetime(1940, 7, 4).replace(tzinfo = tz.tzlocal()) 

    for obj in destination_bucket.objects.filter(Prefix = destination_prefix):
        
        obj_date = obj.last_modified.replace(tzinfo = tz.tzlocal())
        
        if last_modified_date < obj_date:
            last_modified_date = obj_date
    
    print("end-time: ", last_modified_date)

Now we have both the start-time from the Orchestrator Lambda and the end-time from the Extract-Last-Modified Lambda; their difference is the time taken for the file transfer.

Before writing this post, we copied 24.1GB of objects using the above architecture; the results are shown in the following screenshots:

duration = end-time - start-time
         = 10:04:49 - 10:03:28
         = 00:01:21 (hh:mm:ss)
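The same subtraction can be reproduced in a few lines of Python:

```python
from datetime import datetime

# start-time printed by the Orchestrator, end-time from Extract-Last-Modified
start = datetime.strptime("10:03:28", "%H:%M:%S")
end = datetime.strptime("10:04:49", "%H:%M:%S")

duration = end - start
print(duration)  # 0:01:21
```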

To check the efficiency of our OMS Architecture, we compared the results of OMS with s3s3mirror, a utility for mirroring content from one S3 bucket to another or to/from the local filesystem. The below screenshot has the file transfer stats of s3s3mirror for the same set of files:

As we can see, the difference was 1 minute and 8 seconds for a total data transfer of about 24GB; it can be much higher for larger data sets if we add more optimizations. I have only shared a generalized view of the OMS Architecture; it can be further fine-tuned to specific needs for highly optimized performance. For instance, if you have partitions in each folder, the OMS Architecture could yield much better results if you invoke a Master Lambda for each partition inside the folder instead of invoking the Master just at the folder level.

Thanks for the read. Looking forward to your thoughts.

This story is co-authored by Koushik and Subbareddy. Koushik is a software engineer and a keen data science and machine learning enthusiast. Subbareddy is a Big Data Engineer specializing on Cloud Big Data Services and Apache Spark Ecosystem.

Serverless Web Application Architecture using React with Amplify: Part2

In Part 1, we learned about and built the AWS-service (API Gateway, Lambda, DynamoDB) components of the serverless architecture shown below. We shall build the rest of the serverless architecture in this article. To sum it up, we will build Jotter (a note taking app) in React and use Amplify to talk to these AWS services. Finally we shall deploy the React app on S3 cloud storage.

Serverless Web Application Architecture using React with Amplify

After deployment:

Serverless React Web Application Hosted on S3

Building the rest of the architecture

  1. Setting up a Cognito User Pool and Identity pool
  2. Building React application and configure Amplify.
  3. Create Sign-in, Sign-up using Cognito with Amplify.
  4. API Gateway integration.
  5. Secure the API Gateway with Cognito and test it.
  6. Deployment on S3 cloud storage.

Setting up a Cognito User Pool and Identity pool

Go to AWS console > Cognito > Create user pool.
Give the pool a name, and click on Review defaults.

This will show all the default settings for a Cognito user pool. Let's go through each section on the left nav bar and modify the settings so that they look like below, then create the Cognito user pool.

Create an App client. While creating the app client, do not enable the Generate client secret option.

After all modifications, go to review and create the pool. Now note down the Pool Id, Pool ARN and App client Id.

More details on setting up an authentication service using Cognito can be found here.

Let us build an Identity pool, which we can use to access AWS services (API Gateway in our case). But before that, let us understand the difference between a user pool and an identity pool. The image below depicts how the two services differ and how they complement each other.

Now we need to specify which AWS resources are accessible to users with temporary credentials obtained from the Cognito Identity Pool. In our case we want to invoke API Gateway. The below policy does exactly that.

{
 "Version": "2012-10-17",
 "Statement": [
    {
     "Effect": "Allow",
     "Action": [
       "mobileanalytics:PutEvents",
       "cognito-sync:*",
       "cognito-identity:*"
      ],
     "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "execute-api:Invoke"
      ],
      "Resource": [
        "arn:aws:execute-api:YOUR_API_GATEWAY_REGION:*:YOUR_API_GATEWAY_ID/*/*/*"
      ]
    }
  ]
}

Add your API Gateway region and gateway id in the resource ARN.

Copy paste the above policy, and click Allow. Click on edit identity pool and note down the Identity pool ID.

Building React application and configuring Amplify

Create a new react project: 

create-react-app jotter

Now go to project directory and run the app.

cd jotter
npm start

You should see your app running on local host port 3000. To start editing, open project directory in your favourite code editor.

Open up public/index.html and edit the title tag to:

<title>Jotter - A note taking app</title>

Installing react-bootstrap:
We shall be using bootstrap for styling

npm install react-bootstrap --save

We also need bootstrap CDN for the above library to work. Open up public/index.html file and add the below CDN link above title tag.

<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T"
crossorigin="anonymous" />

Handle Routes with React Router:

Since we are building a single page app, we are going to use React Router to handle routes/navigation.

Go to src/index.js and replace

ReactDOM.render(<App />, document.getElementById('root'));

with

ReactDOM.render(<Router><App /></Router>, document.getElementById('root'));

Let us create a routes component.
routes.js:

import React from "react";
import CreateJot from './CreateJot';
import AllJotsList from './AllJotsList';
import Login from './Login';
import Signup from './Signup';
import NotFound from './NotFound';
import Viewjot from './Viewjot';
import { Route, Switch } from "react-router-dom";

export default () => 
<Switch>
   <Route exact path="/" component={Login}/>
   <Route exact path="/newjot" component={CreateJot} />
   <Route exact path="/alljots" component={AllJotsList} />
   <Route exact path="/login" component={Login} />
   <Route exact path="/signup" component={Signup} />
   <Route exact path="/view" component={Viewjot}/>
   <Route component={NotFound} />
</Switch>;

If you observe, we used a Switch component here, which works like an if-else ladder: the first match is rendered, and if no path matches, the last NotFound component is rendered. For now all the above components simply return an <h1> tag with the name of the component. Create all the components like the following.

Signup.js

import React from 'react';

export default class Signup extends React.Component {
  constructor(props) {
    super(props);
    this.state = {}
  }
  render(){
    return (
       <h1>Signup</h1>
    );
  }
}

Let us set up navigation and test our routes.
App.js

import React, {Fragment} from 'react';
import { Link, NavLink } from "react-router-dom";
import Routes from './components/Routes';
import { withRouter } from 'react-router';

class App extends React.Component {
    constructor(props) {
        super(props);
        this.state = {
            isAuthenticated: false
        }
    }
    render(){
        return (
            <Fragment>
                <div className="navbar navbar-expand-lg navbar-light bg-light">
                    <Link to="/" className="navbar-brand" href="#"><h1>Jotter.io</h1></Link>
                    <div className="collapse navbar-collapse" id="navbarNav">
                        <ul className="navbar-nav">
                            {this.state.isAuthenticated ? 
                                <Fragment>
                                    <li className="nav-item">
                                        <NavLink onClick={this.handleLogout}>Logout</NavLink>
                                    </li>
                                    <li className="nav-item">
                                        <NavLink to="/newjot" className="nav-link">New Jot</NavLink>
                                    </li>
                                    <li className="nav-item">
                                        <NavLink to="/alljots" className="nav-link">All Jots</NavLink>
                                    </li>
                                    <li className="nav-item">
                                        <NavLink to="/viewjot" className="nav-link">View Jot</NavLink>
                                    </li>
                                </Fragment> : 
                                <Fragment>
                                    <li className="nav-item">
                                        <NavLink to="/login" className="nav-link">Login</NavLink>
                                    </li>
                                    <li className="nav-item">
                                        <NavLink to="/signup" className="nav-link">Signup</NavLink>
                                    </li>
                                </Fragment>
                            } 
                        </ul>
                    </div>                   
                </div>
                <Routes/>
            </Fragment>
        );
    }
}

export default withRouter(App);

What we have done is initially set the isAuthenticated state to false, and conditionally render the navigation bar to show only those routes that are accessible when the user isn't logged in, and vice versa.

Let us test our routes now.

Now our routes are working fine. Though we conditionally rendered the navigation bar, all routes are still accessible via URL.

We shall work on securing our routes, once we configure Amplify. Let us build the Login and Signup components and then configure Amplify to secure our routes.
Login.js

import React, { Component } from "react";
import { FormGroup, FormControl, FormLabel, Button } from "react-bootstrap";

export default class Login extends Component {
    constructor(props) {
        super(props);
        this.state = {
            email: "",
            password: ""
        };
    }

    validateForm() {
        return this.state.email.length > 0 && this.state.password.length>0;
    }

    handleChange = event => {
        this.setState({
                [event.target.id]: event.target.value
            });
    }

    handleSubmit = event => {
        event.preventDefault();
        console.log("Submitted",this.state.email,this.state.password);
    }

    render() {
        return (
            <div className="Home">
                <div className="col-md-4"> 
                    <form >
                        <FormGroup controlId="email">
                            <FormLabel>Email</FormLabel>
                            <FormControl
                                autoFocus
                                type="email"
                                value={this.state.email}
                                onChange={this.handleChange}
                            />
                        </FormGroup>
                        <FormGroup controlId="password" >
                            <FormLabel>Password</FormLabel>
                            <FormControl
                                value={this.state.password}
                                onChange={this.handleChange}
                                type="password"
                            />
                        </FormGroup>
                        <Button onClick={this.handleSubmit}>
                        Login
                        </Button>
                            
                    </form>
                </div>
            </div>
        );
    }
}

A simple login page that logs the user credentials on submit.

Signup.js

import React, { Component } from "react";
import { FormText, FormGroup, FormControl, FormLabel, Button } from "react-bootstrap";

export default class Signup extends Component {
   constructor(props) {
   super(props);
   this.state = {
       email: "",
       password: "",
       confirmPassword: "",
       confirmationCode: "",
       newUser: null
       };
   }
   validateForm() {
       return (
           this.state.email.length > 0 && this.state.password.length > 0 && this.state.password === this.state.confirmPassword);
   }
      
   validateConfirmationForm() {
       return this.state.confirmationCode.length > 0;
   }
      
   handleChange = event => {
       this.setState({
           [event.target.id]: event.target.value
       });
   }

   handleSubmit = async event => {
       event.preventDefault();
   }
  
   handleConfirmationSubmit = async event => {
       event.preventDefault();
   }

   render() {
       return (
           <div className="Signup">
               {this.state.newUser === null ? this.renderForm() : this.renderConfirmationForm()}
           </div>
       );
   }

   renderConfirmationForm() {
       return (
           <div className="Home">
               <div className="col-md-4">
                   <form onSubmit={this.handleConfirmationSubmit}>
                       <FormGroup controlId="confirmationCode" >
                           <FormLabel>Confirmation Code</FormLabel>
                           <FormControl
                               autoFocus
                               type="tel"
                               value={this.state.confirmationCode}
                               onChange={this.handleChange}
                           />
                           <FormText>Please check your email for the code.</FormText>
                       </FormGroup>
                       <Button type="submit">
                           Verify
                       </Button>   
                   </form>
               </div>
           </div>
       );
   }

   renderForm() {
       return (
           <div className="Home">
               <div className="col-md-4">
                   <form onSubmit={this.handleSubmit}>
                       <FormGroup controlId="email" >
                           <FormLabel>Email</FormLabel>
                           <FormControl   
                               autoFocus
                               type="email"
                               value={this.state.email}
                               onChange={this.handleChange}
                           />
                       </FormGroup>
                       <FormGroup controlId="password" >
                           <FormLabel>Password</FormLabel>
                           <FormControl
                               value={this.state.password}
                               onChange={this.handleChange}
                               type="password"
                           />
                       </FormGroup>
                       <FormGroup controlId="confirmPassword" >
                           <FormLabel>Confirm Password</FormLabel>
                           <FormControl
                               value={this.state.confirmPassword}
                               onChange={this.handleChange}
                               type="password"
                           />
                       </FormGroup>
                       <Button type="submit">
                           Signup
                       </Button>   
                   </form>
               </div>
           </div>
       );
   }
}

Configuring Amplify:

Install aws-amplify.

npm install --save aws-amplify

Create a config file with all the required details to connect to the AWS services we created earlier in Part 1.

config.js

export default {
   apiGateway: {
       REGION: "xx-xxxx-x",
       URL: "https://xxxxxxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/"
   },
   cognito: {
       REGION: "xx-xxxx-x",
       USER_POOL_ID: "xx-xxxx-x_yyyyyxxxx",
       APP_CLIENT_ID: "xxxyyyxxyxyxyxxyxyxxxyxxyx",
       IDENTITY_POOL_ID: "xx-xxxx-x:aaaabbbb-aaaa-cccc-dddd-aaaabbbbcccc"
   }
  };

Import the config.js file and aws-amplify.
Open up index.js and add the configuration like this.

import React from 'react';
import ReactDOM from 'react-dom';
import './index.css';
import App from './App';
import * as serviceWorker from './serviceWorker';
import { BrowserRouter as Router} from "react-router-dom";
import config from "./config";
import Amplify from "aws-amplify";

Amplify.configure({
   Auth: {
       mandatorySignIn: true,
       region: config.cognito.REGION,
       userPoolId: config.cognito.USER_POOL_ID,
       identityPoolId: config.cognito.IDENTITY_POOL_ID,
       userPoolWebClientId: config.cognito.APP_CLIENT_ID
   },
   API: {
       endpoints: [
           {
               name: "jotter",
               endpoint: config.apiGateway.URL,
               region: config.apiGateway.REGION
           },
       ]
   }
});

ReactDOM.render(<Router><App /></Router>, document.getElementById('root'));

serviceWorker.unregister();

Now that we have successfully configured Amplify, we can connect to and use AWS services with ease. If you would like to use any other AWS services, you know where to configure them.

App.js

import React, {Fragment} from 'react';
import { Link, NavLink } from "react-router-dom";
import Routes from './components/Routes';
import { withRouter } from 'react-router';
import { Auth } from "aws-amplify";

class App extends React.Component {
    constructor(props) {
        super(props);
        this.state = {
            isAuthenticated: false
        }
    }

    userHasAuthenticated = (value) => {
        this.setState({ isAuthenticated: value });
    }

    handleLogout = async event => {
        await Auth.signOut();
        this.userHasAuthenticated(false);
        this.props.history.push("/login");
    }

    async componentDidMount() {
        try {
          await Auth.currentSession();
          this.userHasAuthenticated(true);
          this.props.history.push("/alljots");
        }
        catch(e) {
          if (e !== 'No current user') {
            alert(e);
          }
        }
    }

    render(){
        return (
            <Fragment>
                <div className="navbar navbar-expand-lg navbar-light bg-light">
                    <Link to="/" className="navbar-brand" href="#"><h1>Jotter.io</h1></Link>
                    <div className="collapse navbar-collapse" id="navbarNav">
                        <ul className="navbar-nav">
                            {this.state.isAuthenticated ? 
                                <Fragment>
                                    <li className="nav-item">
                                        <NavLink onClick={this.handleLogout}>Logout</NavLink>
                                    </li>
                                    <li className="nav-item">
                                        <NavLink to="/newjot" className="nav-link">New Jot</NavLink>
                                    </li>
                                    <li className="nav-item">
                                        <NavLink to="/alljots" className="nav-link">All Jots</NavLink>
                                    </li>
                                    <li className="nav-item">
                                        <NavLink to="/viewjot" className="nav-link">View Jot</NavLink>
                                    </li>
                                </Fragment> : 
                                <Fragment>
                                    <li className="nav-item">
                                        <NavLink to="/login" className="nav-link">Login</NavLink>
                                    </li>
                                    <li className="nav-item">
                                        <NavLink to="/signup" className="nav-link">Signup</NavLink>
                                    </li>
                                </Fragment>
                            } 
                        </ul>
                    </div>                   
                </div>
                <Routes userHasAuthenticated={this.userHasAuthenticated} isAuthenticated = {this.state.isAuthenticated}/>
            </Fragment>
        );
    }
}

export default withRouter(App);

We did 3 things here:

  1. After the component mounts, we check whether there is an existing session; if there is, the isAuthenticated state is set to true.
  2. We handle the logout click by clearing the session using the Auth.signOut method.
  3. We pass the isAuthenticated state and the userHasAuthenticated function as props to all the routes. By doing so, the Login and Signup components can update the isAuthenticated state, and all the other components receive this state as props.

Routes.js

import React from "react";
import CreateJot from './CreateJot';
import AllJotsList from './AllJotsList';
import Login from './Login';
import Signup from './Signup';
import NotFound from './NotFound';
import Viewjot from './Viewjot';
import { Route, Switch } from "react-router-dom";

export default ({ childProps }) =>
<Switch>
   <Route path="/" exact component={Login} props={childProps}/>
   <Route exact path="/newjot" component={CreateJot} props={childProps}/>
   <Route exact path="/alljots" component={AllJotsList} props={childProps}/>
   <Route path="/login" exact component={Login} props={childProps}/>
   <Route path="/signup" exact component={Signup} props={childProps} />
   <Route path="/view" exact component={Viewjot}></Route>
   <Route component={NotFound} />
</Switch>;

Signup.js

import React, { Component } from "react";
import { FormText, FormGroup, FormControl, FormLabel, Button} from "react-bootstrap";
import { Auth } from "aws-amplify";

export default class Signup extends Component {
    constructor(props) {
    super(props);
    this.state = {
        email: "",
        password: "",
        confirmPassword: "",
        confirmationCode: "",
        newUser: null
        };
    }
    validateForm() {
        return (
            this.state.email.length > 0 && this.state.password.length > 0 && this.state.password === this.state.confirmPassword);
    }
        
    validateConfirmationForm() {
        return this.state.confirmationCode.length > 0;
    }
        
    handleChange = event => {
        this.setState({
            [event.target.id]: event.target.value
        });
    }

    handleSubmit = async event => {
        event.preventDefault();
        try {
            const newUser = await Auth.signUp({
                username: this.state.email,
                password: this.state.password
            });
            this.setState({newUser});
        } catch (e) {
            alert(e.message);
        }
    }
    
    handleConfirmationSubmit = async event => {
        event.preventDefault();
        try {
            await Auth.confirmSignUp(this.state.email, this.state.confirmationCode);
            await Auth.signIn(this.state.email, this.state.password);
            this.props.userHasAuthenticated(true);
            this.props.history.push("/");
        } catch (e) {
            alert(e.message);
        }
    }

    render() {
        return (
            <div className="Signup">
                {this.state.newUser === null ? this.renderForm() : this.renderConfirmationForm()}
            </div>
        );
    }

    renderConfirmationForm() {
        return (
            <div className="Home">
                <div className="col-md-4">
                    <form onSubmit={this.handleConfirmationSubmit}>
                        <FormGroup controlId="confirmationCode" >
                            <FormLabel>Confirmation Code</FormLabel>
                            <FormControl 
                                autoFocus
                                type="tel"
                                value={this.state.confirmationCode}
                                onChange={this.handleChange}
                            />
                            <FormText>Please check your email for the code.</FormText>
                        </FormGroup>
                        <Button type="submit">
                            Verify
                        </Button>
                    </form>
                </div>
            </div>
        );
    }

    renderForm() {
        return (
            <div className="Home">
                <div className="col-md-4">
                    <form onSubmit={this.handleSubmit}>
                        <FormGroup controlId="email" >
                            <FormLabel>Email</FormLabel>
                            <FormControl    
                                autoFocus
                                type="email"
                                value={this.state.email}
                                onChange={this.handleChange}
                            />
                        </FormGroup>
                        <FormGroup controlId="password" >
                            <FormLabel>Password</FormLabel>
                            <FormControl
                                value={this.state.password}
                                onChange={this.handleChange}
                                type="password"
                            />
                        </FormGroup>
                        <FormGroup controlId="confirmPassword" >
                            <FormLabel>Confirm Password</FormLabel>
                            <FormControl
                                value={this.state.confirmPassword}
                                onChange={this.handleChange}
                                type="password"
                            />
                        </FormGroup>
                        <Button type="submit">
                            Signup
                        </Button>
                    </form>
                </div>
            </div>
        );
    }
}

Let us test it.

Once the user is successfully created, you should see the user appear in the User Pool.

You will be redirected to the Login page. As we haven't secured our routes yet, you should be seeing this.

Let's secure those routes now.
Routes.js

import React from "react";
import CreateJot from './CreateJot';
import AllJotsList from './AllJotsList';
import Login from './Login';
import Signup from './Signup';
import NotFound from './NotFound';
import Viewjot from './Viewjot';
import { Route, Switch } from "react-router-dom";
import AuthorizedRoute from "./AuthorizedRoute";
import UnAuthorizedRoute from "./UnAuthorizedRoute";

export default (cprops) =>
<Switch>

  <UnAuthorizedRoute path="/login" exact component={Login} cprops={cprops}/>
  <UnAuthorizedRoute path="/signup" exact component={Signup} cprops={cprops}/>
 
  <AuthorizedRoute exact path="/newjot" component={CreateJot} cprops={cprops}/>
  <AuthorizedRoute exact path="/alljots" component={AllJotsList} cprops={cprops}/>
  <AuthorizedRoute exact path="/viewjot" component={Viewjot} cprops={cprops}/>
  <AuthorizedRoute exact path="/" component={AllJotsList} cprops={cprops}/>
 
  <Route component={NotFound} />
</Switch>;

AuthorizedRoute.js

import React from 'react'
import { Redirect, Route } from 'react-router-dom'

const AuthorizedRoute = ({ component: Component, cprops, ...rest }) => {
   // Add your own authentication on the below line.
 return (
   <Route {...rest} render={props =>
       cprops.isAuthenticated ? <Component {...props} {...cprops} /> : <Redirect to="/login" />}
   />
 )
}

export default AuthorizedRoute

What we did here: if the user is logged in, we render the component; otherwise we redirect to the login page.

UnAuthorizedRoute.js

import React from 'react'
import { Redirect, Route } from 'react-router-dom'

const UnAuthorizedRoute = ({ component: Component, cprops, ...rest }) => {
   // Add your own authentication on the below line.
 return (
   <Route {...rest} render={props =>
       !cprops.isAuthenticated ? <Component {...props} {...cprops} /> : <Redirect to="/" />}
   />
 )
}

export default UnAuthorizedRoute

What we did here: if the user is not logged in, we render the component; otherwise we redirect to "/". Let us now add the login functionality to our Login page using the Auth.signIn method.
Login.js

import React, { Component } from "react";
import { FormGroup, FormControl, FormLabel, Button } from "react-bootstrap";
import { Auth } from "aws-amplify";

export default class Login extends Component {
   constructor(props) {
       super(props);
       this.state = {
           email: "",
           password: ""
       };
   }

   validateForm() {
       return this.state.email.length > 0 && this.state.password.length>0;
   }

   handleChange = event => {
       this.setState({
               [event.target.id]: event.target.value
           });
   }

   handleSubmit = async event => {
       event.preventDefault();
       try {
           await Auth.signIn(this.state.email, this.state.password);
           this.props.userHasAuthenticated(true);
           this.props.history.push("/alljots");
       } catch (e) {
           alert(e.message);
       }
      
   }

   render() {
       return (
           <div className="Home">
               <div className="col-md-4">
                   <form onSubmit={this.handleSubmit}>
                       <FormGroup controlId="email">
                           <FormLabel>Email</FormLabel>
                           <FormControl
                               autoFocus
                               type="email"
                               value={this.state.email}
                               onChange={this.handleChange}
                           />
                       </FormGroup>
                       <FormGroup controlId="password" >
                           <FormLabel>Password</FormLabel>
                           <FormControl
                               value={this.state.password}
                               onChange={this.handleChange}
                               type="password"
                           />
                       </FormGroup>
                       <Button type="submit">
                           Login
                       </Button>
                   </form>
               </div>
           </div>
       );
   }
}

Securing API with Cognito:

Earlier, in Part 1, we created an API called jotter. Now go to the Authorizers section and create a new Authorizer.

Once created, let us go to each resource and add authorization.

Do this for all the resources in the API.

API Gateway integration:

AllJotsList.js

import React from 'react';
import {Button} from 'react-bootstrap';
import {API, Auth} from 'aws-amplify';
import { Link } from "react-router-dom";

export default class AllJotsList extends React.Component{
   constructor(props) {
       super(props);
       this.state = {
           jotlist: []
       }
   };
   handleDelete = async (index) => {
       let djot = this.state.jotlist[index];
       let sessionObject = await Auth.currentSession();
       let idToken = sessionObject.idToken.jwtToken;
       let userid = sessionObject.idToken.payload.sub;
       try {
           const status = await this.DeleteNote(djot.jotid,userid,idToken);
           const jotlist = await this.jots(userid,idToken);
           this.setState({ jotlist });
       } catch (e) {
           alert(e);
       }
   }

   DeleteNote(jotid,userid,idToken) {
       let path = "jot?jotid="+jotid+"&userid="+userid;
       let myInit = {
           headers: { Authorization: idToken }
       }
       return API.del("jotter", path,myInit);
   }

   handleEdit = async (index) => {
       let jotlist = this.state.jotlist;
       let sessionObject = await Auth.currentSession();
       let idToken = sessionObject.idToken.jwtToken;
       let userid = sessionObject.idToken.payload.sub;
       let path = "/viewjot?jotid="+jotlist[index].jotid+"&userid="+userid;
       this.props.history.push(path);
   }

   async componentDidMount() {
       if (!this.props.isAuthenticated) {
           return;
       }
       try {
           let sessionObject = await Auth.currentSession();
           let idToken = sessionObject.idToken.jwtToken;
           let userid = sessionObject.idToken.payload.sub;
           const jotlist = await this.jots(userid,idToken);
           this.setState({ jotlist });
       } catch (e) {
           alert(e);
       }
   }

   jots(userid,idToken) {
       let path = "/alljots?userid="+userid;
       let myInit = {
           headers: { Authorization: idToken }
       }
       return API.get("jotter", path, myInit);
   }

   render(){
       if (this.state.jotlist.length === 0)
           return(<h2>Please <Link to="/newjot">add jots</Link> to view.</h2>)
       else
           return(
               <div className="col-md-6">
               <table style={{"marginTop":"2%"}} className="table table-hover">
                   <thead>
                       <tr>
                           <th scope="col">#</th>
                           {/* <th scope="col">Last modified</th> */}
                           <th scope="col">Title</th>
                           <th scope="col">Options</th>
                       </tr>
                   </thead>
                   <tbody>
                           {
                           this.state.jotlist.map((jot, index) => {
                           return(
                               <tr key={index}>
                                       <td>{index+1}</td>
                                       {/* <td>{jot.lastmodified}</td> */}
                                       <td>{jot.title}</td>
                                       <td>
                                               <Button variant="outline-info" size="sm"
                                                   type="button"
                                                   onClick={()=>this.handleEdit(index)}
                                                   >Edit
                                               </Button>
                                           {" | "}
                                           <Button variant="outline-danger" size="sm"
                                               type="button"
                                               onClick={()=>this.handleDelete(index)}
                                               >Delete
                                           </Button>
                                       </td>
                               </tr>
                           )})
                            }
                       </tbody>

               </table>
               </div>   
           );
   }
}

What we are doing here:

  1. An async call to our alljots API to get the list of jots available for a given user, using the API.get method.
  2. Since we have secured our API Gateway with Cognito, we need to send the ID token in the Authorization header. Once we log in, Amplify automatically sets up the session; we simply call the Auth.currentSession() method to get the session object, which contains all the data we need, including the tokens.
  3. Once we receive a successful response: if the list is empty, we display the message “Please add jots to view.” Otherwise we iterate over the list and display it as a table.
  4. Each record in the table has Edit and Delete options. Edit redirects to the Viewjot component, while Delete is handled here using the API.del method. After a successful response, steps 1 to 3 are repeated.
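The token-and-path pattern above repeats in every API call. As a small sketch, the query-string path construction could be factored into a helper (a hypothetical function, not part of the code above):

```javascript
// Hypothetical helper that builds paths like "jot?jotid=<id>&userid=<id>",
// as assembled inline in AllJotsList. encodeURIComponent guards against
// ids that contain reserved characters.
function jotPath(resource, params) {
    const query = Object.entries(params)
        .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
        .join("&");
    return `${resource}?${query}`;
}
```

For example, jotPath("jot", { jotid, userid }) yields the same string that is passed to API.del above.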

While testing, I deleted and created new jots. Now you should see a list of all jots belonging to the logged-in user.

For Viewjot and CreateJot we are using CKEditor.
Viewjot.js

import React, {Fragment} from 'react';
import {Button} from 'react-bootstrap';
import CKEditor from '@ckeditor/ckeditor5-react';
import BalloonEditor from '@ckeditor/ckeditor5-build-balloon';
import { API, Auth } from "aws-amplify";
import queryString from "query-string";


export default class ViewJot extends React.Component {
   constructor(props) {
       super(props);
       this.state = {
           titledata: "",
           bodydata: "",
           userid: "",
           jotid: "",
           edit: false
       }
   };
   onTitleChange = (event, editor) => {
       const data = editor.getData();
       this.setState({
           titledata: data
       });
   }
   onBodyChange = (event, editor) => {
       const data = editor.getData();
       this.setState({
           bodydata: data
       });
   }
   validateJot = () =>  {
        return this.state.titledata.length > 0 && this.state.bodydata.length > 0;
   }
   editJot = () => {
       this.setState({edit:true})
   }
   saveJot = async event => {
       event.preventDefault();
       let note = {
           title: this.state.titledata,
           body: this.state.bodydata,
           userid: this.state.userid,
           jotid: this.state.jotid
       };
       try {
           let sessionObject = await Auth.currentSession();
           let idToken = sessionObject.idToken.jwtToken;
           await this.saveNote(note,idToken);
           this.props.history.push("/alljots");
       }catch (e) {
           alert(e);
       }
   }
   saveNote(note, idToken) {
       let myInit = {
           headers: { Authorization: idToken},
           body: note
       }
       return API.post("jotter", "/newjot", myInit)
   }

   async componentDidMount(){
       try{
           let sessionObject = await Auth.currentSession();
           let idToken = sessionObject.idToken.jwtToken;
           let jot = await this.getNote(idToken);
           this.setState({
               titledata: jot.title,
               bodydata: jot.body,
               userid: jot.userid,
               jotid: jot.jotid
           });
       }catch(e){
           alert(e);
       }
   }

   getNote(idToken) {
       let params = queryString.parse(this.props.location.search);
       let path = "jot?jotid="+params.jotid+"&userid="+params.userid;
       let myInit = {
           headers: { Authorization: idToken }
       }
       return API.get("jotter", path,myInit);
   }
   render() {
       return (
           <Fragment>
               {!this.state.edit &&
               <div style={{marginTop: "1%", marginLeft: "5%"}}>
                   <Button
                       onClick={this.editJot}
                   >
                       Edit
                   </Button>
               </div>}
               {this.state.edit &&
               <div style={{marginTop: "1%", marginLeft: "5%"}}>
                   <Button
                       onClick={this.saveJot}
                   >
                       Save
                   </Button>
               </div>}
               <div style={{marginTop: "1%", marginLeft: "5%", width: "60%",border: "1px solid #cccc", marginBottom: "1%"}}>
                   <CKEditor
                       data={this.state.titledata}
                       editor={BalloonEditor}
                       disabled={!this.state.edit}
                       onChange={(event, editor) => {
                           this.onTitleChange(event, editor);
                   }}/>
               </div>
               <div style={{marginLeft: "5%", width: "60%",border: "1px solid #cccc"}}>
                   <CKEditor
                       data={this.state.bodydata}
                       editor={BalloonEditor}
                       disabled={!this.state.edit}
                       onChange={(event, editor) => {
                           this.onBodyChange(event, editor);
                   }}/>
               </div>          
           </Fragment>
       );
   }
}

What we are doing here is:

  1. As we redirect from the all-jots table on Edit, we send the jotid and userid as query string parameters and use them to fetch the jot data from the API using the API.get method.
  2. We have two modes built using state. In view mode, CKEditor is rendered disabled so the jot is read-only.
  3. In edit mode, the jot data can be changed, and a POST request updates it using the API.post method.
  4. CreateJot works the same way, but with edit mode only.

CreateJot.js

import React, {Fragment} from 'react';
import {Button} from 'react-bootstrap';
import CKEditor from '@ckeditor/ckeditor5-react';
import BalloonEditor from '@ckeditor/ckeditor5-build-balloon';
import { API,Auth } from "aws-amplify";
const titlecontent = "Title..";
const bodycontent = "Start writing notes..";

export default class CreateJot extends React.Component {
   constructor(props) {
       super(props);
       this.state = {
           titledata: titlecontent,
           bodydata: bodycontent,
       }
   };
   onTitleChange = (event, editor) => {
       const data = editor.getData();
       this.setState({
           titledata: data
       });
   }
   onBodyChange = (event, editor) => {
       const data = editor.getData();
       this.setState({
           bodydata: data
       });
   }
   validateJot = () =>  {
        return this.state.titledata.length > 0 && this.state.bodydata.length > 0;
   }
   createNew = async event => {
       event.preventDefault();
       let sessionObject = await Auth.currentSession();
       let idToken = sessionObject.idToken.jwtToken;
       let userid = sessionObject.idToken.payload.sub;
       let note = {
           title: this.state.titledata,
           body: this.state.bodydata,
           userid: userid,
           jotid: this.createJotid()
       };
       try {
           await this.createNote(note,idToken);
           this.props.history.push("/");
       }catch (e) {
           alert(e);
       }
   }
   createNote(note,idToken) {
       let myInit = {
           headers: { Authorization: idToken },
           body: note
       }
       return API.post("jotter", "/newjot", myInit);
   }
   createJotid(){
       let dt = new Date().getTime();
       let uuid = 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function(c) {
           let r = (dt + Math.random()*16)%16 | 0;
           dt = Math.floor(dt/16);
           return (c=='x' ? r :(r&0x3|0x8)).toString(16);
       });
       return uuid;
   }
   render() {
       return (
           <Fragment>
               <div style={{marginTop: "1%", marginLeft: "5%"}}>
                   <Button
                       onClick={this.createNew}
                   >
                       Create
                   </Button>
               </div>
               <div style={{marginTop: "1%", marginLeft: "5%", width: "60%",border: "1px solid #cccc", marginBottom: "1%"}}>
                   <CKEditor
                       data={titlecontent}
                       editor={BalloonEditor}
                       onChange={(event, editor) => {
                           this.onTitleChange(event, editor);
                   }}/>
               </div>
               <div style={{marginLeft: "5%", width: "60%",border: "1px solid #cccc"}}>
                   <CKEditor
                       data={bodycontent}
                       editor={BalloonEditor}
                       onChange={(event, editor) => {
                           this.onBodyChange(event, editor);
                   }}/>
               </div>          
           </Fragment>
       );
   }
}

Deployment on S3 cloud storage

Manual Deployment:
Create a public S3 bucket. Then build react application using the command:

npm run build

This creates a folder named ‘build’ in the project directory with all the static files. Upload these files into the S3 bucket you created earlier.

After the upload is complete, go to the S3 bucket properties and enable Static website hosting. This gives you a URL endpoint to access the static files from a browser. Your web app is now hosted on S3 and can be visited via that endpoint. You could also map a custom domain of yours to this endpoint; that way, when users visit the custom domain, they are internally redirected to this endpoint.

Open Chrome or Firefox, and visit the endpoint. You should see your app running. Now your application is open to the world.

Auto Deployment:
What about code updates? Should I keep uploading static files to S3 manually again and again? No, you don’t have to. You could either build a full CI/CD pipeline (for example, Jenkins triggered by Git webhooks), or simply automate the build-and-upload process using react-scripts together with the AWS CLI.
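As a sketch of the second option, the build and upload steps can be wired into package.json scripts; the bucket name below is a placeholder:

```json
{
  "scripts": {
    "build": "react-scripts build",
    "deploy": "aws s3 sync build/ s3://your-bucket-name --delete",
    "ship": "npm run build && npm run deploy"
  }
}
```

Running npm run ship then rebuilds the app and syncs the build folder to the bucket; the --delete flag removes objects left over from previous deployments.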

Thanks for the read, I hope it was helpful.

This story is authored by Koushik Busim. Koushik is a software engineer and a keen data science and machine learning enthusiast.

Serverless Web Application Architecture using React with Amplify: Part 1

In this series we shall be building Jotter (a note taking app). Jotter is a serverless web application built with React and Amplify using AWS cloud services. The illustration below depicts the serverless architecture we are going to build.

Serverless Web Application Architecture using React with Amplify

After deployment:

Serverless React Web Application Hosted on S3

Understanding Serverless:

Just put your code on the cloud and run it, without worrying about infrastructure.

Servers are still there in serverless, but they are managed by the service provider. In our case, the service provider is AWS. Let us learn a little about AWS and its services. Feel free to skip ahead if you are already acquainted with AWS know-how.

Understanding AWS:

Amazon Web Services is a cloud services platform that offers various cloud services. You could build reliable, scalable applications without worrying about managing infrastructure. Amazon Web Services offers a broad set of global cloud-based products including compute, storage, databases, analytics, networking, mobile, developer tools, management tools, IoT, security and enterprise applications. These services help organizations move faster, lower IT costs, and scale.

In a nutshell, AWS is like playing with Lego or Minecraft. In Lego, you use blocks to create different structures, right? We neither create nor maintain those blocks; all we do is join and disjoin blocks logically to construct structures. Similarly, in AWS each service is like a block, and we compose these services to develop our serverless application. You could also build a server-based web application using an AWS EC2 instance configured with AWS Route 53, but that’s not the scope of this series.

Advantages:

  1. Granular control over each block. Testing becomes easy: if a block malfunctions, we only have to replace or fix that one block.
  2. Reusability: certain blocks or groups of blocks can be reused in other structures.
  3. We pay only for what we use.
  4. Scalability, high availability, and fault tolerance.

That’s broadly about AWS as a platform. We shall first understand these services and then build the serverless web application architecture we viewed earlier.

Services used in the architecture:

  1. DynamoDB (NoSQL Database)
  2. Lambda (Server side computing)
  3. API Gateway (HTTP calls interface)
  4. Cognito (Authentication)
  5. S3 (Cloud storage)
  6. Amplify and React (Client side)

Understanding DynamoDB:

DynamoDB is a distributed, schemaless, NoSQL key-value storage system. In a traditional single-machine database, the amount of data you can store is limited by the physical memory and disk of that system; DynamoDB has no such limit, because it scales horizontally. You pay only for the resources you provision.

Though it is schemaless, data is still represented as tables. Each table is a collection of items. The value (attribute) of each item can be a scalar, a JSON document, or a set. Item size must be less than 400 KB (the binary length of the UTF-8 encoding). Each item in the table is uniquely identified by a primary key, which is mandatory when creating the table. The primary key can be just the partition key, or a combination of partition key and sort key; the latter is called a composite primary key.

Note: The primary key cannot be modified once created. The partition key value is used internally as input to a hash function that determines where the item is stored; the sort key orders items within a partition.

More on indexes and DynamoDB, Parallel scan here.

Understanding Lambda:

Lambda is a serverless compute service. It lets you run code without provisioning or managing servers. You pay only for the compute time you consume. It supports NodeJS, Python, Java, GO etc. Lambda can be triggered from a variety of AWS services. Learn more about Lambda here.

Every Lambda function handler can be passed three arguments:

  1. The first argument is the event object, which carries information from the invoker. The invoker passes this information as a JSON-formatted document when it invokes the function; when an AWS service invokes your function, the event structure varies by service.
  2. The second argument is the context object, which contains information about the invocation, function, and execution environment (for example, the function name and the log stream name).
  3. The third argument, callback, is a function you can call in non-async handlers to send a response to the invoker. The callback takes two arguments: an error and a response. The response object must be compatible with JSON.stringify, and the error should be null for a successful response.

In our app, Lambda is used as a mediator for incoming HTTP requests & DynamoDB. Lambda writes, reads and processes data to/from DynamoDB accordingly.

Understanding API Gateway:

Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. 

With a few clicks in the AWS Management Console, you can create REST and WebSocket APIs that act as a front door for applications to access data, business logic, or functionality from your backend services, such as workloads running on EC2, code running on Lambda, any web application, or real-time communication applications.

API Gateway handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, authorization and access control, monitoring, and API version management.

In our app, we use API Gateway to invoke different Lambda functions for different API calls.

Understanding Cognito:

Amazon Cognito User Pool makes it easy for developers to add sign-up and sign-in functionality to web and mobile applications. It serves as your own identity provider to maintain a user directory. It supports user registration and sign-in, as well as provisioning identity tokens for signed-in users.

Our Jotter app needs to handle user accounts and authentication in a secure and reliable way. We are going to use Cognito User Pool for it.

Understanding Amplify:

AWS Amplify is a framework provided by AWS to develop applications with AWS cloud services. Amplify makes the process of stitching cloud services into our application hassle free. Amplify provides different libraries for different apps (iOS, Android, Web, React Native). The Amplify JavaScript library is available as an npm package (aws-amplify). The aws-amplify client library uses a config file to connect to AWS services. The services Amplify covers include database, API, Lambda/serverless, authentication, hosting, storage, and analytics.

Note: One could also use AWS Amplify CLI to provision AWS services. The aws-amplify client library and Amplify CLI are two different things.

The Amplify CLI internally uses CloudFormation to provision/create resources, while the aws-amplify client library is used to connect to AWS services from your app. Using the Amplify CLI can feel indirect, since services are created not from the AWS console but through a CloudFormation stack; when successful, it returns a config file with the metadata of the provisioned services. A simpler approach is to create the required services from the AWS console yourself, update the config file manually, and use it with the aws-amplify client library.

In our Jotter app, we will use aws-amplify client javascript library to interact with AWS services.
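For reference, a manually maintained config for this app might look roughly like the sketch below, based on the aws-amplify library’s config shape; every ID, region, and URL here is a placeholder, not a real value:

```javascript
// Sketch of the object passed to Amplify.configure(). All values are placeholders.
const awsConfig = {
    Auth: {
        region: "us-east-1",                      // Cognito User Pool region (placeholder)
        userPoolId: "us-east-1_XXXXXXXXX",        // placeholder
        userPoolWebClientId: "XXXXXXXXXXXXXXXXXXXXXXXXXX" // app client id (placeholder)
    },
    API: {
        endpoints: [
            {
                name: "jotter", // the API name used in API.get("jotter", ...)
                endpoint: "https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/dev"
            }
        ]
    }
};
```

In the app entry point, this object would be passed to Amplify.configure(awsConfig) before any Auth or API call is made.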

Building the serverless architecture

API Gateway configured with Lambda to read and write data from DynamoDB.

Let us first build this setup and then add the remaining.

Working with DynamoDB:

Create table:

Go to AWS console > DynamoDB > Tables, choose Create table. As we learned earlier each table is a collection of items and each item is identified with a primary key.

So, give the table a name, set the schema for primary key. Each jot is uniquely identified with the combination of userid (partition key) and jotid (sort key).

Click on Create.

This will create an empty table with 2 key attributes, namely userid and jotid.

Adding sample data:

Each jot item is a json object with the following structure: 

{  
   "userid":"<32 digit uuid code>",
   "jotid":"<32 digit uuid code>",
   "title":"title content",
   "body":"body content"
}

Initially you will be shown only the 2 primary-key attributes, userid and jotid. We shall add the remaining 2 attributes (title and body). I have generated 5 random UUIDs for populating the table. We shall add 4 items, all belonging to the same user, which means they share the same userid but have different jotids. Click on Create item.

Similarly, add remaining 3 to the table.

We have successfully created a table and added 4 items to the table, now let us create Lambda functions to process data from DynamoDB.

Working with Lambda:

As we have learned earlier, Lambda lets us run code without provisioning or managing servers. We shall use Lambda for our server side computing or business logic in simple terms.

Creating a Lambda Function:
Let us first understand the server-side operations that the Jotter app needs.

  • Read a list of all items of a user from DynamoDB table.
  • Write a new item to DynamoDB table by a user.
  • Read the full content of an item of a user.
  • Delete an item of a user.

We shall create 4 Lambda functions that will do each of these jobs/operations.

Get all items Lambda:

Every Lambda has to be configured with an IAM role. This IAM role defines access control to other AWS services. We shall create an IAM role for Lambda to access DynamoDB.

Go to IAM > Roles and choose Create role. After the role is created, we can attach policies (rules, in simple terms). We can either attach default policies provided by AWS or create custom policies tailored to our specific needs.

Custom policy:
Choose service, and then operations. Create policy.

Create IAM policy for DynamoDB write access

This way you can add custom policies to the role or use default AWS policies. I am attaching the AmazonDynamoDBFullAccess policy, as I will be using this role for all my Lambda functions. A better approach would be to create different roles with custom policies for specific operations. Please do explore all the options, and do not grant more permissions than needed.

Once role is created, go back to Lambda console, create the Lambda function with that role.

  • Choose Author from scratch.
  • Give the function a name (get-all-jots).
  • Choose a runtime environment (Node.js 10.x).
  • For permissions, choose an existing role and attach the role created earlier.
  • Create the function.

After creation, replace the index.js code with the code below.
In the Lambda code, I used the JavaScript aws-sdk to connect to DynamoDB.

get-all-jots Lambda:

var AWS = require('aws-sdk');
var ddb = new AWS.DynamoDB({region: 'your table region', apiVersion: '2012-08-10'});

exports.handler = (event, context, callback) => {
    console.log(event.userid);
    var params = {
        ExpressionAttributeNames: {
            "#T": "title",
            "#JI": "jotid"
        },
        ExpressionAttributeValues: {
            ":id": {S: "" + event.userid}
        },
        FilterExpression: "userid = :id",
        ProjectionExpression: "#JI,#T",
        TableName: "jotter"
    };
    ddb.scan(params, function(err, data) {
        if (err) {
            console.log(err, err.stack); // an error occurred
            callback(err);
        } else {
            // Keep only the projected attributes as plain strings.
            let jots = data.Items.map((item) => ({
                title: item.title.S,
                jotid: item.jotid.S
            }));
            console.log(jots);
            callback(null, jots);
        }
    });
};

Testing get-all-jots Lambda function:
Choose “Configure test events” beside the Test button. Here we configure the event object that is passed as a parameter to the Lambda function.

Save it, and test the Lambda function. You should see DynamoDB data in the response.

If you don’t see the results, enjoy debugging the logs in CloudWatch. You could navigate to logs from monitoring tab in Lambda console.
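Note that get-all-jots uses scan with a FilterExpression, which reads the entire table. Since userid is the table’s partition key, a query would touch only that user’s items. As a sketch (a hypothetical helper, not part of the tutorial’s code), the equivalent query parameters would look like this:

```javascript
// Hypothetical helper: builds ddb.query() params equivalent to the
// scan in get-all-jots, using the partition key instead of a filter.
function buildQueryParams(userid) {
    return {
        TableName: "jotter",
        KeyConditionExpression: "userid = :id",
        ExpressionAttributeValues: { ":id": { S: String(userid) } },
        ExpressionAttributeNames: { "#T": "title", "#JI": "jotid" },
        ProjectionExpression: "#JI,#T"
    };
}
// Usage inside the handler: ddb.query(buildQueryParams(event.userid), callback);
```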

Similarly create the remaining 3 Lambda functions and test them by configuring event objects.

create-new-jot Lambda:

var AWS = require('aws-sdk');
var ddb = new AWS.DynamoDB({region: 'your table region', apiVersion: '2012-08-10'});

exports.handler = (event,context, callback) => {
    var params = {
      TableName: 'jotter',
      Item: {
        'userid' : {S: ''+event.userid},
        'title' : {S: ''+event.title},
        'body':{S: ''+event.body},
        'jotid':{S: ''+event.jotid}
      }
    };

ddb.putItem(params, function(err, data) {
  if (err) {
    console.log("Error", err);
    callback(err);
  } 
  else {
    console.log("Success", data);
    callback(null, data);
  }
});
};

Event object for test configuration: (create-new-jot)

{  
   "userid":"<32 digit uuid code>",
   "jotid":"<32 digit uuid code>",
   "title":"title content",
   "body":"body content"
}

get-jot Lambda:

var AWS = require('aws-sdk');
var ddb = new AWS.DynamoDB({region: 'your table region', apiVersion: '2012-08-10'});

exports.handler = (event, context,callback) => {
 let jotid = event.jotid;
 let userid = event.userid;
 
 var params = {
  Key: {
   "jotid": {S: ""+jotid},
   "userid": {S: ""+userid}
  }, 
  TableName: "jotter"
 };
 ddb.getItem(params, function(err, data) {
   if (err){
    console.log(err);
    callback(err)   
   }
   else{
    console.log(data);
     let jot = { 
      "userid": data.Item.userid.S,
      "title": data.Item.title.S,
      "body": data.Item.body.S,
      "jotid": data.Item.jotid.S
    }
    callback(null, jot);
   }
 });
};

Event object for test configuration: (get-jot)

{  
   "userid":"<32 digit uuid code>",
   "jotid":"<32 digit uuid code>"
}

delete-jot Lambda:

var AWS = require('aws-sdk');
var ddb = new AWS.DynamoDB({region: 'your table region', apiVersion: '2012-08-10'});

exports.handler = (event, context,callback) => {
 let jotid = event.jotid;
 let userid = event.userid;
 
 var params = {
  Key: {
   "jotid": {S: ""+jotid},
   "userid": {S: ""+userid}
  }, 
  TableName: "jotter"
 };
 
 ddb.deleteItem(params, function(err, data) {
  if (err) {
    console.log(err, err.stack);
    callback(err)
  }
  else {
    console.log(data);
    callback(null,data);
  }
});
};

Event object for test configuration: (delete-jot)

{  
   "userid":"<32 digit uuid code>",
   "jotid":"<32 digit uuid code>"
}

Test all the above Lambda functions, and check whether the data you get in the response from the callback is correct. For create-new-jot, check whether the data is written to the DynamoDB table. If not, enjoy debugging the logs. We shall now see how to invoke these functions from API Gateway for each API/HTTP-method call.

Working with API Gateway:

As we learned earlier, API Gateway makes it easy for developers to create, publish, manage, and secure APIs at any scale. It acts as a front door to backend services.

Before we start building let us understand our requirements and design our API accordingly.

  • Get a list of all jots of a user.
  • Post a new jot by a user.
  • Get the full content of a jot of a user.
  • Delete a jot of a user.

Accordingly our API request URIs & http methods could be something like this.

  • GET <URL endpoint>/alljots?userid=<id>
  • POST <URL endpoint>/newjot with jot content in body
  • GET <URL endpoint>/jot?userid=<id>&jotid=<id>
  • DELETE <URL endpoint>/jot?userid=<id>&jotid=<id>

Go to console > API Gateway > APIs > Create API

After creating the API, we can add any number of resources (paths) to it. Let us add the first resource.

Enable API Gateway CORS, as it allows requests from different origins. Let us add the GET method to it.

After adding the method, select the integration type Lambda Function and fill in the Lambda function details, such as region and name. When this URI is called, this Lambda is invoked.

After saving, select GET, it opens method execution.

Understanding Method Execution:

  1. Client requests are passed to Method Request.
  2. Method Request acts like a gatekeeper: it authenticates the request and, if authentication fails, rejects it with a 401 status code. Otherwise it passes the request on to the Integration Request.
  3. Integration Request controls what goes into the action (Lambda in our case). Data mapping & shaping can be done here.
  4. The action is called with the request data as params.
  5. When the action calls back with a response, the Integration Response handles it.
  6. Integration Response controls what comes out of the action. It can also be used to map data to a required structure.
  7. Method Response defines the status codes and response models that the Integration Response maps to.
  8. Method Response responds to the Client.

Adding a Data mapping template to Integration Request:

A mapping template maps data from the request body or parameters into the object that is passed as the event object to the action (the Lambda function). We do this only when we need to extract the data into a specific structure that our action expects.

Select Integration Request, open Mapping Templates.
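For the alljots GET method, which receives the user id as a query string, a mapping template along these lines would produce the event object the Lambda expects; `$input.params` reads a request parameter, and the exact template depends on your setup:

```json
{
    "userid": "$input.params('userid')"
}
```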

After adding Data mapping template to Integration Request, let us test the API.

Testing the API:

Click on TEST in the Method Execution section. Enter the query string parameter as:

userid=<32 digit uuid code>

Click on Test. You should be seeing all the jots of that userid in the Response Body.

Deploy API to a stage:

You can create different deployment stages, such as dev, test, and prod. For anything we create or modify to take effect, the API has to be deployed to a stage. Deploying gives us a URL endpoint we can use to make API calls. Every change must be re-deployed; otherwise it stays local to API Gateway and does not reflect on the URL endpoint. Each stage has a different URL endpoint with its own resources.

Now we have successfully created an API with alljots path/resource.

Testing the API using Postman:

Launch Postman, enter the URL, choose the HTTP method, add appropriate headers and body if needed, and send the request.

Tada! The API is working. Similarly, add the remaining 3 URIs in API Gateway and test them using Postman.

The body mapping template for the Delete jot and Get jot URI resources is the same, but we pass 2 arguments (userid, jotid) in the query string.

Note: The reason we passed data in query strings for the GET & DELETE methods is that these methods do not carry a request body; the only means of passing data is path parameters or query strings.

Data mapping template for Delete and Get jot URI resources:

Example for Get: (similarly for Delete)
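A mapping template along these lines would work for both methods (a sketch; `$input.params` reads a query string parameter):

```json
{
    "userid": "$input.params('userid')",
    "jotid": "$input.params('jotid')"
}
```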

For newjot post request, data has to be passed in the body in this format:

{  
   "userid":"<32 digit uuid code>",
   "jotid":"<32 digit uuid code>",
   "title":"title content",
   "body":"body content"
}

Example for Post:
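If the POST integration uses a mapping template rather than body passthrough, a sketch could look like the following; `$input.json` extracts a value from the request body as JSON, so string values come back already quoted:

```
{
    "userid": $input.json('$.userid'),
    "jotid": $input.json('$.jotid'),
    "title": $input.json('$.title'),
    "body": $input.json('$.body')
}
```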

Note: Again, if things don’t work properly, there can be many possible reasons, as we are dealing with 3 services here; my answer is to start debugging with the CloudWatch logs.

We have successfully built and tested the AWS services part of our serverless architecture. In Part 2, we will look at how to integrate it with a React application, create sign-in and sign-up using Cognito, secure our APIs in API Gateway with Cognito, and finally deploy the app on S3 cloud storage.

This story is authored by Koushik Busim. Koushik is a software engineer and a keen data science and machine learning enthusiast.

Machine Learning Operations (MLOps) Pipeline using Google Cloud Composer

In an earlier post, we had described the need for automating the Data Engineering pipeline for Machine Learning based systems. Today, we will expand the scope to setup a fully automated MLOps pipeline using Google Cloud Composer.

Cloud Composer

Cloud Composer is officially defined as a fully managed workflow orchestration service that empowers you to author, schedule, and monitor pipelines that span across clouds and on-premises data centers. Built on the popular Apache Airflow open source project and operated using the Python programming language, Cloud Composer is free from lock-in and easy to use.

So let’s get on with the required steps to create this MLOps infrastructure on Google Cloud Platform.

Creating a Cloud Composer Environment

Step1: Please enable the Cloud Composer API.

Step2: Go to the Create environment page in the GCP console. Composer is available in the Big Data section.

Step3: Click on Create to start creating a Composer environment.

Step4: Please select the Service account which has the required permissions to access GCS, BigQuery, ML Engine and the Composer environment. The required roles for accessing the Composer environment are Composer Administrator and Composer Worker. 
For more details about access control in the Composer environment, please see this.

Step5: Please use Python Version 3 and latest Image version.

Step6: Click on Create. It will take about 15-20 minutes to create the environment. Once it completes, the environment page will look like the following.

Click on Airflow to open the Airflow WebUI. The Airflow WebUI looks as follows.

The DAGs folder is where our DAG file is stored. It is simply a folder inside a GCS bucket created by the environment. To know more about the concept of a DAG and for a general introduction to Airflow, please refer to this post.

You could see Composer related logs in Logging.

Step7: Please add the following PyPI packages to the Composer environment.

Click on the created environment, navigate to the PYPI PACKAGES tab, and click on Edit to add packages.

The required packages are:

# to read data from MongoDB
pymongo==3.8.0
oauth2client==4.1.3
# to read data from firestore
google-cloud-firestore==1.3.0
firebase-admin==2.17.0
google-api-core==1.13.0

Create an ML model

Step1: Please create a folder structure like the following on your instance.

ml_model
├── setup.py
└── trainer
    ├── __init__.py
    └── train.py

Step2: Please place the following code in the train.py file. It trains a logistic regression model and uploads it to a GCS bucket. This model will later be used to create model versions, as explained a bit further below.

from google.cloud import bigquery
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import numpy as np
from google.cloud import storage
import datetime
import json
import pickle
client = bigquery.Client()
sql = '''
SELECT *
FROM `<PROJECT_ID>.<DATASET>.<TABLENAME>`
'''

df = client.query(sql).to_dataframe()
df = df[['is_stressed', 'is_engaged', 'status']]

df['is_stressed'] = df['is_stressed'].fillna('n')
df['is_engaged'] = df['is_engaged'].fillna('n')
df['stressed'] = np.where(df['is_stressed']=='y', 1, 0)
df['engaged'] = np.where(df['is_engaged']=='y', 1, 0)
df['status'] = np.where(df['status']=='complete', 1, 0)

feature_cols = ['stressed', 'engaged']
X = df[feature_cols]
y = df.status
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.25,random_state=0)
logreg = LogisticRegression()
logreg.fit(X_train,y_train)
pkl_filename = "model.pkl"  
with open(pkl_filename, 'wb') as file:  
    pickle.dump(logreg, file)
BUCKET_NAME = '<your-model-bucket>'        # bucket to store the trained model
COMPOSER_BUCKET = '<your-composer-bucket>' # bucket created by the Composer environment

# Upload the model to GCS
bucket = storage.Client().bucket(BUCKET_NAME)
file_path = datetime.datetime.now().strftime('machine_learning/models/%Y%m%d_%H%M%S')
blob = bucket.blob('{}/{}'.format(
    file_path,
    pkl_filename))
blob.upload_from_filename(pkl_filename)

file_location = 'gs://{BUCKET_NAME}/{file_path}'.format(BUCKET_NAME=BUCKET_NAME, file_path=file_path)
file_config = json.dumps({'file_location': file_location})

bucket = storage.Client().bucket(COMPOSER_BUCKET)
blob = bucket.blob('data/file_config.json')
blob.upload_from_string(file_config)

Step3: Create an empty __init__.py file inside the trainer directory.

Step4: Please place the following code in setup.py file. The setup.py file contains required packages to execute code.

import setuptools

REQUIRED_PACKAGES = [
    'pandas-gbq==0.3.0',
    'cloudml-hypertune',
    'google-cloud-bigquery==1.14.0',
    'urllib3'
]

setuptools.setup(
    name='ml_model',
    version='1.0',
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
    include_package_data=True,
    description='',
)

Step5: Package the code using the following command. It creates a .tar.gz file inside the dist folder under the ml_model directory.

python3 setup.py sdist

Step6: The package name comes from the name and version specified in the setup.py file, so it becomes ml_model-1.0.tar.gz.
Copy the package to gs://{your-GCS-bucket}/machine_learning/packages/ (the DAG below reads it from the packages folder). The machine_learning folder becomes the base directory for the machine learning activities described in this post.

Creating a DAG

In this use case, we have created a DAG file which exports some table data from a MongoDB instance into a GCS bucket and then creates a BigQuery table off of that exported data. It then trains a model and creates a version for that model. The DAG file supports both full and daily data extraction via a variable tot_data, as explained in the code below. This variable is read from the Airflow configuration set by the user; this process is also described later in this post.

Please place the following code in the DAG file.

import airflow
from airflow import DAG
from airflow.models import Variable
from airflow.operators.bash_operator import BashOperator
from datetime import timedelta, datetime
from airflow.operators.python_operator import PythonOperator
import pprint
import json
import re

from pymongo import MongoClient
from google.cloud import storage
from google.cloud.storage import blob
import os

from airflow import models
from mlengine_operator import MLEngineTrainingOperator, MLEngineVersionOperator

ts = datetime.now()
today = str(ts.date()) + 'T00:00:00.000Z'
yester_day = str(ts.date() - timedelta(days = 1)) + 'T00:00:00.000Z'

str_ts = ts.strftime('%Y_%m_%d_%H_%M_%S')

config = Variable.get("mongo_conf", deserialize_json=True)
host = config['host']
db_name = config['db_name']
table_name = config['table_name']
file_prefix = config['file_prefix']
bucket_name = config['bucket_name']
# file_path = file_prefix + '/' + table_name + '.json'
file_path = '{file_prefix}/{table_name}/{table_name}_{str_ts}.json'.format(file_prefix=file_prefix, str_ts=str_ts, table_name=table_name)
file_location = 'gs://' + bucket_name + '/' + file_prefix + '/' + table_name + '/' + table_name + '_*.json'
config['file_location'] = file_location
bq_dataset = config['bq_dataset']
tot_data = config['tot_data'].lower()

BUCKET_NAME = config['ml_configuration']['BUCKET_NAME']
BASE_DIR = config['ml_configuration']['BASE_DIR']
PACKAGE_NAME = config['ml_configuration']['PACKAGE_NAME']
TRAINER_BIN = os.path.join(BASE_DIR, 'packages', PACKAGE_NAME)
TRAINER_MODULE = config['ml_configuration']['TRAINER_MODULE']
RUNTIME_VERSION = config['ml_configuration']['RUNTIME_VERSION']
PROJECT_ID = config['ml_configuration']['PROJECT_ID']
MODEL_NAME = config['ml_configuration']['MODEL_NAME']

MODEL_FILE_BUCKET = config['ml_configuration']['MODEL_FILE_BUCKET']
model_file_loc = config['ml_configuration']['MODEL_FILE_LOCATION']

bucket = storage.Client().bucket(MODEL_FILE_BUCKET)
blob = bucket.get_blob(model_file_loc)
file_config = json.loads(blob.download_as_string().decode("utf-8"))
export_uri = file_config['file_location']

def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

def mongoexport():
        client = storage.Client()
        bucket = client.get_bucket(bucket_name)
        blob = bucket.blob(file_path)

        client = MongoClient(host)
        db = client[db_name]
        tasks = db[table_name]
        pprint.pprint(tasks.count_documents({}))
        # if tot_data is set to 'yes' in airflow configurations, full data 
        # is processed.  
        if tot_data == 'no':
          query = {"edit_datetime": { "$gte": yester_day, "$lt": today}}
          print(query)
          data = tasks.find(query)
        else:
          data = tasks.find()
        emp_list = []
        for record in data:
                emp_list.append(json.dumps(record, default=str))
        flat_list =[]
        for data in emp_list:
                flat_list.append((flatten_json(json.loads(data))))
        data = '\n'.join(json.dumps({re.sub('[^0-9a-zA-Z_ ]+', '', str(k)).lower().replace(' ', '_'): str(v) for k, v in record.items()}) for record in flat_list)
        blob.upload_from_string(data)

default_args = {
    'start_date': airflow.utils.dates.days_ago(0),
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

with DAG('ml_pipeline', schedule_interval=None, default_args=default_args) as dag:

    pymongo_export_op = PythonOperator(
        task_id='pymongo_export',
        python_callable=mongoexport,
        )

    update_bq_table_op = BashOperator(
        task_id='update_bq_table',
        bash_command='''
        bq rm -f {bq_dataset}.{table_name}
        bq load --autodetect --source_format=NEWLINE_DELIMITED_JSON --ignore_unknown_values=True {bq_dataset}.{table_name} {file_location}
        '''.format(bq_dataset=bq_dataset, table_name=table_name, file_location=file_location)
        )

    date_nospecial = '{{ execution_date.strftime("%Y%m%d") }}'
    date_min_nospecial = '{{ execution_date.strftime("%Y%m%d_%H%M") }}'
    uuid = '{{ macros.uuid.uuid4().hex[:8] }}'

    training_op = MLEngineTrainingOperator(
      task_id='submit_job_for_training',
      project_id=PROJECT_ID,
      job_id='{}_{}_{}'.format(table_name, date_nospecial, uuid),
      package_uris=[os.path.join(TRAINER_BIN)],
      training_python_module=TRAINER_MODULE,
      training_args=[
          '--base-dir={}'.format(BASE_DIR),
          '--event-date={}'.format(date_nospecial),
      ],
      region='us-central1',
      runtime_version=RUNTIME_VERSION,
      python_version='3.5')

    create_version_op = MLEngineVersionOperator(
      task_id='create_version',
      project_id=PROJECT_ID,
      model_name=MODEL_NAME,
      version={
          'name': 'version_{}_{}'.format(date_min_nospecial, uuid),
          'deploymentUri': export_uri,
          'runtimeVersion': RUNTIME_VERSION,
          'pythonVersion': '3.5',
          'framework': 'SCIKIT_LEARN',
      },
      operation='create')

    pymongo_export_op >> update_bq_table_op >> training_op >> create_version_op
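To illustrate what the flatten_json helper in the DAG above does, here is a standalone sketch of the same function with an example input:

```python
def flatten_json(y):
    """Collapse nested dicts/lists into a single-level dict with '_'-joined keys."""
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '_')
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

record = {'user': {'name': 'alice'}, 'tags': ['x', 'y']}
print(flatten_json(record))  # {'user_name': 'alice', 'tags_0': 'x', 'tags_1': 'y'}
```

This is why the exported MongoDB documents end up as flat, newline-delimited JSON records that BigQuery can autodetect a schema from.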

Once the DAG file is created, please upload it to the DAGs folder. Also, please add the following plugin dependency file, named mlengine_operator.py, to the DAGs folder.
Place the following code in the mlengine_operator.py file.

import re

from apiclient import errors

from airflow.contrib.hooks.gcp_mlengine_hook import MLEngineHook
from airflow.exceptions import AirflowException
from airflow.operators import BaseOperator
from airflow.utils.decorators import apply_defaults
from airflow.utils.log.logging_mixin import LoggingMixin

log = LoggingMixin().log


def _normalize_mlengine_job_id(job_id):

    # Add a prefix when a job_id starts with a digit or a template
    match = re.search(r'\d|\{{2}', job_id)
    if match and match.start() == 0:
        job = 'z_{}'.format(job_id)
    else:
        job = job_id

    # Clean up 'bad' characters except templates
    tracker = 0
    cleansed_job_id = ''
    for m in re.finditer(r'\{{2}.+?\}{2}', job):
        cleansed_job_id += re.sub(r'[^0-9a-zA-Z]+', '_',
                                  job[tracker:m.start()])
        cleansed_job_id += job[m.start():m.end()]
        tracker = m.end()

    # Clean up last substring or the full string if no templates
    cleansed_job_id += re.sub(r'[^0-9a-zA-Z]+', '_', job[tracker:])

    return cleansed_job_id


class MLEngineBatchPredictionOperator(BaseOperator):
   
    template_fields = [
        '_project_id',
        '_job_id',
        '_region',
        '_input_paths',
        '_output_path',
        '_model_name',
        '_version_name',
        '_uri',
    ]

    @apply_defaults
    def __init__(self,
                 project_id,
                 job_id,
                 region,
                 data_format,
                 input_paths,
                 output_path,
                 model_name=None,
                 version_name=None,
                 uri=None,
                 max_worker_count=None,
                 runtime_version=None,
                 gcp_conn_id='google_cloud_default',
                 delegate_to=None,
                 *args,
                 **kwargs):
        super(MLEngineBatchPredictionOperator, self).__init__(*args, **kwargs)

        self._project_id = project_id
        self._job_id = job_id
        self._region = region
        self._data_format = data_format
        self._input_paths = input_paths
        self._output_path = output_path
        self._model_name = model_name
        self._version_name = version_name
        self._uri = uri
        self._max_worker_count = max_worker_count
        self._runtime_version = runtime_version
        self._gcp_conn_id = gcp_conn_id
        self._delegate_to = delegate_to

        if not self._project_id:
            raise AirflowException('Google Cloud project id is required.')
        if not self._job_id:
            raise AirflowException(
                'An unique job id is required for Google MLEngine prediction '
                'job.')

        if self._uri:
            if self._model_name or self._version_name:
                raise AirflowException('Ambiguous model origin: Both uri and '
                                       'model/version name are provided.')

        if self._version_name and not self._model_name:
            raise AirflowException(
                'Missing model: Batch prediction expects '
                'a model name when a version name is provided.')

        if not (self._uri or self._model_name):
            raise AirflowException(
                'Missing model origin: Batch prediction expects a model, '
                'a model & version combination, or a URI to a savedModel.')

    def execute(self, context):
        job_id = _normalize_mlengine_job_id(self._job_id)
        prediction_request = {
            'jobId': job_id,
            'predictionInput': {
                'dataFormat': self._data_format,
                'inputPaths': self._input_paths,
                'outputPath': self._output_path,
                'region': self._region
            }
        }

        if self._uri:
            prediction_request['predictionInput']['uri'] = self._uri
        elif self._model_name:
            origin_name = 'projects/{}/models/{}'.format(
                self._project_id, self._model_name)
            if not self._version_name:
                prediction_request['predictionInput'][
                    'modelName'] = origin_name
            else:
                prediction_request['predictionInput']['versionName'] = \
                    origin_name + '/versions/{}'.format(self._version_name)

        if self._max_worker_count:
            prediction_request['predictionInput'][
                'maxWorkerCount'] = self._max_worker_count

        if self._runtime_version:
            prediction_request['predictionInput'][
                'runtimeVersion'] = self._runtime_version

        hook = MLEngineHook(self._gcp_conn_id, self._delegate_to)

        # Helper method to check if the existing job's prediction input is the
        # same as the request we get here.
        def check_existing_job(existing_job):
            return existing_job.get('predictionInput', None) == \
                prediction_request['predictionInput']

        try:
            finished_prediction_job = hook.create_job(
                self._project_id, prediction_request, check_existing_job)
        except errors.HttpError:
            raise

        if finished_prediction_job['state'] != 'SUCCEEDED':
            self.log.error('MLEngine batch prediction job failed: {}'.format(
                str(finished_prediction_job)))
            raise RuntimeError(finished_prediction_job['errorMessage'])

        return finished_prediction_job['predictionOutput']


class MLEngineModelOperator(BaseOperator):
    template_fields = [
        '_model',
    ]

    @apply_defaults
    def __init__(self,
                 project_id,
                 model,
                 operation='create',
                 gcp_conn_id='google_cloud_default',
                 delegate_to=None,
                 *args,
                 **kwargs):
        super(MLEngineModelOperator, self).__init__(*args, **kwargs)
        self._project_id = project_id
        self._model = model
        self._operation = operation
        self._gcp_conn_id = gcp_conn_id
        self._delegate_to = delegate_to

    def execute(self, context):
        hook = MLEngineHook(
            gcp_conn_id=self._gcp_conn_id, delegate_to=self._delegate_to)
        if self._operation == 'create':
            return hook.create_model(self._project_id, self._model)
        elif self._operation == 'get':
            return hook.get_model(self._project_id, self._model['name'])
        else:
            raise ValueError('Unknown operation: {}'.format(self._operation))


class MLEngineVersionOperator(BaseOperator):
    
    template_fields = [
        '_model_name',
        '_version_name',
        '_version',
    ]

    @apply_defaults
    def __init__(self,
                 project_id,
                 model_name,
                 version_name=None,
                 version=None,
                 operation='create',
                 gcp_conn_id='google_cloud_default',
                 delegate_to=None,
                 *args,
                 **kwargs):

        super(MLEngineVersionOperator, self).__init__(*args, **kwargs)
        self._project_id = project_id
        self._model_name = model_name
        self._version_name = version_name
        self._version = version or {}
        self._operation = operation
        self._gcp_conn_id = gcp_conn_id
        self._delegate_to = delegate_to

    def execute(self, context):
        if 'name' not in self._version:
            self._version['name'] = self._version_name

        hook = MLEngineHook(
            gcp_conn_id=self._gcp_conn_id, delegate_to=self._delegate_to)

        if self._operation == 'create':
            assert self._version is not None
            return hook.create_version(self._project_id, self._model_name,
                                       self._version)
        elif self._operation == 'set_default':
            return hook.set_default_version(self._project_id, self._model_name,
                                            self._version['name'])
        elif self._operation == 'list':
            return hook.list_versions(self._project_id, self._model_name)
        elif self._operation == 'delete':
            return hook.delete_version(self._project_id, self._model_name,
                                       self._version['name'])
        else:
            raise ValueError('Unknown operation: {}'.format(self._operation))


class MLEngineTrainingOperator(BaseOperator):
    
    template_fields = [
        '_project_id',
        '_job_id',
        '_package_uris',
        '_training_python_module',
        '_training_args',
        '_region',
        '_scale_tier',
        '_runtime_version',
        '_python_version',
        '_job_dir'
    ]

    @apply_defaults
    def __init__(self,
                 project_id,
                 job_id,
                 package_uris,
                 training_python_module,
                 training_args,
                 region,
                 scale_tier=None,
                 runtime_version=None,
                 python_version=None,
                 job_dir=None,
                 gcp_conn_id='google_cloud_default',
                 delegate_to=None,
                 mode='PRODUCTION',
                 *args,
                 **kwargs):
        super(MLEngineTrainingOperator, self).__init__(*args, **kwargs)
        self._project_id = project_id
        self._job_id = job_id
        self._package_uris = package_uris
        self._training_python_module = training_python_module
        self._training_args = training_args
        self._region = region
        self._scale_tier = scale_tier
        self._runtime_version = runtime_version
        self._python_version = python_version
        self._job_dir = job_dir
        self._gcp_conn_id = gcp_conn_id
        self._delegate_to = delegate_to
        self._mode = mode

        if not self._project_id:
            raise AirflowException('Google Cloud project id is required.')
        if not self._job_id:
            raise AirflowException(
                'An unique job id is required for Google MLEngine training '
                'job.')
        if not package_uris:
            raise AirflowException(
                'At least one python package is required for MLEngine '
                'Training job.')
        if not training_python_module:
            raise AirflowException(
                'Python module name to run after installing required '
                'packages is required.')
        if not self._region:
            raise AirflowException('Google Compute Engine region is required.')

    def execute(self, context):
        job_id = _normalize_mlengine_job_id(self._job_id)
        training_request = {
            'jobId': job_id,
            'trainingInput': {
                'scaleTier': self._scale_tier,
                'packageUris': self._package_uris,
                'pythonModule': self._training_python_module,
                'region': self._region,
                'args': self._training_args,
            }
        }

        if self._runtime_version:
            training_request['trainingInput']['runtimeVersion'] = self._runtime_version

        if self._python_version:
            training_request['trainingInput']['pythonVersion'] = self._python_version

        if self._job_dir:
            training_request['trainingInput']['jobDir'] = self._job_dir

        if self._mode == 'DRY_RUN':
            self.log.info('In dry_run mode.')
            self.log.info('MLEngine Training job request is: {}'.format(
                training_request))
            return

        hook = MLEngineHook(
            gcp_conn_id=self._gcp_conn_id, delegate_to=self._delegate_to)

        # Helper method to check if the existing job's training input is the
        # same as the request we get here.
        def check_existing_job(existing_job):
            return existing_job.get('trainingInput', None) == \
                training_request['trainingInput']

        try:
            finished_training_job = hook.create_job(
                self._project_id, training_request, check_existing_job)
        except errors.HttpError:
            raise

        if finished_training_job['state'] != 'SUCCEEDED':
            self.log.error('MLEngine training job failed: {}'.format(
                str(finished_training_job)))
            raise RuntimeError(finished_training_job['errorMessage'])
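As a quick sanity check on the _normalize_mlengine_job_id helper above, here is a self-contained sketch (the function body is reproduced and renamed for standalone use):

```python
import re

def normalize_mlengine_job_id(job_id):
    """Make a string safe to use as an ML Engine job id."""
    # Prefix the id when it starts with a digit or a '{{' template marker
    match = re.search(r'\d|\{{2}', job_id)
    if match and match.start() == 0:
        job = 'z_{}'.format(job_id)
    else:
        job = job_id

    # Replace runs of disallowed characters with '_',
    # leaving '{{ ... }}' template fragments intact
    tracker = 0
    cleansed = ''
    for m in re.finditer(r'\{{2}.+?\}{2}', job):
        cleansed += re.sub(r'[^0-9a-zA-Z]+', '_', job[tracker:m.start()])
        cleansed += job[m.start():m.end()]
        tracker = m.end()
    cleansed += re.sub(r'[^0-9a-zA-Z]+', '_', job[tracker:])
    return cleansed

print(normalize_mlengine_job_id('2019-06-21 run'))    # z_2019_06_21_run
print(normalize_mlengine_job_id('tasks_20190621_ab')) # tasks_20190621_ab
```

This is why the job_id built in the DAG from the table name, date, and a uuid fragment always yields a valid ML Engine job id.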

Import the variables from the composer_conf.json file into Airflow Variables.
Go to Airflow WebUI → Admin → Variables → browse to the file path, or configure the variables manually.
Please place the following in composer_conf.json:

{
  "mongo_conf": {
    "host": "mongodb://<instance-internal-ip>:27017",
    "db_name": "DBNAME",
    "table_name": "TABLENAME",
    "file_prefix": "Folder In GCS Bucket",
    "bq_dataset": "BigQuery Dataset",
    "bucket_name": "GCS Bucket",
    "tot_data": "yes",
    "ml_configuration": {
      "BUCKET_NAME": "GCS Bucket",
      "BASE_DIR": "gs://GCS Bucket/machine_learning/",
      "PACKAGE_NAME": "PACKAGE NAME FROM setup.py FILE in ML",
      "TRAINER_MODULE": "trainer.train",
      "RUNTIME_VERSION": "1.13",
      "PROJECT_ID": "GCP Project",
      "MODEL_FILE_BUCKET": "BUCKET CREATED BY Composer Environment",
      "MODEL_FILE_LOCATION": "data/MODEL LOCATION FILE",
      "MODEL_NAME": "MODEL_NAME"
    }
  }
}

Please store any configuration or credentials files used by Composer in the data folder of the bucket created by the Composer environment.

After configuring variables accordingly, you can see the DAG named ml_pipeline in the Airflow WebUI.

Please trigger the DAG from the Airflow WebUI. Once the DAG has run successfully, it looks like the following:

Thanks for the read and look forward to your comments.

This story is authored by PV Subbareddy. Subbareddy is a Big Data Engineer specializing on Cloud Big Data Services and Apache Spark Ecosystem.

Email Deliverability Analytics using SendGrid and AWS Big Data Services

In this post, we will run through a case study to set up an email deliverability analytics pipeline using SendGrid and AWS Big Data Services such as S3, Glue and Athena. To start off, when we send mails from SendGrid to recipients, we get responses (multiple response types are possible, such as processed, delivered, blocked, deferred etc.) from email service providers such as Gmail, Yahoo etc. We can use this response data to improve our email deliverability by analyzing it. This is achieved by logging these responses (via API Gateway and a Lambda function) into Amazon S3 and then analyzing them using Athena. The chain of events is put in place by using a webhook that triggers a POST request to AWS API Gateway on an event notification (response) from SendGrid. The API Gateway is further configured to trigger a Lambda function which writes the email response data into S3. We then use a Glue crawler to update the metadata in the data catalog, thereby making it available for Athena to perform SQL-based analysis.

Without further ado, let’s set the ball rolling. Go to SendGrid and select Settings > Mail Settings. Click on Event Notification.

Enable it by providing an endpoint and selecting the events for which you want to get a response.

The above endpoint points to the AWS API Gateway (shown below), which accepts a POST request and triggers the Lambda function.
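SendGrid delivers event notifications as a JSON array of event objects in the POST body. The field names below follow SendGrid's event webhook format, but the values and the helper name toNdjson are made up for illustration; the Lambda further below performs essentially this serialization before writing to S3.

```javascript
// Hypothetical sample of the event array SendGrid POSTs to the webhook.
// Field names follow SendGrid's event webhook format; values are made up.
const sampleEvents = [
  { email: "user@example.com", timestamp: 1561234567, event: "delivered" },
  { email: "user@example.com", timestamp: 1561234599, event: "open" }
];

// Serialize each event as one JSON line (NDJSON), which keeps the
// S3 objects easy for Glue/Athena to parse row by row.
function toNdjson(events) {
  return events.map((e) => JSON.stringify(e)).join("\n") + "\n";
}

console.log(toNdjson(sampleEvents));
```

Storing one JSON object per line is what lets the Glue crawler later infer a row-oriented schema from the S3 objects.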

Now our Lambda function stores the event payload data in an S3 bucket.
Lambda code:

const AWS = require('aws-sdk');
var s3Bucket = new AWS.S3({ params: { Bucket: "Your-Bucket" } });

exports.handler = (event, context, callback) => {
    console.log(event); // the response data
    // Serialize each event object as one JSON line
    let x = "";
    event.map((item) => {
        x = x + JSON.stringify(item) + "\n";
    });
    let uuid = create_UUID();
    var filePath = "receivelogs/" + uuid;
    console.log(filePath);
    var data = {
        Key: filePath,
        Body: x
    };
    s3Bucket.putObject(data, function(err, data) {
        if (err) {
            console.log('Error uploading data: ', err);
            callback(err, null);
        } else {
            console.log('Successfully uploaded the response');
            callback(null, data);
        }
    });
};
// this function will generate Unique User ID. Used as FileName
function create_UUID(){
   var dt = new Date().getTime();
   var uuid = 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function(c) {
       var r = (dt + Math.random()*16)%16 | 0;
       dt = Math.floor(dt/16);
       return (c=='x' ? r :(r&0x3|0x8)).toString(16);
   });
   return uuid;
}

When you send mail, SendGrid sends the response via a POST request to API Gateway, and the response then gets stored in S3 via the Lambda function.

AWS Glue is a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores. We use a crawler to populate the AWS Glue Data Catalog with tables. Below is the step-by-step process to setup the Glue crawler to read an S3 based data source and make it available as a database table for AWS Athena based analytics.

In the step above, you may need to create a new IAM role that provides access to the underlying S3 data.

So in the steps above, we have concluded the setup for the crawler to fetch the underlying data on S3.

When you run this crawler on the S3 based data source, it updates the metadata of the objects in that path in the Glue data catalog. Athena can then query (run SQL operations on) those objects in S3 using the metadata available in the data catalog. Many business executives aren’t comfortable with SQL queries; an add-on to this data pipeline could be AWS QuickSight for a more BI-driven analysis.
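As a sketch of the kind of analysis Athena enables, the helper below builds a per-event-type count query. The table name email_events and the event column are assumptions; the real names depend on what the Glue crawler registers in the data catalog.

```javascript
// Builds an Athena SQL query counting responses per event type.
// "email_events" and the "event" column are hypothetical names that
// depend on how the Glue crawler catalogued the S3 data.
function buildEventCountQuery(table) {
  return (
    "SELECT event, COUNT(*) AS total " +
    "FROM " + table + " " +
    "GROUP BY event " +
    "ORDER BY total DESC"
  );
}

console.log(buildEventCountQuery("email_events"));
```

A query like this, run from the Athena console against the crawled table, shows at a glance how many mails were delivered versus blocked or deferred.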

Thanks for the read!

This story is authored by Santosh Kumar. He is an AWS Cloud Engineer.

Face Recognition App In React Native using AWS Rekognition

In this blog we are going to build an app for registering faces and verifying faces using Amazon Rekognition in React Native.

Installing dependencies:

Let’s go to React Native Docs, select React Native CLI Quickstart and select our appropriate Development OS and the Target OS as Android, as we are going to build an android application.

Follow the docs for installing dependencies; after installing, create a new React Native application. Use the command line interface to generate a new React Native project called FaceRegister.

react-native init FaceRegister

Preparing the Android device:

We shall need an Android device to run our React Native Android app. This can be either a physical Android device, or more commonly, we can use an Android Virtual Device (AVD) which allows us to emulate an Android device on our computer (using Android Studio).

Either way, we shall need to prepare the device to run Android apps for development.
If you have a physical Android device, you can use it for development in place of an AVD by connecting it to your computer using a USB cable and following the instructions here.

If you are using a virtual device, follow this link. I shall be using a physical Android device.
Now go to the command line and run react-native run-android inside your React Native app directory:

cd FaceRegister
react-native run-android

If everything is set up correctly, you should see your new app running in your physical device or Android emulator.

In your system, you should see a folder named FaceRegister. Open it with your favorite code editor and create a file called Register.js. We need an input box for the username or id to refer to the image, a placeholder to preview the captured image, and a submit button to register.

Open your Register.js file and copy the below code:

import React from 'react';
import { StyleSheet, View, Text, TextInput, Image, ScrollView, TouchableHighlight } from 'react-native';

class LoginScreen extends React.Component {
    constructor(props){
       super(props);
       this.state =  {
           username : '',
           capturedImage : ''
       };
   }

  
   render() {
       return (
           <View style={styles.MainContainer}>
               <ScrollView>
                   <Text style= {{ fontSize: 20, color: "#000", textAlign: 'center', marginBottom: 15, marginTop: 10 }}>Register Face</Text>
              
                   <TextInput
                       placeholder="Enter Username"
                       onChangeText={UserName => this.setState({username: UserName})}
                       underlineColorAndroid='transparent'
                       style={styles.TextInputStyleClass}
                   />
                   {this.state.capturedImage !== "" && <View style={styles.imageholder} >
                       <Image source={{uri : this.state.capturedImage}} style={styles.previewImage} />
                   </View>}
                  

                   <TouchableHighlight style={[styles.buttonContainer, styles.captureButton]}>
                       <Text style={styles.buttonText}>Capture Image</Text>
                   </TouchableHighlight>

                   <TouchableHighlight style={[styles.buttonContainer, styles.submitButton]}>
                       <Text style={styles.buttonText}>Submit</Text>
                   </TouchableHighlight>
               </ScrollView>
           </View>
       );
   }
}

const styles = StyleSheet.create({
   MainContainer: {
       marginTop: 60
   },
   TextInputStyleClass: {
     textAlign: 'center',
     marginBottom: 7,
     height: 40,
     borderWidth: 1,
     margin: 10,
     borderColor: '#D0D0D0',
     borderRadius: 5 ,
   },
   inputContainer: {
     borderBottomColor: '#F5FCFF',
     backgroundColor: '#FFFFFF',
     borderRadius:30,
     borderBottomWidth: 1,
     width:300,
     height:45,
     marginBottom:20,
     flexDirection: 'row',
     alignItems:'center'
   },
   buttonContainer: {
     height:45,
     flexDirection: 'row',
     alignItems: 'center',
     justifyContent: 'center',
     marginBottom:20,
     width:"80%",
     borderRadius:30,
     marginTop: 20,
     marginLeft: 5,
   },
   captureButton: {
     backgroundColor: "#337ab7",
     width: 350,
   },
   buttonText: {
     color: 'white',
     fontWeight: 'bold',
   },
   horizontal: {
     flexDirection: 'row',
     justifyContent: 'space-around',
     padding: 10
   },
   submitButton: {
     backgroundColor: "#C0C0C0",
     width: 350,
     marginTop: 5,
   },
   imageholder: {
     borderWidth: 1,
     borderColor: "grey",
     backgroundColor: "#eee",
     width: "50%",
     height: 150,
     marginTop: 10,
     marginLeft: 90,
     flexDirection: 'row',
     alignItems:'center'
   },
   previewImage: {
     width: "100%",
     height: "100%",
   }
});

export default LoginScreen;

Now import your Register file in your App.js file which is located in your project root folder. Open your App.js file and replace it with the below code:

import React, {Component} from 'react';
import {View} from 'react-native';
import LoginScreen from './Register';

class App extends Component {
   render() {
       return (
       <View>
           <LoginScreen />
       </View>
       );
   }
}

export default App;

Now run your app again. Run below command in the project directory:

react-native run-android

You can see a text input for the username and two buttons, one (Capture Image) for capturing an image and another (Submit) for submitting the details, as shown below:

Let’s add the functionality to preview the captured image. There is a package called react-native-image-picker that enables us to capture a picture with the device’s camera or to upload an image from the gallery. In the project directory, run the below command from the command line to install the react-native-image-picker library:

yarn add react-native-image-picker
(or)
npm install --save react-native-image-picker

react-native link react-native-image-picker

Add the required permissions in the AndroidManifest.xml file which is located at android/app/src/main/:

<uses-permission android:name="android.permission.CAMERA" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE"/>

For more information about this package follow this link.
Now add the below code in your Register.js file.

import React from 'react';
...
...
import ImagePicker from "react-native-image-picker"; //import this

class LoginScreen extends React.Component {
    constructor(props){
      ...
   }

//Add the below method...

   captureImageButtonHandler = () => {
       ImagePicker.showImagePicker({title: "Pick an Image", maxWidth: 800, maxHeight: 600}, (response) => {
           console.log('Response = ', response);
           // alert(response)
           if (response.didCancel) {
               console.log('User cancelled image picker');
           } else if (response.error) {
               console.log('ImagePicker Error: ', response.error);
           } else if (response.customButton) {
               console.log('User tapped custom button: ', response.customButton);
           } else {
               // You can also display the image using data:
               const source = { uri: 'data:image/jpeg;base64,' + response.data };
          
               this.setState({capturedImage: response.uri, base64String: source.uri });
           }
       });
   }
  
   render() {
       return (
           <View style={styles.MainContainer}>
               ...
               ...
             {/* Add the onPress property to the capture image button */}
            
               <TouchableHighlight style={[styles.buttonContainer, styles.captureButton]} onPress={this.captureImageButtonHandler}>
                        <Text style={styles.buttonText}>Capture Image</Text>
                </TouchableHighlight>
              ...
              ...   
           </View>
       );
   }
}

const styles = StyleSheet.create({
...
...
...

});

export default LoginScreen;

Add the captureImageButtonHandler() method in the file and add the onPress property to the Capture Image button to call this method. After updating the code, reload your app. Now you can access your camera and gallery by clicking on the Capture Image button. Once you capture an image, you can see a preview of it on your screen as below:

Now we need to register the captured image by storing it in S3 bucket.

I have created an API in API Gateway from the AWS console which invokes a lambda function (register-face). All I have to do is send a POST request to the API endpoint from the client side.

In the below image I created two resources, one for adding faces and another for searching faces:

This is the lambda function called register-face that is invoked when we click on the Submit button.

const AWS = require('aws-sdk');
var rekognition = new AWS.Rekognition();
var s3Bucket = new AWS.S3({ params: { Bucket: "<bucket-name>" } });

exports.handler = (event, context, callback) => {
    let parsedData = JSON.parse(event);
    let encodedImage = parsedData.Image;
    var filePath = "registered/" + parsedData.name;
    console.log(filePath);
    // Strip the data-URI prefix and decode the base64 payload
    let decodedImage = Buffer.from(encodedImage.replace(/^data:image\/\w+;base64,/, ""), 'base64');
    var data = {
        Key: filePath,
        Body: decodedImage,
        ContentEncoding: 'base64',
        ContentType: 'image/jpeg'
    };
    s3Bucket.putObject(data, function(err, data) {
        if (err) {
            console.log('Error uploading data: ', err);
            callback(err, null);
        } else {
            console.log('Successfully uploaded the image!');
            callback(null, data);
        }
    });
};

In the above code, I am storing the image in the registered folder (prefix) in the S3 bucket.
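The handler strips the data-URI prefix before decoding the base64 payload. Here is that step in isolation as a standalone sketch, using Buffer.from (the non-deprecated replacement for the new Buffer() constructor); the helper name decodeDataUri is made up:

```javascript
// Strips a "data:image/...;base64," prefix (if present) and decodes
// the remainder into binary, as the Lambda does before the S3 upload.
function decodeDataUri(dataUri) {
  const base64 = dataUri.replace(/^data:image\/\w+;base64,/, "");
  return Buffer.from(base64, "base64");
}

const buf = decodeDataUri("data:image/jpeg;base64,aGVsbG8=");
console.log(buf.toString()); // "aGVsbG8=" decodes to "hello"
```

The regex leaves a plain base64 string untouched, so the same helper works whether or not the client included the data-URI prefix.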

Just uploading faces to an S3 bucket is not enough; we need to create a collection in an AWS Region to store the registered faces from the S3 bucket, because we also need a verification step that checks whether a face is registered or not. For that, we shall be using Amazon Rekognition to search for faces in the collection. Amazon Rekognition has an operation called SearchFacesByImage which searches the collection for faces matching a supplied image. Go through Searching Faces in a Collection to know more.
Add the below code to the register-face lambda function.

var params = {
        CollectionId: "<collection-id>",
        DetectionAttributes: [],
        ExternalImageId: parsedData.name,
        Image: {
            S3Object: {
                Bucket: "<bucket-name>",
                Name: filePath
            }
        }
    };
    // Crude wait for the S3 upload to finish before indexing the face
    setTimeout(function () {
        rekognition.indexFaces(params, function(err, data) {
            if (err) {
                console.log(err, err.stack); // an error occurred
                callback(err);
            }
            else {
                console.log(data); // successful response
                callback(null, data);
            }
        });
    }, 3000);

So, the final lambda function looks as below:

const AWS = require('aws-sdk');
var rekognition = new AWS.Rekognition();
var s3Bucket = new AWS.S3({ params: { Bucket: "<bucket-name>" } });

exports.handler = (event, context, callback) => {
    let parsedData = JSON.parse(event);
    let encodedImage = parsedData.Image;
    var filePath = "registered/" + parsedData.name;
    console.log(filePath);
    // Strip the data-URI prefix and decode the base64 payload
    let buf = Buffer.from(encodedImage.replace(/^data:image\/\w+;base64,/, ""), 'base64');
    var data = {
        Key: filePath,
        Body: buf,
        ContentEncoding: 'base64',
        ContentType: 'image/jpeg'
    };
    s3Bucket.putObject(data, function(err, data) {
        if (err) {
            console.log('Error uploading data: ', err);
            callback(err, null);
        } else {
            console.log('Successfully uploaded the image!');
            // The final callback is issued from indexFaces below
        }
    });
    var params = {
        CollectionId: "<collection-id>",
        DetectionAttributes: [],
        ExternalImageId: parsedData.name,
        Image: {
            S3Object: {
                Bucket: "<bucket-name>",
                Name: filePath
            }
        }
    };
    // Crude wait for the S3 upload to finish before indexing the face
    setTimeout(function () {
        rekognition.indexFaces(params, function(err, data) {
            if (err) {
                console.log(err, err.stack); // an error occurred
                callback(err);
            }
            else {
                console.log(data); // successful response
                callback(null, data);
            }
        });
    }, 3000);
};

In this lambda function, I’m first storing the image (face) in the S3 bucket and then adding the same face to the collection from the S3 bucket.

Let’s get back to our client-side and install the aws-amplify library in our project root directory from the command line with below commands:

npm install --save aws-amplify
npm install --save aws-amplify-react-native
(or)
yarn add aws-amplify
yarn add aws-amplify-react-native

Now add the below code in Register.js file:

import React from 'react';
...
...
import Amplify, {API} from "aws-amplify";
Amplify.configure({
   API: {
       endpoints: [
           {
               name: "<API-name>",
               endpoint: "<your endpoint url>"
           }
       ]
   }
});

class Registration extends React.Component {
    constructor(props){
      ...
      ...
    }
   submitButtonHandler = () => {
       if (this.state.username == '' || this.state.username == undefined || this.state.username == null) {
           alert("Please Enter the Username");
       } else if(this.state.capturedImage == '' || this.state.capturedImage == undefined || this.state.capturedImage == null) {
           alert("Please Capture the Image");
       } else {
           const apiName = "<API-name>";
           const path = "<your path>";
           const init = {
               headers : {
                   'Accept': 'application/json',
                   "X-Amz-Target": "RekognitionService.IndexFaces",
                   "Content-Type": "application/x-amz-json-1.1"
               },
               body : JSON.stringify({
                   Image: this.state.base64String,
                   name: this.state.username
               })
           }
          
           API.post(apiName, path, init).then(response => {
               alert(JSON.stringify(response))
           });
       }
   }
   
   render() {
       if(this.state.image!=="") {
           // alert(this.state.image)
       }
       return (
           <View style={styles.MainContainer}>
               <ScrollView>
...
...

                   <TouchableHighlight style={[styles.buttonContainer, styles.signupButton]} onPress={this.submitButtonHandler}>
                       <Text style={styles.signupText}>Submit</Text>
                   </TouchableHighlight>
...
...   
            </ScrollView>
           </View>
       );
   }
}

In the above code, we added configuration for the API Gateway using Amplify and created a method called submitButtonHandler() in which we make a POST request to the lambda function to register the face when the user clicks on the Submit button. So we have added the onPress property to the Submit button, which calls submitButtonHandler().
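As an aside, the repeated == '' || == undefined || == null chains in the handlers could be collapsed into one small helper; isBlank is a hypothetical name, not part of the app's code:

```javascript
// Treats '', null and undefined as blank, replacing the repeated
// (== '' || == undefined || == null) checks in the button handlers.
function isBlank(value) {
  return value === undefined || value === null || value === "";
}

console.log(isBlank(""));        // true
console.log(isBlank(undefined)); // true
console.log(isBlank("alice"));   // false
```

With it, the username check becomes `if (isBlank(this.state.username)) { alert("Please Enter the Username"); }`.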

Here is the complete code for Register.js file:

import React, {Component} from 'react';
import { StyleSheet, View, Text, TextInput, Image, ScrollView, TouchableHighlight } from 'react-native';
import ImagePicker from "react-native-image-picker";
import Amplify, {API} from "aws-amplify";
Amplify.configure({
    API: {
        endpoints: [
            {
                name: "<api-name>",
endpoint: "<your endpoint url>"
            }
        ]
    }
});

class Registration extends Component {
  
    constructor(props){
        super(props);
        this.state =  {
            username : '',
            capturedImage : ''
        };
        // this.submitButtonHandler = this.submitButtonHandler.bind(this);
    }

    captureImageButtonHandler = () => {
        ImagePicker.showImagePicker({title: "Pick an Image", maxWidth: 800, maxHeight: 600}, (response) => {
            console.log('Response = ', response);
            // alert(response)
            if (response.didCancel) {
                console.log('User cancelled image picker');
            } else if (response.error) {
                console.log('ImagePicker Error: ', response.error);
            } else if (response.customButton) {
                console.log('User tapped custom button: ', response.customButton);
            } else {
                // You can also display the image using data:
                const source = { uri: 'data:image/jpeg;base64,' + response.data };
            
                this.setState({capturedImage: response.uri, base64String: source.uri });
            }
        });
    }

    submitButtonHandler = () => {
        if (this.state.username == '' || this.state.username == undefined || this.state.username == null) {
            alert("Please Enter the Username");
        } else if(this.state.capturedImage == '' || this.state.capturedImage == undefined || this.state.capturedImage == null) {
            alert("Please Capture the Image");
        } else {
            const apiName = "<api-name>";
            const path = "<your path>";
            const init = {
                headers : {
                    'Accept': 'application/json',
                    "X-Amz-Target": "RekognitionService.IndexFaces",
                    "Content-Type": "application/x-amz-json-1.1"
                },
                body : JSON.stringify({ 
                    Image: this.state.base64String,
                    name: this.state.username
                })
            }
            
            API.post(apiName, path, init).then(response => {
                alert(response);
            });
        }
    }

    render() {
        if(this.state.image!=="") {
            // alert(this.state.image)
        }
        return (
            <View style={styles.MainContainer}>
                <ScrollView>
                    <Text style= {{ fontSize: 20, color: "#000", textAlign: 'center', marginBottom: 15, marginTop: 10 }}>Register Face</Text>
                
                    <TextInput
                        placeholder="Enter Username"
                        onChangeText={UserName => this.setState({username: UserName})}
                        underlineColorAndroid='transparent'
                        style={styles.TextInputStyleClass}
                    />


                    {this.state.capturedImage !== "" && <View style={styles.imageholder} >
                        <Image source={{uri : this.state.capturedImage}} style={styles.previewImage} />
                    </View>}

                    <TouchableHighlight style={[styles.buttonContainer, styles.captureButton]} onPress={this.captureImageButtonHandler}>
                        <Text style={styles.buttonText}>Capture Image</Text>
                    </TouchableHighlight>

                    <TouchableHighlight style={[styles.buttonContainer, styles.submitButton]} onPress={this.submitButtonHandler}>
                        <Text style={styles.buttonText}>Submit</Text>
                    </TouchableHighlight>
                </ScrollView>
            </View>
        );
    }
}

const styles = StyleSheet.create({
    TextInputStyleClass: {
      textAlign: 'center',
      marginBottom: 7,
      height: 40,
      borderWidth: 1,
      margin: 10,
      borderColor: '#D0D0D0',
      borderRadius: 5 ,
    },
    inputContainer: {
      borderBottomColor: '#F5FCFF',
      backgroundColor: '#FFFFFF',
      borderRadius:30,
      borderBottomWidth: 1,
      width:300,
      height:45,
      marginBottom:20,
      flexDirection: 'row',
      alignItems:'center'
    },
    buttonContainer: {
      height:45,
      flexDirection: 'row',
      alignItems: 'center',
      justifyContent: 'center',
      marginBottom:20,
      width:"80%",
      borderRadius:30,
      marginTop: 20,
      marginLeft: 5,
    },
    captureButton: {
      backgroundColor: "#337ab7",
      width: 350,
    },
    buttonText: {
      color: 'white',
      fontWeight: 'bold',
    },
    horizontal: {
      flexDirection: 'row',
      justifyContent: 'space-around',
      padding: 10
    },
    submitButton: {
      backgroundColor: "#C0C0C0",
      width: 350,
      marginTop: 5,
    },
    imageholder: {
      borderWidth: 1,
      borderColor: "grey",
      backgroundColor: "#eee",
      width: "50%",
      height: 150,
      marginTop: 10,
      marginLeft: 90,
      flexDirection: 'row',
      alignItems:'center'
    },
    previewImage: {
      width: "100%",
      height: "100%",
    }
});

export default Registration;

Now reload your application and register the image(face).

After registering successfully you will receive an alert message as below:

Now go to your S3 bucket and check if the image is stored as below:

Also check your collection using the below command from your command line:

aws rekognition list-faces --collection-id "<your collection id>"

You will get JSON data with a list of registered faces as output. So the registration process is working successfully. Now we need to add the verification/search-face process to our application. I created another lambda function (searchFace) for face verification. Here is the code for the face verification lambda function.

const AWS = require('aws-sdk');
var rekognition = new AWS.Rekognition();
var s3Bucket = new AWS.S3({ params: { Bucket: "<bucket-name>" } });

exports.handler = (event, context, callback) => {
    let parsedData = JSON.parse(event);
    let encodedImage = parsedData.Image;
    var filePath = parsedData.name + ".jpg";
    console.log(filePath);
    // Strip the data-URI prefix and decode the base64 payload
    let decodedImage = Buffer.from(encodedImage.replace(/^data:image\/\w+;base64,/, ""), 'base64');
    var data = {
        Key: filePath,
        Body: decodedImage,
        ContentEncoding: 'base64',
        ContentType: 'image/jpeg'
    };
    s3Bucket.putObject(data, function(err, data) {
        if (err) {
            console.log('Error uploading data: ', err);
            callback(err);
        } else {
            console.log('Successfully uploaded the image!');
            // The final callback is issued from searchFacesByImage below
        }
    });
    var params2 = {
        CollectionId: "<collection-id>",
        FaceMatchThreshold: 85,
        Image: {
            S3Object: {
                Bucket: "<bucket-name>",
                Name: filePath
            }
        },
        MaxFaces: 5
    };
    // Crude wait for the S3 upload to finish before searching
    setTimeout(function () {
        rekognition.searchFacesByImage(params2, function(err, data) {
            if (err) {
                console.log(err, err.stack); // an error occurred
                callback(err);
            }
            else {
                console.log(data); // successful response
                callback(null, data);
            }
        });
    }, 2000);
};

In the above lambda function, we are using SearchFacesByImage, which searches the collection for faces that match the supplied image. The response will be a JSON object. Now create a new file called Verification.js in your project root directory and copy the below code into it:

import React, {Component} from 'react';
import { StyleSheet, View, Text, TextInput, Image, ScrollView, TouchableHighlight } from 'react-native';
import ImagePicker from "react-native-image-picker";
import Amplify, {API} from "aws-amplify";
Amplify.configure({
   API: {
       endpoints: [
           {
               name: "<API-name>",
               endpoint: "<your endpoint url>"
           }
       ]
   }
});

class Verification extends Component {
    constructor(props){
       super(props);
       this.state =  {
           username: '',
           capturedImage : ''
       };
   }

   captureImageButtonHandler = () => {
       ImagePicker.showImagePicker({title: "Pick an Image", maxWidth: 800, maxHeight: 600}, (response) => {
           console.log('Response = ', response);
           // alert(response)
           if (response.didCancel) {
               console.log('User cancelled image picker');
           } else if (response.error) {
               console.log('ImagePicker Error: ', response.error);
           } else if (response.customButton) {
               console.log('User tapped custom button: ', response.customButton);
           } else {
               // You can also display the image using data:
               const source = { uri: 'data:image/jpeg;base64,' + response.data };
          
               this.setState({capturedImage: response.uri, base64String: source.uri });
           }
       });
   }

   verification = () => {
       if(this.state.capturedImage == '' || this.state.capturedImage == undefined || this.state.capturedImage == null) {
           alert("Please Capture the Image");
       } else {
           const apiName = "<api-name>";
           const path = "<your path>";
          
           const init = {
               headers : {
                   'Accept': 'application/json',
                   "X-Amz-Target": "RekognitionService.SearchFacesByImage",
                   "Content-Type": "application/x-amz-json-1.1"
               },
               body : JSON.stringify({
                   Image: this.state.base64String,
                   name: this.state.username
               })
           }
          
           API.post(apiName, path, init).then(response => {
               if(response.FaceMatches.length > 0) {
                   alert(response.FaceMatches[0].Face.ExternalImageId)
               } else {
                   alert("No matches found.")
               }
           });
       }
   }

  
  
    render() {
       if(this.state.image!=="") {
           // alert(this.state.image)
       }
       return (
           <View style={styles.MainContainer}>
               <ScrollView>
                   <Text style= {{ fontSize: 20, color: "#000", textAlign: 'center', marginBottom: 15, marginTop: 10 }}>Verify Face</Text>
              
                   {this.state.capturedImage !== "" && <View style={styles.imageholder} >
                       <Image source={{uri : this.state.capturedImage}} style={styles.previewImage} />
                   </View>}

                   <TouchableHighlight style={[styles.buttonContainer, styles.captureButton]} onPress={this.captureImageButtonHandler}>
                       <Text style={styles.buttonText}>Capture Image</Text>
                   </TouchableHighlight>

                   <TouchableHighlight style={[styles.buttonContainer, styles.verifyButton]} onPress={this.verification}>
                       <Text style={styles.buttonText}>Verify</Text>
                   </TouchableHighlight>
               </ScrollView>
           </View>
       );
   }
}

const styles = StyleSheet.create({
   container: {
     flex: 1,
     backgroundColor: 'white',
     alignItems: 'center',
     justifyContent: 'center',
   },
   buttonContainer: {
     height:45,
     flexDirection: 'row',
     alignItems: 'center',
     justifyContent: 'center',
     marginBottom:20,
     width:"80%",
     borderRadius:30,
     marginTop: 20,
     marginLeft: 5,
   },
   captureButton: {
     backgroundColor: "#337ab7",
     width: 350,
   },
   buttonText: {
     color: 'white',
     fontWeight: 'bold',
   },
   verifyButton: {
     backgroundColor: "#C0C0C0",
     width: 350,
     marginTop: 5,
   },
   imageholder: {
     borderWidth: 1,
     borderColor: "grey",
     backgroundColor: "#eee",
     width: "50%",
     height: 150,
     marginTop: 10,
     marginLeft: 90,
     flexDirection: 'row',
     alignItems:'center'
   },
   previewImage: {
     width: "100%",
     height: "100%",
   }
});

export default Verification;

In the above code there are two buttons: one (Capture Image) for capturing the face that needs to be verified, and another (Verify) for checking whether the captured face is registered or not. When a user clicks on the Verify button, the verification() method is called, in which we make a POST request (invoking the searchFace lambda function via API Gateway).
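The SearchFacesByImage response carries a FaceMatches array, each entry with a Similarity score and the Face details. Below is a small sketch of extracting the best match from such a response; the response object here is mocked with made-up values, and bestMatch is a hypothetical helper, not part of the app's code:

```javascript
// Picks the best match from a SearchFacesByImage response.
// Returns the ExternalImageId (the username used at registration),
// or null when nothing matched. Rekognition returns matches ordered
// by similarity, highest first.
function bestMatch(response) {
  const matches = response.FaceMatches || [];
  if (matches.length === 0) return null;
  return matches[0].Face.ExternalImageId;
}

// Mocked response with made-up values:
const sample = {
  FaceMatches: [
    { Similarity: 98.2, Face: { ExternalImageId: "alice" } }
  ]
};
console.log(bestMatch(sample));              // "alice"
console.log(bestMatch({ FaceMatches: [] })); // null
```

The `|| []` guard also covers responses where FaceMatches is absent entirely, which the alert logic in verification() does not handle.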

Now we have two screens, one for Registration and another for Verification. Let’s add navigation between the two screens using react-navigation. The first step is to install react-navigation in your project:

npm install --save react-navigation

The second step is to install react-native-gesture-handler:

yarn add react-native-gesture-handler
# or with npm
# npm install --save react-native-gesture-handler

Now we need to link our react-native with react-native-gesture-handler:

react-native link react-native-gesture-handler

After that, go back to your App.js file and replace its contents with the code below:

import React, {Component} from 'react';
import {View, Text, TouchableHighlight, StyleSheet} from 'react-native';
import Registration from './Registration';
import {createStackNavigator, createAppContainer} from 'react-navigation';
import Verification from './Verification';

class HomeScreen extends React.Component {
   render() {
       return (
           <View style={{ flex: 1, alignItems: "center" }}>
               <Text style= {{ fontSize: 30, color: "#000", marginBottom: 50, marginTop: 100 }}>Register Face ID</Text>
               <TouchableHighlight style={[styles.buttonContainer, styles.button]} onPress={() => this.props.navigation.navigate('Registration')}>
                   <Text style={styles.buttonText}>Registration</Text>
               </TouchableHighlight>
               <TouchableHighlight style={[styles.buttonContainer, styles.button]} onPress={() => this.props.navigation.navigate('Verification')}>
                   <Text style={styles.buttonText}>Verification</Text>
               </TouchableHighlight>
           </View>
       );
   }
}

const MainNavigator = createStackNavigator(
   {
       Home: {screen: HomeScreen},
       Registration: {screen: Registration},
       Verification: {screen: Verification}
   },
   {
       initialRouteName: 'Home',
   }
);

const AppContainer = createAppContainer(MainNavigator);

export default class App extends Component {
   render() {
       return <AppContainer />;
   }
}

const styles = StyleSheet.create({
   buttonContainer: {
       height:45,
       flexDirection: 'row',
       alignItems: 'center',
       justifyContent: 'center',
       marginBottom:20,
       width:"80%",
       borderRadius:30,
       marginTop: 20,
       marginLeft: 5,
   },
   button: {
       backgroundColor: "#337ab7",
       width: 350,
       marginTop: 5,
   },
   buttonText: {
       color: 'white',
       fontWeight: 'bold',
   },
})

Now reload the app and you should see the home screen as below:

When the user clicks the Registration button, it navigates to the Registration screen as below:

When the user clicks the Verification button, it navigates to the Verification screen as below:

Now let’s check the verification process.

Step 1: Navigate to Verification screen.

Step 2: Capture the registered face.

Step 3: Click on the verify button.

If everything works, you will receive an alert message with the matched face name as below:

If the captured face does not match any registered face, the user receives a "No matches found" alert.

Thanks for the read, I hope it was useful.

This story is authored by Venu Vaka. Venu is a software engineer and machine learning enthusiast.

Creating an Automated Data Engineering Pipeline for Batch Data in Machine Learning

A common use case in the Machine Learning life cycle is to have access to the latest training data so as to prevent model deterioration. Data scientists often find it cumbersome to manually export data from sources such as relational databases, NoSQL data stores, or distributed data. This necessitates automating the data engineering pipeline in Machine Learning. In this post, we describe how to set up this pipeline for batch data. The workflow is orchestrated via Airflow and can be scheduled to run at regular intervals: hourly, daily, weekly, etc., depending on the specific business requirements.

Quick note – In case you are interested in building a real time data engineering pipeline for ML, please look at this post.

In this use case, we are going to export MongoDB data into Google BigQuery via Cloud Storage. The updated data in BigQuery is then made available in Jupyter Notebook as a Pandas Dataframe for downstream model building and analytics. As the pipeline automates the data ingestion and preprocessing, the data scientists always have access to the latest batch data in their Jupyter Notebooks hosted on Google AI Platform. 

We have a MongoDB service running on one instance, and Airflow and mongoexport running on Docker on another instance. Mongoexport is a utility that produces a JSON or CSV export of data stored in MongoDB. The data in MongoDB is extracted and transformed using mongoexport and loaded into Cloud Storage. Airflow is used to schedule and orchestrate these exports. Once the data is available in Cloud Storage, it can be queried in BigQuery. We then pull this data from BigQuery into a Jupyter Notebook. The following is a step by step sequence to set up this data pipeline.

You can create an instance in GCP by going to Compute Engine and clicking Create instance.

Install.sh:

sudo apt-get update
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh
sudo usermod -aG docker $USER
sudo apt-get install -y python-pip
export AIRFLOW_HOME=~/airflow
sudo pip install apache-airflow
sudo pip install apache-airflow[postgres,s3]
airflow initdb
airflow webserver -p 8080 -D
airflow scheduler -D
sudo docker pull mongo
sudo docker run --name mongo_client -d mongo

Run the install.sh file using the ./install.sh command (make sure the file is executable). It installs Docker and Airflow, pulls the Mongo image, and runs it in a container named mongo_client.

After installation, the Airflow web UI is available at http://&lt;public-ip-instance&gt;:8080 (you may need to open port 8080 in the network, restricted to your public IP).


Make sure the Google service account on the running instance has permissions to access BigQuery and Cloud Storage. After installation, add the Airflow job Python file (mongo-export.py) to the airflow/dags folder.

Before running the Python file, make sure you create the dataset and the table in BigQuery. Also set appropriate values for the MongoDB source database, MongoDB source collection, Cloud Storage destination bucket, and BigQuery destination dataset in the Airflow job Python file (mongo-export.py). The BigQuery destination table name is the same as the source collection name in MongoDB.
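One detail worth noting before looking at the job code: mongoexport emits MongoDB extended JSON, with '$'-prefixed keys such as $oid, and the job strips that prefix so the keys become valid BigQuery column names after flattening. A minimal illustration of that cleanup step (the document contents are hypothetical):

```python
import json

# A line as mongoexport might emit it: MongoDB extended JSON,
# with a '$'-prefixed key (document contents are hypothetical)
line = '{"_id": {"$oid": "5d76b2c1e1a4f20001a00001"}, "name": "alice"}'

# The Airflow job strips the '$' prefix so that, after flattening,
# the keys become valid BigQuery column names
cleaned = line.replace('"$', '"')
doc = json.loads(cleaned)
print(doc["_id"]["oid"])  # 5d76b2c1e1a4f20001a00001
```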

Mongo-export.py:

import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator
from datetime import datetime, timedelta
import json

# Following are default arguments which could be overridden
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(0),
    'email': ['airflow@gmail.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

bucket_name = '<Your_Bucket>'
db_name = '<Database_Name>'
dataset = '<Dataset_Name>'
table_name = '<Table_Name>'


time_stamp = datetime.now()
cur_date = time_stamp.strftime("%Y-%m-%d")

# Flattens a nested JSON document into a single level
def flatten_json(y):
    out = {}
    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

# Converts every leaf value of the document to a string
def convert_string(y):
    string_type = {}

    def convert(x, name=''):
        if type(x) is dict:
            for a in x:
                convert(str(x[a]), name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                convert(str(a), name + str(i) + '_')
                i += 1
        else:
            string_type[name[:-1]] = x

    convert(y)
    return string_type


def json_flat():
    lines = [line.rstrip('\n') for line in open('/home/dev/' + table_name + '-unformat.json')]
    flat_list = []
    for line in lines:
        # strip the '$' prefix from Mongo extended-JSON keys (e.g. $oid, $date),
        # which would otherwise produce invalid BigQuery column names
        line = line.replace("\"$", "\"")
        line = json.loads(line)
        try:
            flat_list.append(json.dumps(convert_string(flatten_json(line))))
        except Exception as e:
            print(e)
    flatted_json = '\n'.join(i for i in flat_list)

    with open('/home/dev/' + table_name + '.json', 'a') as file:
        file.write(flatted_json)
    return flatted_json

dag = DAG('mongoexport-daily-gcs-bq', default_args=default_args, params = {'cur_date': cur_date, 'db_name': db_name, 'table_name': table_name, 'dataset': dataset, 'bucket_name': bucket_name})
# exports the collection data to a file inside the docker container
t1 = BashOperator(
    task_id='mongoexport_to_container',
    bash_command='sudo docker exec -i mongo_client sh -c "mongoexport --host=<instance_public_ip> --db {{params.db_name}} --collection {{params.table_name}} --out {{params.table_name}}-unformat.json"',
    dag=dag)

# copies exported file into instance

t2 = BashOperator(
    task_id='cp_from_container_instance',
    bash_command='sudo docker cp mongo_client:/{{params.table_name}}-unformat.json /home/dev/',
    dag=dag)

t3 = PythonOperator(
    task_id='flattening_json',
    python_callable=json_flat,
    dag=dag)
# copies the flattened data to cloud storage (date-partitioned raw path)
t4 = BashOperator(
    task_id='cp_from_instance_gcs',
    bash_command='gsutil cp /home/dev/{{params.table_name}}.json gs://{{params.bucket_name}}/raw/{{params.table_name}}/date={{params.cur_date}}/',
    dag=dag)
# copies the flattened data to cloud storage (curated path, latest snapshot)
t5 = BashOperator(
    task_id='cp_from_instance_gcs_daily_data',
    bash_command='gsutil cp /home/dev/{{params.table_name}}.json gs://{{params.bucket_name}}/curated/{{params.table_name}}/',
    dag=dag)

# removes the existing bigquery table
t6 = BashOperator(
    task_id='remove_bq_table',
    bash_command='bq rm -f {{params.dataset}}.{{params.table_name}}',
    dag=dag)
# creates a table in bigquery
t7 = BashOperator(
    task_id='create_bq_table',
    bash_command='bq load --autodetect --source_format=NEWLINE_DELIMITED_JSON {{params.dataset}}.{{params.table_name}} gs://{{params.bucket_name}}/curated/{{params.table_name}}/{{params.table_name}}.json',
    dag=dag)
# removes data from container
t8 = BashOperator(
    task_id='remove_file_from_container',
    bash_command='sudo docker exec -i mongo_client sh -c "rm -rf {{params.table_name}}*.json"',
    dag=dag)
# removes data from instance
t9 = BashOperator(
    task_id='remove_file_from_instance',
    bash_command='rm -rf /home/dev/{{params.table_name}}*.json',
    dag=dag)

t1 >> t2
t2 >> t3
t3 >> [t4, t5]
[t4, t5] >> t6
t6 >> t7
t7 >> [t8, t9]

Then run the python file using python <file-path>.py  

(example: python airflow/dags/mongo-export.py).
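As a quick sanity check of the flattening step in the job above, here is a standalone copy of the flatten_json logic applied to a small nested document (the sample document is hypothetical):

```python
# Standalone copy of the flatten_json logic from mongo-export.py,
# applied to a sample nested document
def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '_')
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

doc = {"user": {"name": "alice", "scores": [1, 2]}, "active": True}
flat = flatten_json(doc)
print(flat)
# {'user_name': 'alice', 'user_scores_0': 1, 'user_scores_1': 2, 'active': True}
```

Nested dict keys are joined with underscores and list elements get positional suffixes, which is what makes the output loadable as flat BigQuery columns.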

After running the Python file, the DAG name shows up in the Airflow web UI, and you can trigger the DAG manually. Make sure the toggle button is in the ON state.

Once the job completes, the data is stored in the bucket and is also available in the destination table in BigQuery. You can see the table created in BigQuery. Click on Query Table to perform SQL operations, and you can see your results in the preview tab at the bottom.

Now you can access the data in a Jupyter Notebook from BigQuery. Search for Notebook in the GCP console.

Run the below commands in Jupyter Notebook.

from google.cloud import bigquery
client = bigquery.Client()
sql = """
SELECT * FROM 
`<project-name>.<dataset-name>.<table-name>`
"""
df = client.query(sql).to_dataframe()
df.head(10)

This loads the BigQuery data into a Pandas dataframe, which can be used for model creation as required. Later, when the data pipeline runs on schedule, the refreshed data automatically becomes available in this Jupyter notebook via the same SQL query.
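One caveat worth remembering downstream: because convert_string stringifies every leaf value during export, numeric columns may come back as strings and need casting before modelling. A sketch of that cleanup on a locally constructed dataframe (the column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical rows shaped like the flattened export
df = pd.DataFrame([
    {"user_name": "alice", "metrics_score": "0.91"},
    {"user_name": "bob",   "metrics_score": "0.67"},
])

# Leaf values were stringified during export, so cast numeric
# columns back before using them for analytics or modelling
df["metrics_score"] = pd.to_numeric(df["metrics_score"])
print(df["metrics_score"].mean())
```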

Hope this helps you automate your batch data engineering pipeline for Machine Learning.

This story is co-authored by Santosh and Subbareddy. Santosh is an AWS Cloud Engineer and Subbareddy is a Big Data Engineer.