Image Text Detection with Bounding Boxes using OpenCV in React Native Mobile App

In our earlier blog post, we built a Text Detection App with React Native using AWS Rekognition. That app detects the text in a captured image along with its dimensions. This blog is an extension of it, in which we learn how to draw bounding boxes using the dimensions of the detected text in the image. Assuming you have followed the earlier blog and created the Text Detection App, let us proceed further.

The following diagram depicts the architecture we will be building. 

The React Native app sends the image to be processed via an API call. The detect_text lambda function stores it in S3 and calls Amazon Rekognition with its URL to get the detected texts along with their dimensions. With this data it invokes the draw_bounding_box lambda function, which fetches the image from S3, draws the bounding boxes and stores the result as a new image. The draw_bounding_box lambda responds to detect_text with the new image name, and detect_text in turn responds to the app via API Gateway with the URL of the image containing the bounding boxes.

The text detection part was already finished in our previous blog, so let us look at creating the rest of the setup.

We will use another AWS lambda function to draw bounding boxes and that function would need OpenCV.

OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine learning software library, built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products.

Preparing the package for draw_bounding_box Lambda:

We need the OpenCV and Numpy libraries for image manipulation, but Lambda doesn’t provide them by default. We have to prepare them ourselves and upload them, so we will build the lambda code as a package locally and then upload it.

To prepare these libraries, follow this link. After finishing the process you will get a zip file. Unzip the file and copy the lambda code below into a .py file.

Note: The name of this .py file should match the module part of your Lambda handler setting (default: lambda_handler, so the handler is lambda_handler.lambda_handler).

lambda_handler.py:

import cv2
import boto3
import numpy as np
import base64
import json

def lambda_handler(event, context):
    bucketName='<-Your-Bucket-Name->'
    s3_client = boto3.client('s3')

    s3 = boto3.resource('s3')
    bucket = s3.Bucket(bucketName)

    # reading image name from event object
    obj = bucket.Object(key=event['image'])
    # Image we read from S3 bucket
    response = obj.get()

    imgContent = response["Body"].read()
    np_array = np.fromstring(imgContent, np.uint8)
    # For a local file we could use cv2.imread(); since the image arrives from S3 as raw bytes,
    # we build a numpy array with np.fromstring() and decode it with cv2.imdecode() below.
    image_np = cv2.imdecode(np_array, cv2.IMREAD_COLOR)

    '''
    imdecode reads an image from the specified buffer in the memory. 
    If the buffer is too short or contains invalid data, the empty matrix/image is returned.
    In OpenCV you can easily read in images with different file formats (JPG, PNG, TIFF etc.) using imread
    '''
    height, width, channels = image_np.shape
    # reading dimensions from the event object
    dimensions=json.loads(event['boundingBoxDimensions'])

    for dimension in dimensions:
        leftbox = int(width * dimension['Left'])
        topbox = int(height * dimension['Top'])
        widthbox = int(width * dimension['Width'])
        heightbox = int(height * dimension['Height'])
        # cv2.rectangle draws a rectangular box on image_np in place, using the scaled dimensions
        cv2.rectangle(image_np, (leftbox, topbox), (leftbox+widthbox, topbox+heightbox), (0, 255, 0), 2)

    # Write the image changes to a local image file.
    # When testing locally, create the folder (here /tmp) and place a sample image (here sample.jpeg)
    # in it first; otherwise you may encounter an error like:
    # 'utf8' codec can't decode byte 0xaa in position 1: invalid start byte: UnicodeDecodeError
 
    cv2.imwrite("/tmp/sample.jpeg", image_np)

    newImage="<-New-Image-Name->"
    # we put the image in S3. And return the image name as we store the modified image in S3
    s3_client.put_object(Bucket=bucketName, Key=newImage, Body=open("/tmp/sample.jpeg", "rb").read())

    return {
        'statusCode': 200,
        'imageName':newImage,
        'body': 'Hello from Lambda!'
    }
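For a quick sanity check from the Lambda console or a local Python session (assuming the image already exists in your S3 bucket), the event this function expects looks roughly like the following; the image name and box ratios below are made-up values:

import json
from lambda_handler import lambda_handler

# Illustrative test event: 'image' is the S3 key of an existing object,
# and the box values are Rekognition-style ratios relative to the image size
test_event = {
    "image": "IMG-1575291719000",
    "boundingBoxDimensions": json.dumps([
        {"Left": 0.12, "Top": 0.08, "Width": 0.35, "Height": 0.05},
        {"Left": 0.10, "Top": 0.20, "Width": 0.50, "Height": 0.06}
    ])
}
print(lambda_handler(test_event, None))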

The package folder structure would look like below.

As these files exceed the Lambda console upload limit, we will upload the package to S3 and reference it from there.

Zip this lambda-package and upload it to S3. Paste its S3 URL into your function code settings and change the lambda runtime to Python 2.7 (an OpenCV dependency of this package).
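If you prefer scripting these steps instead of using the console, a rough boto3 sketch might look like the following; the bucket, key and function names are placeholders, and the handler is assumed to be lambda_handler.lambda_handler as per the note above:

import boto3

s3 = boto3.client('s3')
lambda_client = boto3.client('lambda')

bucket = '<-Your-Bucket-Name->'
package_key = 'lambda-package.zip'

# Upload the zipped package (code + OpenCV/Numpy libraries) to S3
s3.upload_file('lambda-package.zip', bucket, package_key)

# Point the draw_bounding_box function at the package in S3
lambda_client.update_function_code(
    FunctionName='draw_bounding_box',
    S3Bucket=bucket,
    S3Key=package_key
)

# Set the Python 2.7 runtime and the handler (file_name.function_name)
lambda_client.update_function_configuration(
    FunctionName='draw_bounding_box',
    Runtime='python2.7',
    Handler='lambda_handler.lambda_handler'
)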

Invoking draw_bounding_box lambda

The detect_text lambda invokes draw_bounding_box lambda in RequestResponse mode, which means detect_text lambda waits for the response of draw_bounding_box lambda.

The draw_bounding_box lambda function reads the image name and box dimensions from the event object. Below is the code for detect_text lambda which invokes the draw_bounding_box lambda function.

detect_text.js

const AWS = require('aws-sdk');
// added package
const S3 = new AWS.S3({signatureVersion: 'v4'});

var rekognition = new AWS.Rekognition();
var s3Bucket = new AWS.S3( { params: {Bucket: "<-Your-Bucket-Name->"} } );
var fs = require('fs');
// To invoke lambda function
var lambda = new AWS.Lambda();

exports.handler = (event, context, callback) => {
    let parsedData = JSON.parse(event);
    let encodedImage = parsedData.Image;
    var filePath = parsedData.name;

    let buf = new Buffer(encodedImage.replace(/^data:image\/\w+;base64,/, ""),'base64');
    var data = {
        Key: filePath, 
        Body: buf,
        ContentEncoding: 'base64',
        ContentType: 'image/jpeg'
    };
    s3Bucket.putObject(data, function(err, data){
        if (err) { 
            console.log('Error uploading data: ', err);
            callback(err, null);
        } else {
            var params = {
                Image: {
                    S3Object: {
                        Bucket: "your-s3-bucket-name", 
                        Name: filePath
                    }
                }
            };

            rekognition.detectText(params, function(err, data) {
                if (err){
                    console.log(err, err.stack);
                    callback(err);
                }
                else{
                    console.log("data: ",data);
                    var detectedTextFromImage=[];
                    var geometry=[];
                    for (item in data.TextDetections){
                      if(data.TextDetections[item].Type === "LINE"){
                        geometry.push(data.TextDetections[item].Geometry.BoundingBox);
                        detectedTextFromImage.push(data.TextDetections[item].DetectedText);
                      }
                    }
                    var dimensions=JSON.stringify(geometry);
                    var payloadData={
                        "boundingBoxDimensions":dimensions,
                        "image": filePath
                    };

                    var params = {
                        FunctionName: 'draw_bounding_box',
                        InvocationType: "RequestResponse",
                        Payload: JSON.stringify(payloadData)
                    };
                    
                    lambda.invoke(params, function(err, data) {
                        if (err){
                            console.log("error occured");
                            console.log(err);
                        }
                        else{
                            var jsondata=JSON.parse(data.Payload);
                            var params = {
                                Bucket: "your-s3-bucket-name", 
                                Key: jsondata.imageName,
                            };
                            s3Bucket.getSignedUrl('getObject', params, function (err, url) {
                                var responseData={
                                        "DetectedText":detectedTextFromImage,
                                    "url":url
                                }
                                callback(null, responseData);
                            });                            
                        }
                    });
                    console.log("waiting for response");
                }
            });
        }
    });
};

Everything is similar to the previous blog except the rekognition.detectText() callback. Upon success, we store the detected text in one list and the bounding box dimensions in another. Next, we pass the dimensions list and the image name as the payload to the draw_bounding_box lambda function.

var payloadData={
    "boundingBoxDimensions":dimensions,
    "image": filePath
};

var params = {
    FunctionName: 'draw_bounding_box',
    InvocationType: "RequestResponse",
    Payload: JSON.stringify(payloadData)
};
lambda.invoke(params, function(err, data) {
    if (err){
        console.log("error occured");
        console.log(err);
    }
    else{
        var jsondata=JSON.parse(data.Payload);
        var params = {
            Bucket: "your-s3-bucket-name", 
            Key: jsondata.imageName,
        };
        s3Bucket.getSignedUrl('getObject', params, function (err, url) {
            var responseData={
                "DetectedText":detectedTextFromImage,
                "url":url
            }
            callback(null, responseData);
        });                            
    }
});

lambda.invoke() expects two arguments: the first is an object containing the name of the lambda function, the invocation type and the payload data; the second is a callback that handles the success or failure response. When the detect_text lambda function invokes the draw_bounding_box function, the latter processes the image and returns its response to detect_text. Upon success, we get a JSON object which contains the modified image name.

Next, we use s3Bucket.getSignedUrl() to get a signed URL for the image, which we send back to our React Native app in the response along with the detected text.

Replace the existing App.js file in the React Native project with the code below.
App.js

import React, {Component} from 'react';
import {
    StyleSheet,
    View,
    Text,
    TextInput,
    Image,
    ScrollView,
    TouchableHighlight,
    ActivityIndicator
} from 'react-native';
import ImagePicker from "react-native-image-picker";
import Amplify, {API} from "aws-amplify";
Amplify.configure({
    API: {
        endpoints: [
            {
                name: "<-Your-API-name->",
                endpoint: "<-Your-end-point-url->"
            }
        ]
    }
});

class Registration extends Component {
  
    constructor(props){
        super(props);
        this.state =  {
            isLoading : false,
            showInputField : false,
            imageName : '',
            capturedImage : '',
            detectedText: []
        };
    }

    captureImageButtonHandler = () => {
        ImagePicker.showImagePicker({title: "Pick an Image", maxWidth: 800, maxHeight: 600}, (response) => {
            console.log('Response - ', response);
            if (response.didCancel) {
                console.log('User cancelled image picker');
            } else if (response.error) {
                console.log('ImagePicker Error: ', response.error);
            } else if (response.customButton) {
                console.log('User tapped custom button: ', response.customButton);
            } else {
                const source = { uri: 'data:image/jpeg;base64,' + response.data };
                this.setState({
                    imageName: "IMG-" + Date.now(),
                    showInputField: true,
                    capturedImage: response.uri,
                    base64String: source.uri
                })
            }
        });
    }

    submitButtonHandler = () => {
        if (this.state.capturedImage == '' || this.state.capturedImage == undefined || this.state.capturedImage == null) {
            alert("Please Capture the Image");
        } else {
            this.setState({
                isLoading: true
            })
            console.log("submiting")
            const apiName = "<-Your-API-name->";
            const path = "/API-path";
            const init = {
                headers: {
                    'Accept': 'application/json',
                    "Content-Type": "application/x-amz-json-1.1"
                },
                body: JSON.stringify({
                    Image: this.state.base64String,
                    name: this.state.imageName
                })
            }

            API.post(apiName, path, init).then(response => {
                this.setState({
                    capturedImage: response.url,
                    detectedText: response.DetectedText,
                    isLoading:false
                })
            });
        }
    }
  
    render() {
        let inputField;
        let submitButtonField;
        if (this.state.showInputField) {
            inputField=
                    <View style={styles.buttonsstyle}>
                        <TextInput
                            placeholder="Img"
                            value={this.state.imageName}
                            onChangeText={imageName => this.setState({imageName: imageName})}
                            style={styles.TextInputStyleClass}
                        />
                    </View>;
            submitButtonField=<TouchableHighlight style={[styles.buttonContainer, styles.submitButton]} onPress={this.submitButtonHandler}>
                            <Text style={styles.buttonText}>Submit</Text>
                        </TouchableHighlight>
            
        }
        
        return (
            <View style={styles.screen}>
                <ScrollView>
                    <Text style= {{ fontSize: 20, color: "#000", textAlign: 'center', marginBottom: 15, marginTop: 10 }}>Text Extracter</Text>

                    {this.state.capturedImage !== "" && <View style={styles.imageholder} >
                        <Image source={{uri : this.state.capturedImage}} style={styles.previewImage} />
                    </View>}
                    {inputField}
                    {this.state.isLoading && (
                        <ActivityIndicator
                            style={styles.Loader}
                            color="#C00"
                            size="large"
                        />
                    )}
                    <View>
                        {
                       this.state.detectedText.map((data, index) => {
                       return(
                           <Text key={index} style={styles.DetextTextView}>{data}</Text>
                       )})
                       }
                    </View>
                    <View style={styles.buttonsstyle}>
                        <TouchableHighlight style={[styles.buttonContainer, styles.captureButton]} onPress={this.captureImageButtonHandler}>
                            <Text style={styles.buttonText}>Capture Image</Text>
                        </TouchableHighlight>
                        {submitButtonField}
                    </View>
                </ScrollView>
            </View>
        );
    }
}

const styles = StyleSheet.create({
    Loader:{
        flex: 1,
        justifyContent: 'center',
        alignItems: 'center',
        height: "100%"
    },
    screen:{
        flex:1,
        justifyContent: 'center',
    },
    buttonsstyle:{
        flex:1,
        alignItems:"center"
    },
    DetextTextView:{
      textAlign: 'center',
    },
    TextInputStyleClass: {
      textAlign: 'center',
      marginBottom: 7,
      height: "70%",
      margin: 10,
      width:"80%"
    },
    inputContainer: {
      borderBottomColor: '#F5FCFF',
      backgroundColor: '#FFFFFF',
      borderRadius:30,
      borderBottomWidth: 1,
      width:"90%",
      height:45,
      marginBottom:20,
      flexDirection: 'row',
      alignItems:'center'
    },
    buttonContainer: {
      height:45,
      flexDirection: 'row',
      alignItems: 'center',
      justifyContent: 'center',
      borderRadius:30,
      margin: 5,
    },
    captureButton: {
      backgroundColor: "#337ab7",
      width: "90%",
    },
    buttonText: {
      color: 'white',
      fontWeight: 'bold',
    },
    horizontal: {
      flexDirection: 'row',
      justifyContent: 'space-around',
      padding: 10
    },
    submitButton: {
      backgroundColor: "#C0C0C0",
      width: "90%",
      marginTop: 5,
    },
    imageholder: {
      borderWidth: 1,
      borderColor: "grey",
      backgroundColor: "#eee",
      width: "50%",
      height: 150,
      marginTop: 10,
      marginLeft: 90,
      flexDirection: 'row',
      alignItems:'center'
    },
    previewImage: {
      width: "100%",
      height: "100%",
    }
});

export default Registration;

Below are the screenshots of the React Native App running on an Android device.
We used the below image to extract text and add bounding boxes.

The image name is generated dynamically with epoch time, which is editable.

I hope it was helpful, thanks for the read!

This story is authored by Dheeraj Kumar and Santosh Kumar. Dheeraj is a software engineer specializing in React Native and React based frontend development. Santosh specializes on Cloud Services based development.

Federated Querying across Relational, Non-relational, Object, and Custom Data Sources using Amazon Athena

Querying Data from DynamoDB in Amazon Athena

Amazon Athena now enables users to run SQL queries across data stored in relational, non-relational, object, and custom data sources. With federated querying, customers can submit a single SQL query that scans data from multiple sources running on-premises or hosted in the cloud.

Athena executes federated queries using Athena Data Source Connectors that run on AWS Lambda. Athena federated query is available in Preview in the us-east-1 (N. Virginia) region.

Preparing to create federated queries is a two-part process:

  1. Deploying a Lambda function data source connector.
  2. Connecting the Lambda function to a data source. 

I assume that you have at least one DynamoDB table in the us-east-1 region.

Deploy a Data Source Connector

  • Open the Amazon Athena console and choose Connect data source. This feature is available only in the us-east-1 region.
  • On the Connect data source page, choose the Query a data source option, and choose Amazon DynamoDB as the data source.
  • Choose Next.
  • For the Lambda function, choose to Configure new function. It opens in the Lambda console in a new tab with information about the connector.
  • Under ApplicationSettings, provide the required information.
    1. AthenaCatalogName – A name for the Lambda function.
    2. SpillBucket – An Amazon S3 bucket in your account to store data that exceeds Lambda function response size limits.
    3. SpillPrefix – Data that exceeds Lambda function response size limits is stored under SpillBucket/SpillPrefix.
  • Choose I acknowledge that this app creates custom IAM roles and choose Deploy.

Connect to a data source using the connector deployed in the earlier step

  • Open the Amazon Athena console and choose Connect data source. This feature is available only in the us-east-1 region.
  • On the Connect data source page, choose the Query a data source option, choose Amazon DynamoDB as the data source, and choose Next.
  • For Lambda function, choose the name of the Lambda function that you created in the earlier step.
  • For Catalog name, enter a unique name to use for the data source in your SQL queries, such as dynamo_athena.
  • Choose Connect. Now the data source is available under the Data Sources section in Amazon Athena.

Querying Data using Federated Queries

To use this feature in preview, you must create an Athena workgroup named AmazonAthenaPreviewFunctionality and join that workgroup.

Create an Athena workgroup

  • Open the Amazon Athena console and choose Workgroup, and choose Create workgroup.
  • After creating the workgroup, select it under the Workgroup section and choose Switch workgroup.
  • In Athena, select the data source created in the earlier step. After choosing the data source, the DynamoDB tables are available in Athena under the default database.

Querying Data in Athena using SQL Queries

The following query is used to retrieve data from DynamoDB in Athena.

SELECT * FROM "data_source_connector"."database_name"."table_name";
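The same query can also be submitted programmatically. Below is a minimal boto3 sketch, assuming the AmazonAthenaPreviewFunctionality workgroup created above already exists; the catalog, database, table and results-bucket names are placeholders:

import boto3

athena = boto3.client('athena', region_name='us-east-1')

# Placeholder names: replace with your data source connector, database and table
query = 'SELECT * FROM "dynamo_athena"."default"."table_name" LIMIT 10'

response = athena.start_query_execution(
    QueryString=query,
    WorkGroup='AmazonAthenaPreviewFunctionality',
    ResultConfiguration={'OutputLocation': 's3://<-your-query-results-bucket->/results/'}
)
print(response['QueryExecutionId'])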

Creating Athena table using CTAS with results of querying DynamoDB

The CTAS query looks like the following. Using a CTAS query, the data can be written out in the required format, be it Parquet, JSON, CSV, etc.

CREATE TABLE database.table_name
WITH (
      external_location = 's3://bucket-name/data/',
      format = 'parquet')
AS 
SELECT * FROM "data_source_connector"."database_name"."table_name";

I hope this was helpful and look forward to your comments.

This story is authored by PV Subbareddy. Subbareddy is a Big Data Engineer specializing on AWS Big Data Services and Apache Spark Ecosystem.

Efficiently Tagging AWS Resources Using CLI to Better Manage Resources and Billing Costs

It is common that when organizations run large workloads based on a multitude of AWS services, they lose track of how resources are being used. In a nutshell, identifying resources can take rigorous effort. On AWS, utilization and cost go hand in hand, and tagging helps ensure that resources are managed efficiently. In fact, one could also build insightful reports and dashboards with the tags in place.

Tagging Strategy:

For tags to be effective at scale they need to be strategically managed. Many organizations group tags into different categories like technical, business, security and automation, etc. A typical set of tags could be:

  1. Name
  2. Owner
  3. Application/Project/Product
  4. Environment
  5. Client/Customer

For more on creative tagging strategies, please read this.

Prerequisites: AWS CLI configured.

Getting all untagged resources using CLI:

As of this writing, there is no CLI command to list all untagged resources. One could follow the below steps to get the list.

Step 1: List all the resources in AWS and write them to a text file

aws lambda list-functions --profile PROFILE_NAME &>> resourcesList.txt

Note: The above command is for listing details of lambda resources. The command and its output might vary with other resources. Read more here.

&>> appends the output of the command to resourcesList.txt file in the current working directory.

The output of the above command is a JSON object that looks like this:

{
    "Functions": [
        {
            "FunctionName": "Chat-Conversation-POST",
            "FunctionArn": "arn:aws:lambda:us-west-2:89XXXXXXXX14:function:Chat-Conversation-POST",
            "Runtime": "nodejs8.10",
            "Role": "arn:aws:iam::89XXXXXXXX14:role/chat-lambda-data",
            "Handler": "index.handler",
            "CodeSize": 474,
            "Description": "",
            "Timeout": 15,
            "MemorySize": 128,
            "LastModified": "2019-05-02T13:20:53.887+0000",
            "CodeSha256": "h1bxXaXXXXXXxxxxxxxxxXxxXxxxxxxXXXxxxxxxmGg=",
            "Version": "$LATEST",
            "TracingConfig": {
                "Mode": "PassThrough"
            },
            "RevisionId": "f447bca3-06f9-49d8-8a5d-c740f6aec405"
        },
        {
            "FunctionName": "Chat-Conversation-GET",
            "FunctionArn": "arn:aws:lambda:us-west-2:89XXXXXXXX14:function:Chat-Conversation-GET",
            "Runtime": "nodejs8.10",
            "Role": "arn:aws:iam::89XXXXXXXX14:role/service-role/chat-lambda-data",
            "Handler": "index.handler",
            "CodeSize": 785,
            "Description": "",
            "Timeout": 25,
            "MemorySize": 128,
            "LastModified": "2019-05-04T14:23:07.002+0000",
            "CodeSha256": "h1bxXaXXXXXXxxxxxxxxxXxxXxxxxxxXXXxxxxxxmGg=",
            "Version": "$LATEST",
            "VpcConfig": {
                "SubnetIds": [],
                "SecurityGroupIds": [],
                "VpcId": ""
            },
            "TracingConfig": {
                "Mode": "PassThrough"
            },
            "RevisionId": "210dd3fa-ba47-4e06-ab53-e34aa793b344"
        }
    ]
}

Now, one could either use multiple selection (Ctrl+D) in Sublime or a Python script to extract the list of resource ARNs/names.
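As a rough example, the Python route could look like the sketch below. It assumes resourcesList.txt holds a single list-functions response like the one shown above; if you appended several outputs to the file, you would need to split them first:

import json

# Parse the list-functions output and collect function names and ARNs
with open('resourcesList.txt') as f:
    data = json.load(f)

resource_names = [fn['FunctionName'] for fn in data['Functions']]
resource_arns = [fn['FunctionArn'] for fn in data['Functions']]

for name in resource_names:
    print(name)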

Step 2: Iterate over this list of resource names, fetch the tagging details for each, and append the output of these commands to a file.

echo RESOURCE_NAME: &>> tagsList.txt

aws lambda list-tags --resource arn:aws:lambda:us-west-2:89XXXXXXXX14:function:RESOURCE_NAME --profile PROFILE_NAME &>> tagsList.txt

The output of the above command is also a JSON object:

{
    "Tags": {}
}

As you can see, there is no name attribute in this output, so we prepend the resource name to the command output using echo:

RESOURCE_NAME:{
    "Tags": {}
}

Let us say, the resource names we have got in resourcesList.txt are as follows:

  • new-client-acquisition
  • initiate-raw-file-ingestion
  • initiate-raw-crawler
  • raw-refined-transform
  • initiate-refined-crawler
  • check-status

Creating commands for the above resources in Sublime:

Step 3: Extract resources with no tags from the tagsList.txt file.

Untagged = all – tagged

From the resourcesList.txt we get all the resource names, and from the tagsList.txt we get all tagged resources. You could use both these lists to get the untagged resources.
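A small Python sketch of that set difference is below; the file names are hypothetical and assume you have already written one resource name per line into each file:

# Untagged = all resources - tagged resources
with open('allResources.txt') as f:
    all_resources = set(line.strip() for line in f if line.strip())

with open('taggedResources.txt') as f:
    tagged_resources = set(line.strip() for line in f if line.strip())

for name in sorted(all_resources - tagged_resources):
    print(name)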

Step 4: Preparing and updating the tags

aws lambda tag-resource --resource arn:aws:lambda:us-west-2:89XXXXXXXX14:function:RESOURCE_NAME --tags Environment=prod,Project=sales,Name=RESOURCE_NAME --profile PROFILE_NAME

Create multiple commands for each resource name with the above template.

Once you create all the commands, just copy and paste them into the terminal. That updates all the resources with the new tags. Alternatively, you could generate the commands with a short script, as sketched below.
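A sketch of generating those commands with Python; the account id, region, tag values and profile name are placeholders from the template above:

# Hypothetical list of untagged resource names gathered in Step 3
untagged_resources = ['new-client-acquisition', 'check-status']

template = (
    'aws lambda tag-resource '
    '--resource arn:aws:lambda:us-west-2:89XXXXXXXX14:function:{name} '
    '--tags Environment=prod,Project=sales,Name={name} '
    '--profile PROFILE_NAME'
)

for name in untagged_resources:
    print(template.format(name=name))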

These are pretty much the steps involved in tagging resources; a few tweaks may be needed depending on the AWS service.

Note: All the above commands run against the default region specified in the AWS CLI profile, unless a region is specified explicitly in the command.

One other way of tagging resources on AWS is using the Tag Editor in Resource Groups. I found it hard to work with, as it doesn’t offer sophisticated searching, filtering or grouping of resource names.

I hope it was helpful. For any queries or if you know a better way of tagging let us know in the comment section. Happy to discuss it further.

Thank-you!

This story is authored by Koushik. Koushik is a software engineer and a keen data science and machine learning enthusiast.

How to Customize QuickSight Dashboards for User Specific Data

We have been getting a lot of queries on how to customize a single QuickSight dashboard for user specific data. We can accomplish this by filtering the dashboard data on the login username using AWS QuickSight’s Row-Level Security. To further explain this use case, let’s consider the sales department in a company. Every day your team of sales agents contacts a list of potential customers. Now you need a single dashboard that is accessed by all the agents but only displays the prospects each agent is assigned to.

Note: This is completely different from filter/controls on QuickSight dashboards. If you have filters/controls/parameters set up with dynamic values being picked up from the dataset, then even that data is filtered with Row-Level security, as the underlying dataset itself is filtered with the login username.

Let’s get on with the show! I have created a hypothetical data set. This dataset has a column named assigned-agent which shall be used for filtering.

Using this dataset, I have created a dashboard that looks like below.

This dashboard is shared with two other IAM users (sales agents).

As we haven’t set up any rules yet, both of them can access the whole dataset.

As you can see, ziva could also access the whole dataset, and we don’t want that!

Our requirement:

User Name | Agent Name | Permissions
nick | Nick Howe | Can access only his prospects
ziva | Ziva Medalle | Can access only her prospects
manager | NA | Super user, can access all prospects

Creating Data Set Rules for Row-Level Security:

Create a file or a query that contains the data set rules (permissions).

It doesn’t matter what order the fields are in. However, all the fields are case-sensitive. They must exactly match the field names and values.

The structure should look similar to one of the following. You must have at least one field that identifies either users or groups. You can include both, but only one is required, and only one is used at a time. If you are specifying groups, use only Amazon QuickSight groups or Microsoft AD groups.

The following example shows a table with user names.

UserName | agent_assigned
nick | Nick Howe
ziva | Ziva Medalle
manager | Nick Howe,Ziva Medalle

For SQL:

/* for users*/
select User as UserName, Agent as agent_assigned
from permissions_table;

Or if you prefer to use a .csv file:

UserName,agent_assigned
"nick","Nick Howe"
"ziva","Ziva Medalle"
"manager","Nick Howe,Ziva Medalle"

Here agent_assigned is a column in the dataset, and UserName is the same as QuickSight login name.

What we are essentially doing is mapping UserName to the agent_assigned column. Suppose ziva has logged in: only the records satisfying agent_assigned = Ziva Medalle are picked up. The same holds for nick.

But in the case of the manager, we want him to be a superuser, so we added all the agent names (agent_assigned column values).

Note: If you are using an Athena, RDS, Redshift, or S3 CSV file-based dataset, just make sure the output format/structure of those sources matches the formats mentioned above.

Create Permissions Data Set:

Create a QuickSight dataset with the above data set rules. Go to Manage data, choose New data set, choose your source and create the dataset accordingly. As mine is a CSV, I have just uploaded it. To make sure that you can easily find it, give it a meaningful name, for example in my case Permissions-prospects-list.

After finishing, refresh the page, as the new dataset might not otherwise appear in the list when applying it as permissions to the main dataset.

Creating Row-Level Security: 

Choose Permissions on the main dataset, and from the list choose the permissions dataset that you created earlier.

Choose Apply data set.

Once you have applied it, you should see a new lock symbol on the dataset indicating that it is restricted.

That’s it. Now the data is filtered/secured based on username.

Manager’s Account:

Ziva’s Account:

Nick’s Account:

You could also add Users to Groups and have permissions set at the group level. More information here.

I hope it was helpful, any queries drop them in the comments section.

Thanks for the read!

This story is authored by Koushik. Koushik is a software engineer and a keen data science and machine learning enthusiast.

Machine Learning based Fuzzy Matching using AWS Glue ML Transforms

Machine Learning Transforms in AWS Glue

AWS Glue provides machine learning capabilities to create custom transforms to do Machine Learning based fuzzy matching to deduplicate and cleanse your data. For this we are going to use a transform named FindMatches. The FindMatches transform enables you to identify duplicate or matching records in your dataset, even when the records do not have a common unique identifier and no fields match exactly. This will not require writing any code or knowing how machine learning works. For more details about ML Transforms, please go through the docs.

Creating a Machine Learning Transform with AWS Glue

This article walks you through the actions to create and manage a machine learning (ML) transform using AWS Glue. I assume that you are familiar with using the AWS Glue console to add crawlers and jobs and edit scripts. You should also be familiar with finding and downloading files on the Amazon Simple Storage Service (Amazon S3) console.

In case you are just starting out on AWS Glue, I have explained how to create an AWS Glue Crawler and Glue Job from scratch in one of my earlier articles.
The source data used in this blog is a hypothetical file named customers_data.csv. A second file, label_file.csv, is an example of a labeling file that contains both matching and nonmatching records used to teach the transform.

Step 1: Crawl the Data using AWS Glue Crawler

At the outset, crawl the source data from the CSV file in S3 to create a metadata table in the AWS Glue Data Catalog. I created a crawler pointing to the source location (s3://bucketname/data/ml-transform/customers/).

In case you are just starting out on the AWS Glue crawler, I have explained how to create one from scratch in one of my earlier articles. If you run this crawler, it creates a customers table in the specified database (ml-transform).

Step 2: Add a Machine Learning Transform

Next, add a machine learning transform that is based on the schema of your data source table created by the above crawler.

  • On the AWS Glue console, in the navigation pane, choose ML Transforms, Add transform.
    1. For Transform name, enter ml-transform. This is the name of the transform that is used to find matches in the source data.
    2. Choose an IAM role that has permission to access Amazon S3 and AWS Glue API operations. Choose Worker type and Maximum capacity as per your requirements.
    3. For Data source, choose the table that was created in the earlier step. In this case, the table named customers in the database ml-transform.
    4. For Primary key, choose the primary key column for the table, email.
  • Choose Finish.
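For reference, the same transform can also be created through the AWS Glue API. The boto3 sketch below approximates what the console does; the role ARN is a placeholder and the database/table names should match what your crawler created:

import boto3

glue = boto3.client('glue')

# Create the FindMatches transform programmatically (an approximation of the console steps)
response = glue.create_ml_transform(
    Name='ml-transform',
    Role='arn:aws:iam::<account-id>:role/<-your-glue-role->',
    InputRecordTables=[
        {'DatabaseName': 'ml-transform', 'TableName': 'customers'}
    ],
    Parameters={
        'TransformType': 'FIND_MATCHES',
        'FindMatchesParameters': {'PrimaryKeyColumnName': 'email'}
    },
    MaxCapacity=10.0
)
print(response['TransformId'])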

Step 3: How to Teach Your Machine Learning Transform

Next, teach the machine learning transform using the sample labeling file.
You can’t use a machine learning transform in an extract, transform, and load (ETL) job until its status is Ready for use. To get your transform ready, you must teach it how to identify matching and non-matching records by providing labeled examples. To teach your transform, you can Generate a label file, add labels, and then Upload label file.

For this article, the label file I have used is label_file.csv

  • On the AWS Glue console, in the navigation pane, choose ML Transforms.
  • Choose the earlier created transform, and then choose Action, Teach.
  • If you don’t have a label file, choose I do not have labels; you can then Generate a label file, add labels, and Upload label file.

If you have the label file, choose I have labels, then choose Upload labeling file from S3.
Choose an Amazon S3 path to the sample labeling file in the current AWS Region (s3://bucketname/data/ml-transform/labels/label_file.csv), with the option to overwrite existing labels. The labeling file must be located in S3 in the same Region as the AWS Glue console.

When you upload a labeling file, a task is started in AWS Glue to add or overwrite the labels used to teach the transform how to process the data source.

  • Choose Finish, and return to the ML transforms list.
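The label upload can also be kicked off with the Glue API; a minimal sketch, with the transform id as a placeholder:

import boto3

glue = boto3.client('glue')

# Start a task run that imports labels from S3 to teach the transform
response = glue.start_import_labels_task_run(
    TransformId='<-your-transform-id->',
    InputS3Path='s3://bucketname/data/ml-transform/labels/label_file.csv',
    ReplaceAllLabels=True
)
print(response['TaskRunId'])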

Step 4: Estimate the Quality of ML Transform

What is Labeling?

The act of labeling is creating a labeling file (such as in a spreadsheet) and adding identifiers, or labels, into the label column that identifies matching and non-matching records. It is important to have a clear and consistent definition of a match in your source data. AWS Glue learns from which records you designate as matches (or not) and uses your decisions to learn how to find duplicate records.

Next, you can estimate the quality of your machine learning transform. The quality depends on how much labeling you have done.

  • On the AWS Glue console, in the navigation pane, choose ML Transforms.
  • Choose the earlier created transform, and choose the Estimate quality tab. This tab displays the current quality estimates, if available, for the transform.
  • Choose Estimate quality to start a task to estimate the quality of the transform. The accuracy of the quality estimate is based on the labeling of the source data.
  • Navigate to the History tab. In this pane, task runs are listed for the transform, including the Estimating quality task. For more details about the run, choose Logs. Check that the run status is Succeeded when it finishes.

Step 5: Create and Run a Job with ML Transform

In this step, we use your machine learning transform to add and run a job in AWS Glue. When the transform is Ready for use, we can use it in an ETL job.

On the AWS Glue console, in the navigation pane, choose Jobs.

Choose Add job.

In case you are just starting out on AWS Glue ETL Job, I have explained how to create one from scratch in one of my earlier articles.

  • For Name, enter the example job name used in this tutorial, ml-transform.
  • Choose an IAM role that has permission to access Amazon S3 and AWS Glue API operations.
  • For ETL language, choose Spark 2.2, Python 2. Machine learning transforms are currently not supported for Spark 2.4.
  • For Data source, choose the table created in Step 1. The data source you choose must match the machine learning transform data source schema.
  • For Transform type, choose Find matching records to create a job that uses a machine learning transform.
  • For Transform, choose the transform created in Step 2; this is the machine learning transform used by the job.
  • For Create tables in your data target, choose to create tables with the following properties.
    • Data store type — Amazon S3
    • Format — CSV
    • Compression type — None
    • Target path — The Amazon S3 path where the output of the job is written (in the current console AWS Region)

Choose Save job and edit script to display the script editor page. The script looks like the following. After you edit the script, choose Save.

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglueml.transforms import FindMatches

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
## @type: DataSource
## @args: [database = "ml_transforms", table_name = "customers", transformation_ctx = "datasource0"]
## @return: datasource0
## @inputs: []
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "ml_transforms", table_name = "customers", transformation_ctx = "datasource0")
## @type: ResolveChoice
## @args: [choice = "MATCH_CATALOG", database = "ml_transforms", table_name = "customers", transformation_ctx = "resolvechoice1"]
## @return: resolvechoice1
## @inputs: [frame = datasource0]
resolvechoice1 = ResolveChoice.apply(frame = datasource0, choice = "MATCH_CATALOG", database = "ml_transforms", table_name = "customers", transformation_ctx = "resolvechoice1")
## @type: FindMatches
## @args: [transformId = "eacb9a1ffbc686f61387f63", emitFusion = false, survivorComparisonField = "<primary_id>", transformation_ctx = "findmatches2"]
## @return: findmatches2
## @inputs: [frame = resolvechoice1]
findmatches2 = FindMatches.apply(frame = resolvechoice1, transformId = "eacb9a1ffbc686f61387f63", transformation_ctx = "findmatches2")
## @type: DataSink
## @args: [connection_type = "s3", connection_options = {"path": "s3://bucket-name/data/ml-transforms/output/"}, format = "csv", transformation_ctx = "datasink3"]
## @return: datasink3
## @inputs: [frame = findmatches2]
datasink3 = glueContext.write_dynamic_frame.from_options(frame = findmatches2, connection_type = "s3", connection_options = {"path": "s3://<bucket-name>/data/ml-transforms/output/"}, format = "csv", transformation_ctx = "datasink3")
job.commit()

Choose Run job to start the job run. Check the status of the job in the jobs list. When the job finishes, a new Run ID row of type ETL job is added in the ML transform’s History tab.

Navigate to the Jobs, History tab. In this pane, job runs are listed. For more details about the run, choose Logs. Check that the run status is Succeeded when it finishes.

Step 6: Verify Output Data from Amazon S3 in Amazon Athena

In this step, check the output of the job run in the Amazon S3 bucket that you chose when you added the job. You can create a table in the Glue Data catalog pointing to the output location, just like the way we crawled the source data in Step 1. You can then query the data in Athena.

However, the Find matches transform adds another column named match_id to identify matching records in the output. Rows with the same match_id are considered matching records.
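Once the output is crawled into a table, one quick way to inspect the matches is to group on match_id in Athena. A sketch using boto3; the database, table and results-bucket names are placeholders:

import boto3

athena = boto3.client('athena')

# Rows sharing a match_id are duplicates of one another
query = """
SELECT match_id, COUNT(*) AS records_in_group
FROM "database_name"."ml_transform_output"
GROUP BY match_id
HAVING COUNT(*) > 1
ORDER BY records_in_group DESC
"""

response = athena.start_query_execution(
    QueryString=query,
    ResultConfiguration={'OutputLocation': 's3://<-your-query-results-bucket->/results/'}
)
print(response['QueryExecutionId'])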

If you don’t find any matches, you can continue to teach the transform by adding more labels.

Thanks for the read, and I look forward to your comments.

This story is authored by PV Subbareddy. Subbareddy is a Big Data Engineer specializing on AWS Big Data Services and Apache Spark Ecosystem.

Processing High Volume Big Data Concurrently with No Duplicates using AWS SQS

In this blog post, we’ll look at how one could leverage AWS Simple Queue Service (standard queue) to achieve high concurrency in processing with no duplicates. We also compare it with other AWS services like DynamoDB, SQS FIFO queues and Kinesis in terms of cost and performance.

A simple use case for the architecture below could be building an end-to-end messaging service, or sending out transactional emails. Both use cases need highly concurrent processing with no duplicates.

Using AWS SQS with Lambda to process Big data concurrently with no duplicates

We have a Lambda function that writes messages to the standard queue in batches. This writer function is invoked when a file is posted to S3. While there are messages in the queue, Lambda polls the queue, reads messages in batches and invokes the processor function synchronously with an event that contains the queue messages. The processor function is invoked once for each batch. When the function successfully processes a batch, Lambda deletes its messages from the queue. If the function fails to process a batch (raises an error), the batch is returned to the queue. The standard queue is configured with a redrive policy that moves messages to a Dead Letter Queue (DLQ) when the receive count reaches the Maximum receive count (MRC). We set the MRC to 1 to ensure deduplication.

Setting up Standard Queue with Dead Letter Queue

We need two queues: one for processing and a second one to move failed messages into. First create the failed_messages queue, as it is needed while creating the message processing queue. Create a new queue, give it a name (failed_messages), select the type as Standard and choose Configure Queue.

Set the queue attributes like visibility timeout, message retention period, etc. according to your needs.

For processing messages, create a new queue, give it a name, select the type as Standard and choose Configure Queue.

Set the Default Visibility Timeout to 5 min and use the Dead Letter Queue Settings to set up the redrive policy that moves failed messages into the failed_messages queue created earlier.

From the SQS homepage, select the processing queue and select Redrive Policy. If set up correctly, you should see the ARN of the failed_messages queue there.
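The same two-queue setup can also be scripted. Below is a rough boto3 sketch of the console steps; the queue names match the ones used in this post, and the attribute values are just examples:

import json
import boto3

sqs = boto3.client('sqs')

# Dead letter queue for failed messages
dlq = sqs.create_queue(QueueName='failed_messages')
dlq_arn = sqs.get_queue_attributes(
    QueueUrl=dlq['QueueUrl'], AttributeNames=['QueueArn']
)['Attributes']['QueueArn']

# Processing queue with a 5 minute visibility timeout and a redrive policy
# that moves a message to the DLQ after a single failed receive (MRC = 1)
processing = sqs.create_queue(
    QueueName='ToBeProcessed',
    Attributes={
        'VisibilityTimeout': '300',
        'RedrivePolicy': json.dumps({
            'deadLetterTargetArn': dlq_arn,
            'maxReceiveCount': '1'
        })
    }
)
print(processing['QueueUrl'])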

Creating the Writer and Processor lambda functions:

Writer.py

# Write batch messages to queue
import csv
import json
import boto3

s3 = boto3.resource('s3')
sqs = boto3.client('sqs')

# Update this dummy URL
processing_queue_url = "https://sqs.us-west-2.amazonaws.com/85XXXXXXX205/ToBeProcessed"

def lambda_handler(event, context):
    try:
        if 'Records' in event:
            bucket_name = event['Records'][0]['s3']['bucket']['name']        
            key = event['Records'][0]['s3']['object']['key']
            bucket = s3.Bucket(bucket_name)
            obj = bucket.Object(key=key)

            # get the object
            response = obj.get()['Body'].read().decode('utf-8').split('\n')
            resp = list(response)
            if resp[-1] == '':
                #removing header metadata and extra newline
                total_records = len(resp) - 2 
            else:
                #removing header metadata
                total_records = len(resp) - 1 
            print("total record count is :", total_records)

            batch_size = 0
            record_count = 0
            messages = []

            # Write to SQS
            for row in csv.DictReader(response):
                record_count += 1
                record = {}
                for k,v in row.items():
                    record[k] = v

                # Replace below with appropriate column with all values as unique
                unique_id = record['ANY_COLUMN_WITH_ALL_VALUES_UNIQUE']
                
                batch_size += 1
                messages.append(
                {
                    'Id': unique_id,
                    'MessageBody': json.dumps(record)
                })
                   
                if (batch_size == 10):
                    batch_size = 0
                    try:
                        response = sqs.send_message_batch(
                            QueueUrl = processing_queue_url,
                            Entries = messages
                        )
                        print("response:", response)
                        if 'Failed' in response:
                            print('failed_count:', len(response['Failed']))
                    except Exception as e:
                        print("error:",e)
                    messages = []
                
                # Handling last batch
                if(record_count == total_records):
                    print("batch size is :", batch_size)
                    batch_size = 0
                    try:
                        response = sqs.send_message_batch(
                            QueueUrl = processing_queue_url,
                            Entries = messages
                        )
                        print("response:", response)
                        if 'Failed' in response:
                            print('failed count is :', len(response['Failed']))
                    except Exception as e:
                        print("error:",e)
                    messages = []    
        
        print('record count is :', record_count)

    except Exception as e:
        return e

Processor.py

# Process queue messages
import json

def handler(event, context):
    if 'Records' in event:
        try:
            messages = event['Records']
            for message in messages:
                print("message to be processed :", message)
                
                result = message['body']
                result = json.loads(result)

                print("result:",result)
            return {
                'statusCode': 200,
                'body': 'All messages processed successfully'
            }

        except Exception as e:
            print(e)
            return str(e)

Setting up S3 as trigger to Writer lambda

Setting up SQS trigger to processor Lambda

If set up properly, you should be able to view it in the Lambda Triggers section on the SQS homepage, like this.

The setup is done. To test it, upload a .csv file to the S3 location.
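If you would rather test from code than from the console, a tiny boto3 sketch for uploading a sample file; the bucket, key and file names are placeholders:

import boto3

s3 = boto3.client('s3')

# Uploading the file fires the S3 event notification that triggers the Writer lambda
s3.upload_file('sample_messages.csv', '<-your-trigger-bucket->', 'uploads/sample_messages.csv')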

SQS Standard Queue in comparison with FIFO queue

FIFO queue in SQS supports deduplication in two ways:

  1. Content based deduplication while writing to SQS.
  2. Processing one record/batch at a time. 

Unlike the standard queue, a FIFO queue doesn’t support the same level of concurrency and Lambda invocation. On top of that, there is a limit to how many messages you can write to a FIFO queue per second. FIFO queues are better suited when the order of processing is important.

Cost analysis:
The first 1 million Amazon SQS requests are free each month.

Type | Cost per 1 million requests
Standard Queue | $0.40
FIFO Queue | $0.50

More on SQS pricing here.

SQS Standard Queue in comparison with DynamoDB

DynamoDB streams are slow compared to SQS, and DynamoDB costs accrue across various aspects like:

  1. Data Storage
  2. Writes
  3. Reads
  4. Provisioned throughput
  5. Reserved capacity
  6. Indexed data storage
  7. Streams and many more.

In a nutshell, DynamoDB’s monthly cost is dictated by data storage, writes and reads. The best use cases for DynamoDB are those that require a flexible data model, reliable performance, and the automatic scaling of throughput capacity.

SQS Standard Queue in comparison with Kinesis

Kinesis’ primary use case is collecting, storing and processing real-time continuous data streams. Kinesis is designed for large scale data ingestion and processing, with the ability to maximise write throughput for large volumes of data.

A message queue, on the other hand, makes it easy to decouple and scale microservices, distributed systems, and serverless applications. Using a queue, you can send, store, and receive messages between software components at any volume, without losing messages or requiring other services to be always available. In a nutshell, serverless applications are built using microservices, and a message queue serves as reliable plumbing between them.

Drawbacks of Kinesis:

  1. Shard management
  2. Limited Read Throughput

For a much detailed comparison of SQS and Kinesis visit here.

Thanks for the read, I hope it was helpful.

This story is authored by Koushik. Koushik is a software engineer and a keen data science and machine learning enthusiast.

Text Detection in React Native App using AWS Rekognition

In this story, we are going to build an app for detecting text in an image using Amazon Rekognition in React Native.

You shall learn how to build a mobile application in React Native, which talks to AWS API Gateway. This API endpoint is configured with a lambda that stores the sent image in S3 and detects the text using AWS Rekognition and sends back the response.
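The lambda behind that endpoint is written in Node.js later in this series, but the core of it is a single Rekognition call. A minimal Python sketch of the equivalent, with the bucket and key as placeholders:

import boto3

rekognition = boto3.client('rekognition')

# Detect text in an image that is already stored in S3
response = rekognition.detect_text(
    Image={'S3Object': {'Bucket': '<-your-bucket->', 'Name': 'sample.jpeg'}}
)
for detection in response['TextDetections']:
    if detection['Type'] == 'LINE':
        print(detection['DetectedText'])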

Installing dependencies:

Let’s go to the React Native docs, select React Native CLI Quickstart and select our Development OS and Target OS -> Android, as we are going to build an Android application.

Follow the docs for installing dependencies; after installing, create a new React Native application. Use the command line interface to generate a new React Native project called text-detection.

react-native init text-detection

Preparing the Android device:

We shall need an Android device to run our React Native Android app. This can be either a physical Android device, or more commonly, we can use an Android Virtual Device (AVD) which allows us to emulate an Android device on our computer (using Android Studio).

Either way, we shall need to prepare the device to run Android apps for development. If you have a physical Android device, you can use it for development in place of an AVD by connecting it to your computer using a USB cable and following the instructions here.

If you are using a virtual device, follow this link. I shall be using a physical Android device.
Now go to the command line and run react-native run-android inside your React Native app directory:

cd text-detection
react-native run-android

If everything is set up correctly, you should see your new app running on your physical device or Android emulator.

API Creation in AWS Console:

Before going further, create an API in your AWS console following this link. Once you are done with creating the API, come back to the React Native application. Now, go to your project directory and replace your App.js file with the following code.

import React, {Component} from 'react';
import { StyleSheet, View, Text, TextInput, Image, ScrollView, TouchableHighlight } from 'react-native';
import ImagePicker from "react-native-image-picker";
import Amplify, {API} from "aws-amplify";
Amplify.configure({
    API: {
        endpoints: [
            {
                name: "<Your API name>",
                endpoint: "<Your end-point url>"
            }
        ]
    }
});

class Registration extends Component {
  
    constructor(props){
        super(props);
        this.state =  {
            imageName : '',
            capturedImage : '',
            detectedText: []
        };
    }

    captureImageButtonHandler = () => {
        ImagePicker.showImagePicker({title: "Pick an Image", maxWidth: 800, maxHeight: 600}, (response) => {
            console.log('Response - ', response);
            alert(response)
            if (response.didCancel) {
                console.log('User cancelled image picker');
            } else if (response.error) {
                console.log('ImagePicker Error: ', response.error);
            } else if (response.customButton) {
                console.log('User tapped custom button: ', response.customButton);
            } else {
                // You can also display the image using data:
                const source = { uri: 'data:image/jpeg;base64,' + response.data };
            
                this.setState({capturedImage: response.uri, base64String: source.uri });
            }
        });
    }

    submitButtonHandler = () => {
        if (this.state.imageName == '' || this.state.imageName == undefined || this.state.imageName == null) {
            alert("Please Enter the image name");
        } else if (this.state.capturedImage == '' || this.state.capturedImage == undefined || this.state.capturedImage == null) {
            alert("Please Capture the Image");
        } else {
            console.log("submiting")
            const apiName = "faceRekognition";
            const path = "/detecttext";
            const init = {
                headers: {
                    'Accept': 'application/json',
                    "Content-Type": "application/x-amz-json-1.1"
                },
                body: JSON.stringify({
                    Image: this.state.base64String,
                    name: this.state.imageName
                })
            }

            API.post(apiName, path, init).then(response => {
                console.log("Response Data is : " + JSON.stringify(response));

                if (response.TextDetections.length > 0) {

                    this.setState({
                        detectedText: response.TextDetections
                    })
                    
                } else {
                    alert("Please Try Again.")
                }
            });
        }
    }
    
  
    render() {
        console.log(this.state.detectedText)
        var texts = this.state.detectedText.map(text => {
            return <Text style={{textAlign: 'center'}}>{text.DetectedText}</Text>
        })
        
        return (
            <View>
                <ScrollView>
                    <Text style= {{ fontSize: 20, color: "#000", textAlign: 'center', marginBottom: 15, marginTop: 10 }}>Text Image</Text>
                
                    <TextInput
                        placeholder="file name"
                        onChangeText={imageName => this.setState({imageName: imageName})}
                        underlineColorAndroid='transparent'
                        style={styles.TextInputStyleClass}
                    />

                    {this.state.capturedImage !== "" && <View style={styles.imageholder} >
                        <Image source={{uri : this.state.capturedImage}} style={styles.previewImage} />
                    </View>}
                    <View>
                        {texts}
                    </View>
                    <TouchableHighlight style={[styles.buttonContainer, styles.captureButton]} onPress={this.captureImageButtonHandler}>
                        <Text style={styles.buttonText}>Capture Image</Text>
                    </TouchableHighlight>

                    <TouchableHighlight style={[styles.buttonContainer, styles.submitButton]} onPress={this.submitButtonHandler}>
                        <Text style={styles.buttonText}>Submit</Text>
                    </TouchableHighlight>
                    
                </ScrollView>
            </View>
        );
    }
}

const styles = StyleSheet.create({
    TextInputStyleClass: {
      textAlign: 'center',
      marginBottom: 7,
      height: 40,
      borderWidth: 1,
      margin: 10,
      borderColor: '#D0D0D0',
      borderRadius: 5 ,
    },
    inputContainer: {
      borderBottomColor: '#F5FCFF',
      backgroundColor: '#FFFFFF',
      borderRadius:30,
      borderBottomWidth: 1,
      width:300,
      height:45,
      marginBottom:20,
      flexDirection: 'row',
      alignItems:'center'
    },
    buttonContainer: {
      height:45,
      flexDirection: 'row',
      alignItems: 'center',
      justifyContent: 'center',
    //   marginBottom:20,
      width:"80%",
      borderRadius:30,
    //   marginTop: 20,
      margin: 20,
    },
    captureButton: {
      backgroundColor: "#337ab7",
      width: 350,
    },
    buttonText: {
      color: 'white',
      fontWeight: 'bold',
    },
    horizontal: {
      flexDirection: 'row',
      justifyContent: 'space-around',
      padding: 10
    },
    submitButton: {
      backgroundColor: "#C0C0C0",
      width: 350,
      marginTop: 5,
    },
    imageholder: {
      borderWidth: 1,
      borderColor: "grey",
      backgroundColor: "#eee",
      width: "50%",
      height: 150,
      marginTop: 10,
      marginLeft: 90,
      flexDirection: 'row',
      alignItems:'center'
    },
    previewImage: {
      width: "100%",
      height: "100%",
    }
});

export default Registration;

In the above code, we configure Amplify with the API name and endpoint URL that you created, as shown below.

Amplify.configure({
 API: {
   endpoints: [
     {
       name: '<Your-API-Name>',
       endpoint:'<Endpoint-URL>',
     },
   ],
 },
});

Clicking the Capture Image button triggers the captureImageButtonHandler function, which asks the user to take a picture or select one from the file system. When the user captures or selects an image, we store it in the component state as shown below.

captureImageButtonHandler = () => {
   this.setState({
     objectName: '',
   });
 
   ImagePicker.showImagePicker(
     {title: 'Pick an Image', maxWidth: 800, maxHeight: 600},
     response => {
       console.log('Response = ', response);
       if (response.didCancel) {
         console.log('User cancelled image picker');
       } else if (response.error) {
         console.log('ImagePicker Error: ', response.error);
       } else if (response.customButton) {
         console.log('User tapped custom button: ', response.customButton);
       } else {
         // You can also display the image using data:
         const source = {uri: 'data:image/jpeg;base64,' + response.data};
         this.setState({
           capturedImage: response.uri,
           base64String: source.uri,
         });
       }
     },
   );
 };

After capturing the image we preview it. Clicking the Submit button triggers the submitButtonHandler function, which sends the image to the endpoint as shown below.

submitButtonHandler = () => {
        if (this.state.imageName == '' || this.state.imageName == undefined || this.state.imageName == null) {
            alert("Please Enter the image name");
        } else if (this.state.capturedImage == '' || this.state.capturedImage == undefined || this.state.capturedImage == null) {
            alert("Please Capture the Image");
        } else {
            console.log("submiting")
            const apiName = "faceRekognition";
            const path = "/detecttext";
            const init = {
                headers: {
                    'Accept': 'application/json',
                    "Content-Type": "application/x-amz-json-1.1"
                },
                body: JSON.stringify({
                    Image: this.state.base64String,
                    name: this.state.imageName
                })
            }

            API.post(apiName, path, init).then(response => {
                console.log("Response Data is : " + JSON.stringify(response));
                if (response.TextDetections && response.TextDetections.length > 0) {
                    this.setState({
                        detectedText: response.TextDetections
                    })
                    
                } else {
                    alert("Please Try Again.")
                }
            });
        }
    }

Lambda Function:

Add the following code into your lambda function that you created in your AWS Console.

const AWS = require('aws-sdk');
var rekognition = new AWS.Rekognition();
var s3Bucket = new AWS.S3( { params: {Bucket: "detect-text-in-image"} } );
var fs = require('fs');

exports.handler = (event, context, callback) => {
    let parsedData = JSON.parse(event)
    let encodedImage = parsedData.Image;
    var filePath = parsedData.name;
    let buf = Buffer.from(encodedImage.replace(/^data:image\/\w+;base64,/, ""), 'base64')
    var data = {
        Key: filePath, 
        Body: buf,
        ContentEncoding: 'base64',
        ContentType: 'image/jpeg'
    };
    s3Bucket.putObject(data, function(err, data){
        if (err) { 
            console.log('Error uploading data: ', data);
            callback(err, null);
        } else {
            var params = {
                Image: {
                    S3Object: {
                        Bucket: "detect-text-in-image",
                        Name: filePath
                    }
                }
            };
            rekognition.detectText(params, function(err, data) {
                if (err){
                    console.log(err, err.stack);
                    callback(err)
                }
                else{
                    console.log(data);
                    callback(null, data);
                }
            });
        }
    });
};

In the above code, we receive the image from React Native and store it in the S3 bucket. The stored image is then sent to Amazon Rekognition, whose detectText method detects the text in the image and returns the detected text in JSON format.
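For reference, a trimmed detectText response has roughly the following shape (the values below are purely illustrative):

// Illustrative only: the rough shape of a detectText response.
// The BoundingBox values are ratios of the overall image width and
// height, not pixel coordinates.
const sampleDetectTextResponse = {
    TextDetections: [
        {
            DetectedText: 'HELLO',
            Type: 'LINE',
            Id: 0,
            Confidence: 99.2,
            Geometry: {
                BoundingBox: { Width: 0.28, Height: 0.06, Left: 0.35, Top: 0.12 }
            }
        }
    ]
};

The app only checks that TextDetections is non-empty and renders each DetectedText value.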

Note: Make sure the Lambda function's IAM role has permission to call Amazon Rekognition's detectText API and to access the S3 bucket.
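As a rough guide, a minimal policy document for that role could look like the following (the bucket name matches the one used in the Lambda code above; scope the statements down further as needed):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["rekognition:DetectText"],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject"],
            "Resource": "arn:aws:s3:::detect-text-in-image/*"
        }
    ]
}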

Here is how your home screen looks:

Once you capture an image you can see a preview of that image as shown below.

On submitting the captured image with a file name, you can see the text detected in that image, as shown below:

That’s all folks! I hope it was helpful.

This story is authored by Venu Vaka. He is a software engineer specializing in ReactJS and AWS Cloud.

Real Time Streaming Data Analytics using Amazon Kinesis Family

Amazon Kinesis Data Analytics

Amazon Kinesis Data Analytics (KDA) is the easiest way to analyze streaming data, gain actionable insights, and respond to your business and customer needs in real time. KDA reduces the complexity of building, managing, and integrating streaming applications with other AWS services. SQL users can easily query streaming data or build entire streaming applications using templates and an interactive SQL editor. Java developers can quickly build sophisticated streaming applications using open-source Java libraries and AWS integrations to transform and analyze data in real time.

For deep dive into Amazon Kinesis Data Analytics, please go through the official docs.

Amazon Kinesis Data Streams

Amazon Kinesis Data Streams (KDS) is a massively scalable and durable real-time data streaming service. KDS can continuously capture gigabytes of data per second from hundreds of thousands of sources such as website clickstreams, database event streams, financial transactions, social media feeds, IT logs, and location-tracking events. The data collected is available in milliseconds to enable real-time analytics use cases such as real-time dashboards, real-time anomaly detection, dynamic pricing, and more.

For more details about Amazon Kinesis Data Streams, please go through the official docs.

Amazon Kinesis Data Firehose

Amazon Kinesis Data Firehose is the easiest way to reliably load streaming data into data lakes, data stores and analytics tools. It can capture, transform, and load streaming data into Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk, enabling near real-time analytics with existing business intelligence tools and dashboards you’re already using today. It is a fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration. It can also batch, compress, transform, and encrypt the data before loading it, minimizing the amount of storage used at the destination and increasing security.

For more details about Amazon Kinesis Data Firehose, please go through the official docs.

To Create an Amazon Kinesis Data Stream using Console

  • Open the Kinesis console at https://console.aws.amazon.com/kinesis.
  • In the navigation bar, expand the Region selector and choose a Region.
  • Choose Create data stream.
  • On the Create Kinesis stream page, enter a name for your stream and the number of shards you need, and then click Create Kinesis stream.
    On the Kinesis streams page, your stream’s Status is shown as Creating while the stream is being created. When the stream is ready to use, the Status changes to Active.
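If you prefer to script this step instead of using the console, a minimal sketch with the AWS SDK for JavaScript would be the following (stream name, shard count, and region are placeholders):

// Creates a Kinesis data stream; equivalent to the console steps above.
const AWS = require('aws-sdk');
const kinesis = new AWS.Kinesis({ region: '<your-region>' });

kinesis.createStream({ StreamName: '<your-stream-name>', ShardCount: 1 }, (err, data) => {
    if (err) {
        console.log('Error creating stream: ', err);
    } else {
        // Creation is asynchronous; poll describeStream until the
        // StreamStatus becomes ACTIVE before sending records to it.
        console.log('Create stream request submitted');
    }
});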

Amazon Kinesis Data Generator

The Amazon Kinesis Data Generator (KDG) makes it easy to send data to Kinesis Streams or Kinesis Firehose.

To set up the KDG, follow this link and proceed as below.

  • Choose Create a Cognito User with Cloud Formation.
  • After choosing this option, the console redirects to the CloudFormation stack creation page, which looks like the following.
  • Click Next, provide a CloudFormation stack name, and enter the username and password for the Cognito user that the Kinesis Data Generator will use.
  • Click Next and choose Create Stack.
  • After the stack's status changes to CREATE_COMPLETE, click the Outputs tab and open the link in the outputs section.
  • After opening the above link, sign in with the username and password created in the earlier steps.
  • Select the Region and the stream/delivery stream name created earlier.
    The record template is:
{
    "sensor_id": {{random.number(50)}},
    "current_temperature": {{random.number(
        {
            "min":0,
            "max":150
        }
    )}},
    "location": "{{random.arrayElement(
        ["AUS","USA","UK"]
    )}}"
}
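The KDG generates and sends such records for you. Purely for reference, sending an equivalent record from code with the AWS SDK might look like this sketch (stream name and region are placeholders):

// Sends one sensor reading, shaped like the KDG template above,
// to the Kinesis data stream.
const AWS = require('aws-sdk');
const kinesis = new AWS.Kinesis({ region: '<your-region>' });

const record = {
    sensor_id: Math.floor(Math.random() * 50),
    current_temperature: Math.floor(Math.random() * 151),   // 0 to 150
    location: ['AUS', 'USA', 'UK'][Math.floor(Math.random() * 3)]
};

kinesis.putRecord(
    {
        StreamName: '<your-stream-name>',
        Data: JSON.stringify(record),
        PartitionKey: String(record.sensor_id)
    },
    (err, data) => {
        if (err) console.log('Error putting record: ', err);
        else console.log('Record sent, sequence number: ', data.SequenceNumber);
    }
);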

To Create the Kinesis Data Analytics Application

  • Open the Kinesis Data Analytics console at https://console.aws.amazon.com/kinesisanalytics.
  • Choose Create application.
  • On the Create application page, type an application name, type a description, choose SQL for the application’s Runtime setting, and then choose Create application.

Doing this creates a Kinesis data analytics application with a status of READY. The console shows the application hub where you can configure input and output.

In the next step, you configure input for the application. In the input configuration, you add a streaming data source to the application and discover a schema for an in-application input stream by sampling data on the streaming source.

Configure Streaming Source as Input to Kinesis Data Analytics Application

  • On the Kinesis Analytics applications page in the console, choose Connect streaming data.
  • In the Source section, specify a streaming source for your application. You can select an existing stream or create one. By default, the console names the in-application input stream it creates SOURCE_SQL_STREAM_001. For this exercise, keep this name as it appears.
    Stream reference name – This option shows the name of the in-application input stream that is created, SOURCE_SQL_STREAM_001. You can change the name if you wish.
  • Choose Discover Schema, which automatically discovers the schema of input stream.
  • Choose Save and continue.
    Now, we have an application with input configuration added to it. In the next step, we will add SQL code to perform some analytics on the data in-application input stream.

 Real-Time Analytics on Input Stream Data

  • On the Kinesis Analytics applications page in the console, choose Go to SQL editor.
  • In the Would you like to start running “ApplicationName”? dialog box, choose Yes, start application.
  • The console opens the SQL editor page. Review the page, including the buttons (Add SQL from templates, Save and run SQL) and various tabs.
  • Run analytics on the input stream data using the following sample query. The query treats readings with a temperature above 100 as anomalies, sending them to anomaly_data_stream and the remaining data to output_data_stream. Load the following query in the SQL editor and choose Save and run SQL.
CREATE OR REPLACE STREAM "anomaly_data_stream" (
	"sensor_id" INTEGER,
	"current_temperature" INTEGER, 
	"location" VARCHAR(16));

CREATE OR REPLACE  PUMP "STREAM_PUMP_ANOMALY" AS INSERT INTO "anomaly_data_stream"
SELECT STREAM "sensor_id",
				"current_temperature",
				"location"
FROM "SOURCE_SQL_STREAM_001" WHERE "current_temperature" > 100;

CREATE OR REPLACE STREAM "output_data_stream" (
	"sensor_id" INTEGER,
	"current_temperature" INTEGER, 
	"location" VARCHAR(16));

CREATE OR REPLACE  PUMP "STREAM_PUMP_OUTPUT" AS INSERT INTO "output_data_stream"
SELECT STREAM "sensor_id",
				"current_temperature",
				"location"
FROM "SOURCE_SQL_STREAM_001" WHERE "current_temperature" < 100;

It creates the in-application streams output_data_stream and anomaly_data_stream, along with the pumps STREAM_PUMP_OUTPUT and STREAM_PUMP_ANOMALY, which continuously select rows from SOURCE_SQL_STREAM_001 and insert them into output_data_stream and anomaly_data_stream respectively. You can see the results in the Real-time analytics tab.

  • The SQL editor has the following tabs:

    The Source data tab shows the in-application input stream that is mapped to the streaming source. Choose the in-application stream and you can see data coming in. Each row in an in-application stream has a special column called ROWTIME, the timestamp at which Amazon Kinesis Data Analytics inserted the row into the first in-application stream (the in-application input stream that is mapped to the streaming source).

    The Real-time analytics tab shows all the other in-application streams created by your application code, including the error stream. Choose one of them (for example, anomaly_data_stream) to view the rows your application code inserted.

    The Destination tab shows the external destination where Kinesis Data Analytics writes the query results. We haven’t configured any external destination for our application output yet. 

To create a delivery stream from Kinesis Data Firehose to Amazon S3

  • Open the Kinesis Data Firehose console at https://console.aws.amazon.com/firehose/.
  • Choose Create Delivery Stream. In this case, the name of the stream is anomaly-delivery-stream.
  • On the Destination page, choose the following options.
    • Destination – Choose Amazon S3.
    • Delivery stream name – Type a name for the delivery stream
    • S3 bucket – Choose an existing bucket, or choose New S3 Bucket. If you create a new bucket, type a name for the bucket and choose the region your console is currently using.
    • S3 prefix – The delivery stream stores data under the provided prefix. For the anomaly data, the prefix is
      data/anomaly/year=!{timestamp:YYYY}/month=!{timestamp:MM}/day=!{timestamp:dd}/hour=!{timestamp:HH}/
    • S3 error prefix – Records that fail to deliver to S3 are stored under this error prefix.
  • Choose Next.
  • On the Configuration page, leave the fields at the default settings. The only required step is to select an IAM role that enables Kinesis Data Firehose to access your resources, as follows:
    1. For IAM Role, choose Select an IAM role.
    2. In the drop-down menu, under Create/Update existing IAM role, choose Firehose delivery IAM role, leave the fields at their default settings, and then choose Allow.
  • Choose Next.
  • Review your settings, and then choose Create Delivery Stream.

The anomaly-delivery-stream is now created. In the same way, create another delivery stream named output-delivery-stream.
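For reference, the same delivery stream could also be created with the AWS SDK. This is only a sketch; the role ARN, bucket ARN, and error prefix below are placeholders for the resources chosen in the console steps above:

// Creates the anomaly delivery stream with an S3 destination.
const AWS = require('aws-sdk');
const firehose = new AWS.Firehose({ region: '<your-region>' });

const params = {
    DeliveryStreamName: 'anomaly-delivery-stream',
    DeliveryStreamType: 'DirectPut',
    ExtendedS3DestinationConfiguration: {
        RoleARN: 'arn:aws:iam::<account-id>:role/<firehose-delivery-role>',
        BucketARN: 'arn:aws:s3:::<your-bucket>',
        Prefix: 'data/anomaly/year=!{timestamp:YYYY}/month=!{timestamp:MM}/day=!{timestamp:dd}/hour=!{timestamp:HH}/',
        // A custom prefix requires an error prefix with the error-output-type expression.
        ErrorOutputPrefix: '<your-error-prefix>/!{firehose:error-output-type}/',
        BufferingHints: { SizeInMBs: 5, IntervalInSeconds: 300 }
    }
};

firehose.createDeliveryStream(params, (err, data) => {
    if (err) console.log('Error creating delivery stream: ', err);
    else console.log('Delivery stream ARN: ', data.DeliveryStreamARN);
});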

Configuring Application Output to Amazon Kinesis Data Firehose

We can optionally add an output configuration to the application, to persist everything written from an in-application stream to an external destination such as an Amazon Kinesis data stream, a Kinesis Data Firehose delivery stream, or an AWS Lambda function.

In this application, we are connecting the in-application stream to a Kinesis Data Firehose delivery stream.

In the Destination tab, choose anomaly_data_stream as the in-application stream, anomaly-delivery-stream as the Firehose delivery stream, and JSON as the output format. Configure output_data_stream with output-delivery-stream in the same way.
You can see the following after configuring:

Data is written to S3 through the Kinesis Data Firehose delivery streams. We can then query it in Athena after running a Glue crawler once on that path.
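As a sketch of what querying that data might look like from code (the database, table, partition column names, and output location below are assumptions that depend on what the crawler creates):

// Runs a sample Athena query over the crawled anomaly data.
const AWS = require('aws-sdk');
const athena = new AWS.Athena({ region: '<your-region>' });

const params = {
    QueryString:
        'SELECT sensor_id, current_temperature, location ' +
        'FROM "<your-database>"."<anomaly-table>" ' +
        "WHERE year = '<yyyy>' AND month = '<mm>' " +
        'ORDER BY current_temperature DESC LIMIT 10',
    QueryExecutionContext: { Database: '<your-database>' },
    ResultConfiguration: { OutputLocation: 's3://<your-bucket>/athena-results/' }
};

athena.startQueryExecution(params, (err, data) => {
    if (err) console.log('Error starting query: ', err);
    // Poll getQueryExecution with this id, then fetch rows with getQueryResults.
    else console.log('Query execution id: ', data.QueryExecutionId);
});

Filtering on the year and month partition columns also limits how much S3 data Athena scans.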

Thanks for the read. Hope it was helpful.

This story is authored by PV Subbareddy. Subbareddy is a Big Data Engineer specializing on Cloud Big Data Services and Apache Spark Ecosystem.

Object Detection in React Native App using AWS Rekognition

In this post, we are going to build a React Native app for detecting objects from an image using Amazon Rekognition.

Here we will capture an image or select one from the file system and send it to API Gateway, which triggers the Lambda function that stores it in an S3 bucket. The stored image is then sent to Amazon Rekognition, which detects the objects in the image.

Installing dependencies:

Let's go to the React Native docs, select React Native CLI Quickstart, and choose the appropriate Development OS and Android as the Target OS, as we are going to build an Android application.

Follow the docs for installing dependencies, then create a new React Native Application. Use the command line interface to generate a new React Native project called ObjectDetection.

react-native init ObjectDetection

Preparing the Android device:

We shall need an Android device to run our React Native Android app. This can be either a physical Android device, or more commonly, we can use an Android Virtual Device (AVD) which allows us to emulate an Android device on our computer (using Android Studio).

Either way, we shall need to prepare the device to run Android apps for development. If you have a physical Android device, you can use it for development in place of an AVD by connecting it to your computer using a USB cable and following the instructions here.

If you are using a virtual device follow this link. I shall be using a physical Android device.

Now go to the command line and run react-native run-android inside your React Native app directory

cd ObjectDetection && react-native run-android

If everything is set up correctly, you should see your new app running on your physical device or Android emulator.

API Creation in AWS Console: 

Before going further, create an API in your AWS console following this link.
Once you're done creating the API, come back to the React Native application.
Now, go to your project directory and replace your App.js file with the following code.

import React, {Component} from 'react';
import {
 StyleSheet,
 View,
 Text,
 TextInput,
 Image,
 ScrollView,
 TouchableHighlight,
} from 'react-native';
import ImagePicker from 'react-native-image-picker';
import Amplify, {API} from 'aws-amplify';
import Video from 'react-native-video';
 
// Amplify configuration for API-Gateway
Amplify.configure({
 API: {
   endpoints: [
     {
       name: 'LabellingAPI',   //your api name
       endpoint: '<Endpoint-URL>', // Your Endpoint URL
     },
   ],
 },
});
 
class Registration extends Component {
 constructor(props) {
   super(props);
   this.state = {
     username: 'storeImage.png',
     userId: '',
     image: '',
     capturedImage: '',
     objectName: '',
   };
 }
 
// It selects image from filesystem or capture from camera
 captureImageButtonHandler = () => {
   this.setState({
     objectName: '',
   });
 
   ImagePicker.showImagePicker(
     {title: 'Pick an Image', maxWidth: 800, maxHeight: 600},
     response => {
       console.log('Response = ', response);
       if (response.didCancel) {
         console.log('User cancelled image picker');
       } else if (response.error) {
         console.log('ImagePicker Error: ', response.error);
       } else if (response.customButton) {
         console.log('User tapped custom button: ', response.customButton);
       } else {
         // You can also display the image using data:
         const source = {uri: 'data:image/jpeg;base64,' + response.data};
         this.setState({
           capturedImage: response.uri,
           base64String: source.uri,
         });
       }
     },
   );
 };
 
// this method triggers when you click submit. If the image is valid then It will send the image to API Gateway. 
 submitButtonHandler = () => {
   if (
     this.state.capturedImage == '' ||
     this.state.capturedImage == undefined ||
     this.state.capturedImage == null
   ) {
     alert('Please Capture the Image');
   } else {
     const apiName = 'LabellingAPI';
     const path = '/storeimage';
     const init = {
       headers: {
         Accept: 'application/json',
         'Content-Type': 'application/x-amz-json-1.1',
       },
       body: JSON.stringify({
         Image: this.state.base64String,
         name: 'storeImage.png',
       }),
     };
 
     API.post(apiName, path, init).then(response => {
        if (response.Labels && response.Labels.length > 0) {
         this.setState({
           objectName: response.Labels[0].Name,
         });
       } else {
         alert('Please Try Again.');
       }
     });
   }
 };
 
 render() {
   return (
     <View style={styles.MainContainer}>
       <ScrollView>
         <Text
           style={{
             fontSize: 20,
             color: '#000',
             textAlign: 'center',
             marginBottom: 15,
             marginTop: 10,
           }}>
           Capture Image
         </Text>
         {this.state.capturedImage !== '' && (
           <View style={styles.imageholder}>
             <Image
               source={{uri: this.state.capturedImage}}
               style={styles.previewImage}
             />
           </View>
         )}
         {this.state.objectName ? (
           <TextInput
             underlineColorAndroid="transparent"
             style={styles.TextInputStyleClass}
             value={this.state.objectName}
           />
         ) : null}
         <TouchableHighlight
           style={[styles.buttonContainer, styles.captureButton]}
           onPress={this.captureImageButtonHandler}>
           <Text style={styles.buttonText}>Capture Image</Text>
         </TouchableHighlight>
 
         <TouchableHighlight
           style={[styles.buttonContainer, styles.submitButton]}
           onPress={this.submitButtonHandler}>
           <Text style={styles.buttonText}>Submit</Text>
         </TouchableHighlight>
       </ScrollView>
     </View>
   );
 }
}
 
const styles = StyleSheet.create({
 TextInputStyleClass: {
   textAlign: 'center',
   marginBottom: 7,
   height: 40,
   borderWidth: 1,
   marginLeft: 90,
   width: '50%',
   justifyContent: 'center',
   borderColor: '#D0D0D0',
   borderRadius: 5,
 },
 inputContainer: {
   borderBottomColor: '#F5FCFF',
   backgroundColor: '#FFFFFF',
   borderRadius: 30,
   borderBottomWidth: 1,
   width: 300,
   height: 45,
   marginBottom: 20,
   flexDirection: 'row',
   alignItems: 'center',
 },
 buttonContainer: {
   height: 45,
   flexDirection: 'row',
   alignItems: 'center',
   justifyContent: 'center',
   marginBottom: 20,
   width: '80%',
   borderRadius: 30,
   marginTop: 20,
   marginLeft: 5,
 },
 captureButton: {
   backgroundColor: '#337ab7',
   width: 350,
 },
 buttonText: {
   color: 'white',
   fontWeight: 'bold',
 },
 horizontal: {
   flexDirection: 'row',
   justifyContent: 'space-around',
   padding: 10,
 },
 submitButton: {
   backgroundColor: '#C0C0C0',
   width: 350,
   marginTop: 5,
 },
 imageholder: {
   borderWidth: 1,
   borderColor: 'grey',
   backgroundColor: '#eee',
   width: '50%',
   height: 150,
   marginTop: 10,
   marginLeft: 90,
   flexDirection: 'row',
   alignItems: 'center',
 },
 previewImage: {
   width: '100%',
   height: '100%',
 },
});
 
export default Registration;

In the above code, we configure Amplify with the API name and endpoint URL that you created, as shown below.

Amplify.configure({
 API: {
   endpoints: [
     {
       name: '<Your-API-Name>',
       endpoint: '<Endpoint-URL>',
     },
   ],
 },
});

Clicking the Capture Image button triggers the captureImageButtonHandler function, which asks the user to take a picture or select one from the file system. When the user captures or selects an image, we store it in the component state as shown below.

captureImageButtonHandler = () => {
   this.setState({
     objectName: '',
   });
 
   ImagePicker.showImagePicker(
     {title: 'Pick an Image', maxWidth: 800, maxHeight: 600},
     response => {
       console.log('Response = ', response);
       if (response.didCancel) {
         console.log('User cancelled image picker');
       } else if (response.error) {
         console.log('ImagePicker Error: ', response.error);
       } else if (response.customButton) {
         console.log('User tapped custom button: ', response.customButton);
       } else {
         // You can also display the image using data:
         const source = {uri: 'data:image/jpeg;base64,' + response.data};
         this.setState({
           capturedImage: response.uri,
           base64String: source.uri,
         });
       }
     },
   );
 };

After capturing the image we preview it. Clicking the Submit button triggers the submitButtonHandler function, which sends the image to the endpoint as shown below.

submitButtonHandler = () => {
   if (
     this.state.capturedImage == '' ||
     this.state.capturedImage == undefined ||
     this.state.capturedImage == null
   ) {
     alert('Please Capture the Image');
   } else {
     const apiName = 'LabellingAPI';
     const path = '/storeimage';
     const init = {
       headers: {
         Accept: 'application/json',
         'Content-Type': 'application/x-amz-json-1.1',
       },
       body: JSON.stringify({
         Image: this.state.base64String,
         name: 'storeImage.png',
       }),
     };
 
     API.post(apiName, path, init).then(response => {
        if (response.Labels && response.Labels.length > 0) {
         this.setState({
           objectName: response.Labels[0].Name,
         });
       } else {
         alert('Please Try Again.');
       }
     });
   }
 };

Lambda Function:

Add the following code into your lambda function that you created in your AWS Console.

const AWS = require('aws-sdk')
var rekognition = new AWS.Rekognition()
var s3Bucket = new AWS.S3( { params: {Bucket: "<Your-Bucket>"} } );
var fs = require('fs');
exports.handler = (event, context, callback) => {
   let parsedData = JSON.parse(event)
   let encodedImage = parsedData.Image;
   var filePath = parsedData.name;
   let buf = Buffer.from(encodedImage.replace(/^data:image\/\w+;base64,/, ""), 'base64')
   var data = {
       Key: filePath,
       Body: buf,
       ContentEncoding: 'base64',
       ContentType: 'image/jpeg'
   };
   s3Bucket.putObject(data, function(err, data){
       if (err) {
           console.log('Error uploading data: ', data);
           callback(err, null);
       } else {
           var params = {
             Image: {
              S3Object: {
               Bucket: "<Your-Bucket>",
               Name: filePath
              }
             },
             MaxLabels: 10,
             MinConfidence: 90
            };
           rekognition.detectLabels(params, function(err, data) {
               if (err){
                   console.log(err, err.stack);
                   callback(err)
               }
               else{
                   console.log(data);
                   callback(null, data);
               }
           });
       }
   });
};

In the above code, we receive the image from React Native and store it in the S3 bucket. The stored image is then sent to Amazon Rekognition, whose detectLabels method detects the labels in the image and returns them in JSON format.
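For reference, a trimmed detectLabels response has roughly the following shape (the values are purely illustrative), which is why the app reads response.Labels[0].Name:

// Illustrative only: the rough shape of a detectLabels response.
const sampleDetectLabelsResponse = {
    Labels: [
        { Name: 'Backpack', Confidence: 98.7 },
        { Name: 'Bag', Confidence: 98.7 }
    ]
};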

capture image screen

Once you capture an image you can see a preview of that image as shown below.

Nike backpack

On submitting the captured image you can see the label of that image as shown below:

Object recognised as backpack

That’s all folks! I hope it was helpful.
For any queries drop them in the comments section.

This story is authored by Dheeraj Kumar and Venu Vaka. Dheeraj is a software engineer specializing in React Native and React based frontend development. Venu is a software engineer specializing in ReactJS and AWS Cloud.

Optimizing QuickSight using Athena Queries and SPICE: Operating cost analysis

In this post, I will discuss, as an example, how an automobile manufacturing company could use QuickSight to analyze its sales data and make better decisions. We will also learn how to optimize QuickSight's operating costs by using the SPICE engine to ingest source data from Athena queries at recurring intervals. This has two major advantages: dashboards and analyses load quickly because the data lives in SPICE, and the cost of data ingestion comes down because Athena is queried only to refresh the data in SPICE.

We will look at a sales dashboard created from data sets prepared from data in the refined zone of a data lake built with Lake Formation. A data engineering pipeline writes data to this refined zone every hour, partitioned by year and month.

In case you wish to build a similar thing and follow along, below is the link to raw datasets:

https://github.com/koushik-bitzop/data-sets/tree/master/sales2016-2018

Creating a SPICE based Athena Data-set:

Select Athena as the data set source:

Select use custom query.

Select Edit/Preview data and then choose data source as SPICE and click on Finish.

Once the query has run successfully and you can see the data, click Save and visualize.

In case you want to add any calculated fields or change data types, you can do that in the red highlighted section shown above.

I have discussed these steps in detail in my previous articles, Visualizing Multiple Datasets in AWS QuickSight and Adding User-Interactivity to AWS QuickSight Dashboards.

Refresh Schedule for Data-sets:

Depending on how frequently new data arrives, you can schedule the refresh. For every refresh, an Athena query is executed and the results are imported into SPICE.

Note: 

  1. In this example, each QuickSight SPICE refresh pulls the whole data set; the load is not incremental.
  2. QuickSight cannot pass pushdown predicates (variables) from dashboard filters to Athena. So if you want to look at a rolling window of data, such as the past 24 hours, the past month, or the past 6 months, you can use a WHERE clause in the Athena source query to fetch just those records. Also, if the data is partitioned by year and month, only the required data is scanned, further saving on costs.

A lowdown on QuickSight Operating cost with this architecture:

We are looking at two main cost components:

  1. Athena – S3 data scan costs
  2. QuickSight Infrastructure costs

Athena – S3 data scan costs:

Athena pricing for successful queries:
1TB scan = 5$
S3 storage cost not included.

No. of queries: 1
Data scanned in S3 (per query): 150 to 210 KB
Scheduled refresh: Hourly
Total data scanned (monthly): 1 * 24 * 30 * 210 KB ≈ 0.0001512 TB
Estimated bill (monthly): $0.000756
Estimated bill (annual): $0.009072

The above numbers are too small to draw an inference from. Let us say you have 4 such queries (each scanning around 150 to 200 MB) powering the dashboard, and SPICE ingests this data once every hour.

No. of queries: 4
Data scanned in S3 (per query): 150 to 200 MB
Scheduled refresh: Hourly
Total data scanned (monthly): 4 * 24 * 30 * 200 MB = 576 GB ≈ 0.576 TB
Estimated bill (monthly): $2.88
Estimated bill (annual): $34.56

If we do not use SPICE to load this data from Athena hourly and instead use the Athena query as the direct source, then the cost of the dashboards increases proportionately with every view. For example, if the dashboards are viewed 1,000 times per hour (and each dashboard has 4 source queries), the cost above is multiplied by a staggering 1,000 times, and the annual bill becomes an eye-popping $34,560.
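The arithmetic behind both estimates can be sanity-checked with a few lines (the figures are the assumptions stated above):

// Rough Athena scan-cost arithmetic for the two scenarios above.
const PRICE_PER_TB = 5;                // $5 per TB scanned
const queries = 4;                     // source queries per dashboard
const mbPerQuery = 200;                // upper end of 150 to 200 MB
const refreshesPerMonth = 24 * 30;     // hourly refreshes

// Scenario 1: SPICE refreshes hourly, so Athena runs once per hour.
const tbPerMonthSpice = (queries * refreshesPerMonth * mbPerQuery) / 1000000;
console.log('SPICE-backed, monthly: $' + (tbPerMonthSpice * PRICE_PER_TB).toFixed(2));   // ~$2.88

// Scenario 2: Athena queried directly on every view, at 1,000 views per hour.
const viewsPerHour = 1000;
const annualDirect = tbPerMonthSpice * viewsPerHour * PRICE_PER_TB * 12;
console.log('Direct Athena source, annual: $' + annualDirect.toFixed(0));                // ~$34560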

QuickSight infrastructure cost (Standard Edition):

No charge for readers. $9 for Author with annual subscription.

Author: 1 user, $9/month, $108/year
Reader: 3 users, $0/month, $0/year
Total: $9/month, $108/year

Note: For the Enterprise edition, Readers are billed $0.30 per 30-minute session, up to a maximum of $5 per Reader per month for unlimited use, and Authors are billed $18 per month with an annual subscription. Additional SPICE capacity costs $0.25/GB (Standard) and $0.38/GB (Enterprise).

So overall we can see that using SPICE with a periodic data refresh optimizes costs in a smart way. That's it folks. I hope it was helpful. For any queries, drop them in the comments section.

This story is authored by Koushik. Koushik is a software engineer and a keen data science and machine learning enthusiast.