Optimizing QuickSight using Athena Queries and SPICE: Operating cost analysis

In this post, I will be discussing as an example how an automobile manufacturing company could utilize QuickSight to analyze their sales data and make better decisions. We will also learn how to best optimize the QuickSight operational cost structure by using SPICE engine to ingest source data at certain recurring intervals from Athena queries. This has two major advantages : dashboards and analyses load quickly as the data source is within SPICE. Secondly, cost of data ingestion is also brought down as Athena is queried only to refresh the data load in SPICE.

We will look at a sales dashboard, created using data-sets prepared from data in refined zone in a DataLake created using LakeFormation. A Data engineering pipeline writes data to this refined zone with year and month partitions every hour.

In case you wish to build a similar thing and follow along, below is the link to raw datasets:

https://github.com/koushik-bitzop/data-sets/tree/master/sales2016-2018

Creating a SPICE based Athena Data-set:

Select Athena as the data set source:

Select use custom query.

Select Edit/Preview data and then choose data source as SPICE and click on Finish.

Once query successfully ran and you could see the data, click on the Save and Visualise.

In case you want to add any calculated fields or change data types you could do that in the red highlighted section shown above.

I have discussed in detail here in my previous articles Visualizing Multiple Datasets in AWS QuickSight and Adding User-Interactivity to AWS QuickSight Dashboards

Refresh Schedule for Data-sets:

Depending on how frequent new data is arrived you could schedule the refresh. For every refresh an Athena query is executed and the results are imported into SPICE.

Note: 

  1. In this example, Quicksight SPICE pull data refresh is whole data, not incremental.
  2. It is not possible to pass quicksight pass pushdown predicates (variables) from filters in dashboard to Athena. So if you want to look at a rolling window of data such as past 24 hours or past one month or past 6 months, we can use a WHERE clause in the Athena source query to fetch just those records. Also, if the data is partitioned by year and month, only required data is scanned thereby further saving on costs.

A lowdown on QuickSight Operating cost with this architecture:

We are looking at two main cost components:

  1. Athena – S3 data scan costs
  2. QuickSight Infrastructure costs

Athena – S3 data scan costs:

Athena pricing for successful queries:
1TB scan = 5$
S3 storage cost not included.

No. of queriesData scanned in S3Scheduled RefreshTotal Data scanned
(monthly)
Bill estimated
(monthly)
Bill estimated
(annual)
1150 to 210 KBHourly1*24*30*210KB = 0.0001512TB0.000756$0.009072$

Above numbers are a bit low to make an inference. Let us say, you have 4 such queries (each query is scanning around 150 to 200 MB) powering the dashboard and SPICE ingests this data once every hour.

No. of queriesData scanned in S3Scheduled RefreshTotal Data scanned
(monthly)
Bill estimated
(monthly)
Bill estimated
(annual)
4150 to 200 MBHourly4*24*30*300MB = 576GB or 0.57TB$2.88$34.56

In case, we do not use SPICE to load this data from Athena in an hourly fashion and instead use Athena query as the direct source, then cost of the dashboards would increase proportionately with each query. So as an example, if the dashboards are being viewed at a rate of 1000 views per hour (and each dashboard has 4 source queries), then the cost above would be multiplied by a staggering 1000 times! and the annual bill would be an eye popping $ 34,560.

QuickSight infrastructure cost (Standard Edition):

No charge for readers. $9 for Author with annual subscription.

User typeNo. of usersBill estimated
(monthly)
Bill estimated
(annual)
Author1$9$108
Reader3$0$0
Total
$9 pm$108 pa

Note: For Enterprise edition, Readers are billed $0.30 for a 30-minute session up to a maximum charge of $5/reader/month for unlimited use. Authors are billed $18 with annual subscription.
For SPICE additional capacity $0.25/GB/standard and $0.38/GB/enterprise. 

So overall we can see that using SPICE with a periodic data refresh causes the costs to be optimized in a smart way. That’s it folks. I hope it was helpful. For any queries, drop them in the comments section.

This story is authored by Koushik Busim. Koushik is a software engineer and a keen data science and machine learning enthusiast.

AWS QuickSight Auto-Narratives to Highlight Insights using Natural Language Processing

Most often analyzing data sets to summarize their main characteristics, is done with visuals. Yet still one has to sift through the visuals, drilling down, comparing values, and rechecking ideas to extract a conclusion. But with QuickSight that is not the case, using its auto-narratives feature, one could extrapolate conclusion from the data analysis or highlight insights and state them plainly in a natural language as part of the analysis or report. However in day to day analysis, a balanced mix of plain statements and visuals is appreciated. One could use this feature to add a brief summary of the analysis or highlight important points.

In this blog post, I will be using Discovering Barcelona dashboard, created earlier for my previous articles Visualizing Multiple Datasets in AWS QuickSight and Adding User-Interactivity to AWS QuickSight Dashboards. We will look at how to add insights to QuickSight Dashboards, and use auto-narratives to give a brief about Accidents in Barcelona.

Let us have a look at what we are gonna build.

In the below picture, the green highlighted section is the Insights auto generated from the dataset by QuickSight. If you like these insights and want them as part of analysis, you could add them. This is shown in the red highlight.

Once an Insight is added to an analysis the content it holds is called a Narrative.

Adding a Custom Insight:

Let’s learn more about computations later, for now closing this Computation modal will add an empty insight to our analysis.

I also deleted the previous insight, so I could start from scratch with this new one. To customize the insight either click on Customize insight or from the drop down menu on the top right and choose Customize narrative. Make sure to add the fields required for the insight from the fields list. Select the insight visual and select the fields from Fields list. Once added you could see them in the Field wells bar highlighted in red at the top.

Computations are more like ready made templates, values coming from calculations done on the dataset, here’s a list of computations for you to explore. Parameters could also be used in the narrative logic. I have discussed what parameters are and how to use them here. Functions are the same as those we use to add calculated fields while editing data sets. Add computation.

The type of computation needed is chosen.

Once you apply the configuration, changes will be reflected in the analysis. Let us also add the Top ranked computations for month and district.

Once you add, they will be listed in the Computations section.

In the Computations section, the blue objects are variables that can be used in the narrative.

Now applying configurations would reflect in the analysis.

Similarly let’s add for the District also. First add the computation for the district and then configure the narrative.

Brief summary of the analysis using QuickSight Autonarratives

We successfully configured a custom narrative. 

One more cool thing about it is, filters linked with a control for a specific field can be added. I have a filter created earlier that applies to all visuals. Let us remove one district from using the control and see if it affects the Insight.

It affected, now you don’t see the Example district from both Visual, Insight, and also the stats have also changed!

That’s broadly about AWS QuickSight auto-narratives, I hope this was helpful. Please experiment, and do let me know if I missed something in the comment section.

This story is authored by Koushik Busim. Koushik is a software engineer and a keen data science and machine learning enthusiast.