Serverless Compute to Measure End-User Experience with AWS Lambda

In this post I explain how to architect and implement a solution that actively measures Qlik Sense end-user experience at scale and from different geographical locations by leveraging AWS Lambda functions.

Some background

For many years I’ve worked with many clients to deliver performance testing and user experience monitoring solutions using Selenium and web drivers. The challenge was always “Where should I put my script, and how can I scale browser windows without crashing the test hosting client?” In many cases we ended up orchestrating the script to run concurrently on several smaller test hosting clients, each with a limited number of browser windows. This was a good solution overall, but it required extensive planning and preparation as well as complex operational management.

In a typical setup, running a load testing or web scraping service requires the following dependencies:

  • Selenium
  • A Web browser like Chrome
  • A Web browser driver like ChromeDriver
  • Python or Node.js code
  • A hosting client with an OS, e.g. Windows or Linux

With AWS Lambda things started to change and more scalable solutions became a reality. After some research we found some projects that use AWS Lambda functions to run headless Chrome.

Think of Lambda as an isolated container that must contain all the dependencies needed to do a certain job. So the question is: how can we package a Lambda function with all required dependencies? Thanks to serverless-chrome and pychromeless, we can include a small headless Chrome binary and push a packaged version of the service to AWS as a complete function.

The real need

As part of the services we deliver, End-User Experience monitoring has become a critical, “must-have” service that most of our clients ask for. Application owners want to make sure their apps are accessible at any time, within an acceptable response time standard, and from any geographical location.

The approach, and why we used AWS Lambda

Lambda lets you write or upload a script that runs in response to various triggers. It can also be deployed in all AWS Regions (at least all US regions) and within multiple Availability Zones.

To simulate users from different geographical locations, we architected the solution by deploying at least 6 Lambda functions across all four US AWS regions: 2 functions in US East (N. Virginia), 2 in US East (Ohio), 1 in US West (N. California) and 1 in US West (Oregon).
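As a rough sketch, the layout above can be written down as a mapping from region code to function count. The region codes are the standard AWS identifiers for these four regions; the variable and function names are illustrative only.

```python
# Number of Lambda functions deployed per US region (from the layout above).
FUNCTIONS_PER_REGION = {
    "us-east-1": 2,  # US East (N. Virginia)
    "us-east-2": 2,  # US East (Ohio)
    "us-west-1": 1,  # US West (N. California)
    "us-west-2": 1,  # US West (Oregon)
}


def total_functions(layout):
    """Return the total number of functions across all regions."""
    return sum(layout.values())
```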

Each function contains the following:

  • Chrome driver binary V2.41
  • Headless-chromium binary v1.0.0-55
  • Selenium library V3.14
  • Python script with pre-defined code to scrape a website while recording the time to load different web pages, such as the Qlik Sense Hub and a given app.
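To illustrate how these pieces fit together, here is a minimal sketch of starting the bundled browser from Python. The `bin/` paths and the exact set of Chromium flags are assumptions, not taken from our package; the `executable_path` argument matches the Selenium 3.x API listed above. Only `/tmp` is writable inside Lambda, which is why the browser's data directories point there.

```python
import os

# Assumed locations of the bundled binaries inside the deployment package.
BIN_DIR = os.path.join(os.getcwd(), "bin")
CHROME_BINARY = os.path.join(BIN_DIR, "headless-chromium")
CHROMEDRIVER = os.path.join(BIN_DIR, "chromedriver")


def chrome_arguments(tmp_dir="/tmp"):
    """Flags commonly used to run headless Chromium inside Lambda."""
    return [
        "--headless",
        "--no-sandbox",
        "--disable-gpu",
        "--single-process",
        "--window-size=1280,1024",
        "--user-data-dir={}/user-data".format(tmp_dir),
        "--disk-cache-dir={}/cache-dir".format(tmp_dir),
    ]


def build_driver():
    """Create a Selenium Chrome driver backed by the bundled binaries."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.binary_location = CHROME_BINARY
    for argument in chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(executable_path=CHROMEDRIVER, options=options)
```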

We’ve designed the function and the Python script to be completely abstracted and dynamic, to take advantage of AWS Lambda’s scaling ability and the high number of concurrent executions it supports. Technically, we can deploy one Lambda function per region per AZ while targeting as many sites/clients/environments as required (up to 1,000 concurrent executions, the default Lambda concurrency quota per account per region).

But how can we direct the function to target a specific site? Which DOM objects should it look for? How can we pass the username and password?

The handler is the method in our Lambda function that processes events. When we invoke a function, the runtime runs the handler method (in our solution, we are using Python 3.7). When the handler exits or returns a response, the function becomes available to handle another event. We generate the events using Amazon EventBridge rules.

For every site/client/environment that we need to target, we simply create a unique EventBridge rule per region per AZ. For example, to measure the response time for a given client from multiple geographical locations, we create a set of EventBridge rules in all desired regions/AZs.

The EventBridge rule must contain the following input in JSON format (this is mostly for Qlik Sense sites with Windows Auth enabled):

  • ClientName
  • Environment such as Prod or Dev
  • User Directory
  • Site or URL
  • App ID
  • DOM object IDs for the username, password and login button
  • Hub Target (what object are we waiting for in the hub)
  • App Target (what object are we waiting for in the app)
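For illustration, the input listed above could look like the following. The key names and all values are hypothetical placeholders, not the ones from our rules; the small helper just shows one way to catch an incomplete input before the browser is started.

```python
import json

# Hypothetical key names; the actual rule input may use different ones.
REQUIRED_KEYS = (
    "ClientName", "Environment", "UserDirectory", "Site", "AppId",
    "UsernameId", "PasswordId", "LoginButtonId", "HubTarget", "AppTarget",
)

# Placeholder values only.
EXAMPLE_INPUT = {
    "ClientName": "ExampleClient",
    "Environment": "Prod",
    "UserDirectory": "EXAMPLEDOMAIN",
    "Site": "https://qliksense.example.com",
    "AppId": "00000000-0000-0000-0000-000000000000",
    "UsernameId": "username-input",
    "PasswordId": "password-input",
    "LoginButtonId": "login-button",
    "HubTarget": "hub-target-id",
    "AppTarget": "app-target-id",
}


def missing_keys(payload):
    """Return the required keys that are absent from the payload."""
    return [key for key in REQUIRED_KEYS if key not in payload]


# The JSON text that would be pasted into the rule as its constant input.
print(json.dumps(EXAMPLE_INPUT, indent=2))
```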

Once the time-based EventBridge rule is triggered, it invokes the corresponding Lambda function and passes it the configured input as JSON text. EventBridge supports basic rate-based scheduling (e.g. every 10 minutes), as well as cron-based schedules for more complex scenarios.
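A scheduled rule of this kind can also be created programmatically. The sketch below assumes a boto3 EventBridge client is passed in (for example `boto3.client("events", region_name="us-east-1")`); the function name, rule name, target ID and ARN are placeholders.

```python
import json


def schedule_check(events_client, rule_name, function_arn, payload,
                   schedule="rate(10 minutes)"):
    """Create or update a scheduled rule that invokes the function with payload."""
    events_client.put_rule(
        Name=rule_name,
        # A cron expression such as "cron(0/10 * * * ? *)" is also accepted.
        ScheduleExpression=schedule,
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{
            "Id": "1",
            "Arn": function_arn,
            # Constant JSON text delivered to the handler as the event.
            "Input": json.dumps(payload),
        }],
    )
```

The Lambda function also needs a resource-based permission that allows `events.amazonaws.com` to invoke it; that step is omitted here.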

Our Python code first receives the event as an object, converts it to JSON, parses the JSON string and then stores all client-specific information, such as client name and environment type, as variables.
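A minimal sketch of that parsing step follows. The key names and the `CheckConfig` class are hypothetical. The Python runtime normally delivers a JSON input as a dict already, so this version accepts either a dict or a JSON string.

```python
import json
from dataclasses import dataclass


@dataclass
class CheckConfig:
    """Client-specific settings extracted from the event (hypothetical fields)."""
    client_name: str
    environment: str
    site: str
    app_id: str


def parse_event(event):
    """Accept the event as a dict or as a JSON string and extract the settings."""
    if isinstance(event, str):
        event = json.loads(event)
    return CheckConfig(
        client_name=event["ClientName"],
        environment=event["Environment"],
        site=event["Site"],
        app_id=event["AppId"],
    )


def lambda_handler(event, context):
    """Lambda entry point: parse the event, then run the check (omitted here)."""
    config = parse_event(event)
    return {"client": config.client_name, "environment": config.environment}
```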

Note: for security reasons, we use AWS Systems Manager (SSM) Parameter Store to keep username and password information as encrypted strings. Each Lambda function dynamically picks up the AWS AZ that it’s running in and pulls the right information/credentials accordingly using boto3.
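As an illustration, reading an encrypted parameter with boto3 could look like the sketch below. The `/eux/...` naming scheme is invented for this example, and it keys on the region (which Lambda exposes through the `AWS_REGION` environment variable) rather than the AZ. The client would be created with `boto3.client("ssm")`.

```python
import os


def parameter_name(client_name, environment, field, region=None):
    """Build a hypothetical path such as /eux/us-east-1/Acme/Prod/password."""
    region = region or os.environ.get("AWS_REGION", "us-east-1")
    return "/eux/{}/{}/{}/{}".format(region, client_name, environment, field)


def get_secret(ssm_client, name):
    """Read a SecureString parameter and return its decrypted value."""
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]
```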

Once all required information is available, the code continues and follows the logic below:

  • Launch the headless Chromium browser binary bundled with the function
  • Navigate to the HUB URL
  • Find Username and Password DOM objects and send credentials to authenticate the user
  • Record the time it takes to render the hub and log it to AWS CloudWatch
  • Navigate to the desired QS app
  • Record the time it takes to load the app (or at least one object to render) and then log it to AWS CloudWatch
  • Close the browser and terminate the function
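The steps above can be sketched roughly as follows. This is not our production script: the `cfg` key names, the `/hub` and `/sense/app/<id>` paths, the 60-second timeout and the choice to locate every target by element ID are all assumptions. Anything printed by a Lambda function ends up in CloudWatch Logs, so a JSON line per measurement is enough for logging.

```python
import json
import time


def timed(action):
    """Run action() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = action()
    return result, time.perf_counter() - start


def log_metric(name, seconds, **labels):
    """Print one JSON line; Lambda forwards stdout to CloudWatch Logs."""
    record = dict(labels, metric=name, seconds=round(seconds, 3))
    line = json.dumps(record, sort_keys=True)
    print(line)
    return line


def wait_for_id(driver, element_id, timeout=60):
    """Block until an element with the given id is present in the DOM."""
    # Lazy imports keep the timing helpers usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
    return WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.ID, element_id)))


def run_check(driver, cfg, username, password):
    """Log in, time the hub and the app, and always close the browser."""
    from selenium.webdriver.common.by import By
    try:
        driver.get(cfg["Site"] + "/hub")
        user_field = wait_for_id(driver, cfg["UsernameId"])
        user_field.send_keys(cfg["UserDirectory"] + "\\" + username)
        driver.find_element(By.ID, cfg["PasswordId"]).send_keys(password)
        login = driver.find_element(By.ID, cfg["LoginButtonId"])

        # Hub time: from clicking login until the hub target is present.
        _, hub_seconds = timed(
            lambda: (login.click(), wait_for_id(driver, cfg["HubTarget"])))
        log_metric("hub_load", hub_seconds, client=cfg["ClientName"])

        # App time: from navigation until the app target is present.
        app_url = cfg["Site"] + "/sense/app/" + cfg["AppId"]
        _, app_seconds = timed(
            lambda: (driver.get(app_url), wait_for_id(driver, cfg["AppTarget"])))
        log_metric("app_load", app_seconds, client=cfg["ClientName"])
        return {"hub_seconds": hub_seconds, "app_seconds": app_seconds}
    finally:
        driver.quit()
```

Closing the browser in a `finally` block means a failed login or a timeout does not leave a Chromium process behind in a reused execution environment.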

The Lambda function outputs (mainly response times) are stored in AWS CloudWatch log streams, which are eventually synced with the Zabbix monitoring system. We’ve created a Grafana dashboard that pulls data out of Zabbix so response time data can be shared with our clients.

Related projects

Solutions Architect at Amazon Web Services