Serverless Compute to Measure End-User Experience with AWS Lambda

In this post I will explain how to architect and implement a solution to actively measure Qlik Sense End-User Experience at scale and from different geographical locations by leveraging AWS Lambda functions.

Some background

For many years I’ve been engaged with clients to deliver performance testing and user experience monitoring solutions using Selenium and web drivers. The challenge was always the same: where should I put my script, and how can I scale browser windows without crashing the test host? In many cases we ended up orchestrating the script to run concurrently across several smaller test hosts, each with a limited number of browser windows. This was a good solution overall, but it required extensive planning and preparation as well as complex operational management.

In a typical setup, the following dependencies are required to run a load testing or web scraping service:

  • Selenium

With AWS Lambda, things started to change and more scalable solutions became a reality. After some research we found several projects using AWS Lambda functions to run headless Chrome.

Think of Lambda as an isolated container that holds all the dependencies needed to do a certain job. So the question is: how can we compile a Lambda function with all the required dependencies? Thanks to serverless-chrome and pychromeless, we can include a small headless Chrome binary and push a compiled version of the service to AWS as a complete function.
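
To make that concrete, here is a minimal sketch of the Chrome flags typically needed to run headless inside Lambda's sandbox. The binary path and the exact flag set are assumptions based on how serverless-chrome style bundles are commonly packaged, not details from our original setup.

```python
# Hypothetical flag set for headless Chrome inside Lambda; the binary
# path below assumes a serverless-chrome style bundle and may differ.
CHROME_BINARY = "/opt/bin/headless-chromium"  # assumed bundle location

def lambda_chrome_flags():
    """Flags commonly required for Chrome in Lambda's restricted sandbox."""
    return [
        "--headless",               # no display server in Lambda
        "--no-sandbox",             # Lambda containers lack sandbox helpers
        "--single-process",         # stay within Lambda's process limits
        "--disable-gpu",            # no GPU is available
        "--disable-dev-shm-usage",  # /dev/shm is very small in Lambda
    ]
```

Each flag would then be added to the Selenium ChromeOptions object (via `add_argument`) before creating the driver.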

The real need

As part of the services we deliver, End-User Experience monitoring has become a critical, “must-have” service that most of our clients are asking for. Application owners want to make sure their apps are accessible at any time, within an acceptable response time standard, AND from any geographical location.

The approach, and why AWS Lambda?

Lambda lets you write or upload a script that runs in response to various triggers; it can also be deployed in all AWS Regions (at least all US regions) and across multiple Availability Zones.

To simulate users from different geographic locations, we architected the solution by deploying at least six Lambda functions across all US AWS regions: 2 functions in US East (N. Virginia), 2 functions in US East (Ohio), 1 function in US West (N. California), and 1 function in US West (Oregon).

Each function contains the following:

  • Chrome driver binary V2.41

We’ve designed the function and the Python script to be completely abstracted and dynamic, to take advantage of AWS Lambda’s scaling ability and the high number of concurrent executions possible per function. Technically, we can deploy one Lambda function per region per AZ while targeting as many sites/clients/environments as required (up to 1,000 concurrent executions).

But how can we direct the function to target a specific site? What DOM objects should it look for? How can we pass a username and password? And so on.

The handler is the method in our Lambda function that processes events. When we invoke the function, the runtime runs the handler method (in our solution, we are using Python 3.7). When the handler exits or returns a response, it becomes available to handle another event. We generate these events using AWS EventBridge rules.

def lambda_handler(event, context):
    return some_value

For every site/client/environment we need to target, we simply create a unique EventBridge rule per region per AZ. For example, if we need to measure the response time for a given client from multiple geographic locations, we create a set of EventBridge rules in all desired regions/AZs.

The EventBridge rule must contain the following input in JSON format (this is mostly for Qlik Sense sites with Windows authentication enabled):

  • ClientName
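
As a sketch of how such a rule could be created programmatically, the snippet below builds the constant JSON input and attaches it to a scheduled rule with boto3. Apart from ClientName, the field names (Environment, SiteURL), the rule naming scheme, and the helper functions are hypothetical illustrations, not the exact format used in our setup.

```python
import json

def build_rule_input(client_name, environment, site_url):
    """Constant JSON input attached to the EventBridge target.
    Only ClientName comes from the article's input spec; the other
    fields are hypothetical placeholders for what the handler expects."""
    return json.dumps({
        "ClientName": client_name,
        "Environment": environment,
        "SiteURL": site_url,
    })

def create_client_rule(client_name, environment, site_url,
                       lambda_arn, schedule="rate(10 minutes)"):
    """Create the scheduled rule and point it at the Lambda function."""
    import boto3  # deferred so the payload helper stays dependency-free
    events = boto3.client("events")
    rule_name = f"{client_name}-{environment}-uxm"  # hypothetical naming
    events.put_rule(Name=rule_name, ScheduleExpression=schedule)
    events.put_targets(
        Rule=rule_name,
        Targets=[{
            "Id": "1",
            "Arn": lambda_arn,
            "Input": build_rule_input(client_name, environment, site_url),
        }],
    )
```

The same `Input` string is what EventBridge hands to the function as its event on every scheduled invocation.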

Once the time-based EventBridge rule fires, it invokes the corresponding Lambda function and passes it the configured input as JSON text. EventBridge supports basic scheduling (e.g. every 10 minutes) as well as cron-based schedules for more complex scenarios.

Our Python code first receives the event object, parses the JSON payload, and stores all client-specific info, such as client name and environment type, in variables.
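
A minimal sketch of that parsing step might look like the following; the field names other than ClientName are hypothetical, since the full payload schema isn't shown here.

```python
import json

def parse_event(event):
    """Accept the EventBridge input as either a dict or a JSON string
    and pull out the client-specific settings."""
    cfg = event if isinstance(event, dict) else json.loads(event)
    return {
        "client": cfg["ClientName"],
        "environment": cfg.get("Environment", "prod"),  # hypothetical field
        "site_url": cfg.get("SiteURL"),                 # hypothetical field
    }
```

When EventBridge invokes Lambda with constant JSON input, the Python runtime already deserializes it into a dict, so the `json.loads` branch mainly helps when testing the handler locally with raw strings.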

def lambda_handler(event, context):
    driver = webdriver.Chrome(chrome_options=chrome_options)
    az = os.environ['availzone']

Note: we are using AWS SSM parameters to store username and password information as encrypted strings for security reasons. Each Lambda function dynamically picks up the AWS AZ it’s running in and pulls the right credentials accordingly using boto3.

# configure your ssm client here, such as AWS key or region
ssm = boto3.client('ssm')
session = boto3.session.Session()
param = f'/{client}/{environment}/{az}/UserID'
# print('user: ', param)
PSUsername = ssm.get_parameter(Name=param, WithDecryption=True)
username = PSUsername['Parameter']['Value']
param = f'/{client}/{environment}/{az}/password'
# print('pass: ', param)
PSPassword = ssm.get_parameter(Name=param, WithDecryption=True)
password = PSPassword['Parameter']['Value']

Once all the required information is available, the code continues its course and follows the logic below:

  • Open up the Serverless Chromium browser binary stored within the function
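
To illustrate how each step of that flow might be timed, here's a small helper that could wrap the individual Selenium calls; both the helper and the commented usage are illustrative sketches, not our production code.

```python
import time

def timed(action, *args, **kwargs):
    """Run one step of the check and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = action(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical usage inside the handler:
# _, load_secs = timed(driver.get, site_url)
# _, hub_secs = timed(wait.until, EC.presence_of_element_located((By.ID, "hub")))
```

Wrapping each navigation and DOM wait this way yields a per-step breakdown of the end-user response time rather than a single total.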

Lambda function outputs (mainly response times) are stored in AWS CloudWatch log streams, which are eventually synced with our Zabbix monitoring system. We’ve created a Grafana dashboard that pulls data out of Zabbix so response time data can be shared with our clients.
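
One simple way to get such measurements into CloudWatch is to print a structured line from the handler, since anything written to stdout in Lambda is captured in the function's log stream. The record fields below are an illustrative sketch, not the exact format our Zabbix sync consumes.

```python
import json
import time

def emit_measurement(client, az, response_time_s):
    """Print one structured record per run; in Lambda this line lands
    in the function's CloudWatch log stream, where a downstream sync
    (Zabbix, in this setup) can parse it."""
    record = {
        "timestamp": int(time.time()),       # epoch seconds of the run
        "client": client,                    # hypothetical field names
        "az": az,
        "response_time_ms": round(response_time_s * 1000),
    }
    print(json.dumps(record))
    return record
```

Keeping one JSON object per line makes the log stream easy to filter with CloudWatch Logs queries or an external parser.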

Related projects

Solutions Architect at Amazon Web Services