Modifying User-Agent based on Bot or Human for CloudFront via Lambda@Edge

The problem

We are currently using AWS CloudFront to front our backend .NET web application (the origin). A new requirement came through asking us to serve customised content depending on whether the originating request came from a bot or a human.

By default, AWS CloudFront does not forward the viewer's User-Agent header to the origin (the origin sees a User-Agent of Amazon CloudFront instead), so what can we do? Well, one (non-performant) way is to simply modify our distribution behavior to whitelist the User-Agent header. This hurts our cache hit ratio, though, because CloudFront then caches a separate copy of each object for every distinct User-Agent value, and different users with different browser versions and device types generate a lot of distinct values (e.g. Safari on macOS, Opera on Smart TV, Chrome on iPad etc).
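
To make the caching cost a bit more concrete, here is a rough sketch. The cacheKey function is a hypothetical stand-in of my own (CloudFront's real cache key logic is internal), and the three User-Agent strings are made up. The only assumption is that a whitelisted header becomes part of the key.

```javascript
'use strict';

// Hypothetical model of a cache key: the URI plus the value of each whitelisted header.
// CloudFront's real key derivation is internal; this only illustrates the effect.
function cacheKey(uri, headers, whitelistedHeaders) {
    const parts = whitelistedHeaders.map((name) => `${name}=${headers[name] || ''}`);
    return [uri, ...parts].join('|');
}

// Three made-up viewers requesting the same page.
const viewers = [
    { 'user-agent': 'Mozilla/5.0 (Macintosh) Safari/605.1.15' },
    { 'user-agent': 'Mozilla/5.0 (SmartTV) OPR/46.0' },
    { 'user-agent': 'Mozilla/5.0 (iPad) CriOS/120.0' },
];

const withoutHeader = new Set(viewers.map((h) => cacheKey('/index.html', h, [])));
const withHeader = new Set(viewers.map((h) => cacheKey('/index.html', h, ['user-agent'])));

console.log(withoutHeader.size); // 1 cached object shared by everyone
console.log(withHeader.size);    // 3 cached objects, one per distinct User-Agent
```

With real traffic the second number grows with every browser, version and device combination, which is exactly the hit/miss problem described above.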

The solution

So how can we let our origin serve up customised content for bots yet stay (somewhat) performant? Lambda@Edge to the rescue! Lambda@Edge lets you run Lambda functions to customize content that CloudFront delivers by executing the functions at edge locations. You can modify the request/response during any of the following stages:

After CloudFront receives a request from a viewer (viewer request)
Before CloudFront forwards the request to the origin (origin request)
After CloudFront receives the response from the origin (origin response)
Before CloudFront forwards the response to the viewer (viewer response)
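
If you are unsure which of the four stages a function is attached to, the event itself tells you. The sketch below reads the eventType field from the event's config section. The helper names (eventTypeOf, hasResponse) and the trimmed sample event are my own, for illustration only.

```javascript
'use strict';

// Returns the trigger name recorded in the event, e.g. 'viewer-request'.
function eventTypeOf(event) {
    return event.Records[0].cf.config.eventType;
}

// The request is present in all four stages; a response only exists
// in the two response stages.
function hasResponse(event) {
    const type = eventTypeOf(event);
    return type === 'origin-response' || type === 'viewer-response';
}

// Trimmed, hand-written event for illustration (real events carry more fields).
const sampleEvent = {
    Records: [{ cf: { config: { eventType: 'viewer-request' }, request: { uri: '/' } } }],
};

console.log(eventTypeOf(sampleEvent)); // viewer-request
console.log(hasResponse(sampleEvent)); // false
```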

My proposed solution still requires us to whitelist the User-Agent header, but it additionally modifies the viewer request: if the User-Agent doesn't match any whitelisted bot, it is set to Human. If a whitelisted bot is matched, the bot's User-Agent is kept and forwarded to our origin for further content customisation. This way, real humans will always have a User-Agent of Human rather than a device/browser/version variant (e.g. Safari on macOS, Opera on Smart TV, Chrome on iPad etc), so they all share one cached copy, while we can still identify the User-Agent of each whitelisted bot. This works because the viewer request trigger runs before CloudFront checks its cache, so the normalised value is the one used for the cache lookup. I've created the Lambda function below, which is deployed to my CloudFront distribution.

Step 1

Whitelist the User-Agent header in our distribution's behavior in the AWS console (or via the AWS CLI).

CloudFront Distributions > E2VZ0ISAMPLEDST > Behaviors > Edit > Whitelist Headers > Enter a custom header "User-Agent" > Add Custom > Yes, Edit
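
If you would rather script this step than click through the console, one option is to fetch the distribution config (for example with aws cloudfront get-distribution-config), edit the JSON, and send it back with aws cloudfront update-distribution. The sketch below only covers the editing part. The whitelistHeader helper is hypothetical, and it assumes the default behavior uses the legacy ForwardedValues settings rather than a cache policy, so treat it as a starting point.

```javascript
'use strict';

// Hypothetical helper: returns a copy of a DistributionConfig with `headerName`
// added to the default behavior's forwarded headers.
// Assumes the behavior uses legacy ForwardedValues, not a cache policy.
function whitelistHeader(distributionConfig, headerName) {
    const config = JSON.parse(JSON.stringify(distributionConfig)); // deep copy
    const forwarded = config.DefaultCacheBehavior.ForwardedValues;
    const items = (forwarded.Headers && forwarded.Headers.Items) || [];
    const present = items.some((h) => h.toLowerCase() === headerName.toLowerCase());
    const next = present ? items : [...items, headerName];
    forwarded.Headers = { Quantity: next.length, Items: next };
    return config;
}

// Trimmed example input; a real DistributionConfig has many more fields.
const before = {
    DefaultCacheBehavior: { ForwardedValues: { QueryString: false, Headers: { Quantity: 0 } } },
};
const after = whitelistHeader(before, 'User-Agent');

console.log(JSON.stringify(after.DefaultCacheBehavior.ForwardedValues.Headers));
// {"Quantity":1,"Items":["User-Agent"]}
```

The helper leaves its input untouched and does nothing if the header is already listed, so running it twice is harmless.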

Step 2

Create and deploy our Lambda function. If you are new to Lambda functions like I was, simply follow the AWS tutorial on creating a simple Lambda@Edge function. Once you are comfortable with that, you can create your own. Here is mine.

Lambda Function. Triggered when CloudFront receives a request from a viewer (viewer request)

'use strict';

// Bots whose User-Agent is forwarded unchanged (personal whitelist, not exhaustive).
const BOT_PATTERN = /Googlebot\/|Googlebot-Mobile|Googlebot-Image|Googlebot-News|Googlebot-Video|bingbot/i;

exports.handler = (event, context, callback) => {
    const request = event.Records[0].cf.request;
    const headers = request.headers;

    // The header can be absent, so fall back to an empty string instead of throwing.
    const userAgent = headers['user-agent'] ? headers['user-agent'][0].value : '';

    // Whitelisted bots keep their User-Agent as-is. Everything else is collapsed
    // into the single value 'Human' so that all human traffic shares one cache entry.
    if (!BOT_PATTERN.test(userAgent)) {
        headers['user-agent'] = [{key: 'User-Agent', value: 'Human'}];
    }

    callback(null, request);
};
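
Before deploying, it can help to poke at a handler locally with a hand-built event. Here is a minimal sketch of that. The helpers (makeViewerRequestEvent, forwardedUserAgent) and the stand-in handler are my own, not part of any AWS tooling, and the harness assumes the handler invokes its callback synchronously, as the function above does.

```javascript
'use strict';

// Builds a trimmed viewer-request event; real CloudFront events carry more fields.
function makeViewerRequestEvent(userAgent) {
    const headers = {};
    if (userAgent !== undefined) {
        headers['user-agent'] = [{ key: 'User-Agent', value: userAgent }];
    }
    return {
        Records: [{
            cf: {
                config: { eventType: 'viewer-request' },
                request: { method: 'GET', uri: '/', headers },
            },
        }],
    };
}

// Runs a handler and returns the User-Agent it would forward.
// Assumes the handler invokes its callback synchronously.
function forwardedUserAgent(handler, userAgent) {
    let forwarded;
    handler(makeViewerRequestEvent(userAgent), {}, (err, request) => {
        if (err) throw err;
        forwarded = request.headers['user-agent'][0].value;
    });
    return forwarded;
}

// Stand-in handler so the sketch runs by itself (hypothetical, bingbot only).
// Replace it with require('./index').handler to exercise the real function.
const standInHandler = (event, context, callback) => {
    const request = event.Records[0].cf.request;
    const current = request.headers['user-agent'];
    const isBot = Boolean(current) && /bingbot/i.test(current[0].value);
    if (!isBot) {
        request.headers['user-agent'] = [{ key: 'User-Agent', value: 'Human' }];
    }
    callback(null, request);
};

const bingUa = 'Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)';
console.log(forwardedUserAgent(standInHandler, bingUa) === bingUa);                  // true
console.log(forwardedUserAgent(standInHandler, 'Mozilla/5.0 (iPad) Safari/604.1')); // Human
console.log(forwardedUserAgent(standInHandler, undefined));                         // Human
```

The last call sends a request with no User-Agent header at all, which is worth checking for any handler you write.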

Now obviously the botPattern isn't an exhaustive list because it's my personal whitelist. You can implement your own logic however you like ;)
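
For example, if the whitelist grows, maintaining one long regex string by hand gets error-prone. One possible approach is to keep the bots in an array and build the pattern from it. The helper names below are mine, and DuckDuckBot is only there as an example of adding an entry.

```javascript
'use strict';

// Escapes regex metacharacters so each token is matched literally.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Builds one case-insensitive pattern that matches any of the given tokens.
function buildBotPattern(tokens) {
    return new RegExp(tokens.map(escapeRegExp).join('|'), 'i');
}

// Example whitelist; adjust it to the bots you actually care about.
const botPattern = buildBotPattern(['Googlebot/', 'Googlebot-Image', 'bingbot', 'DuckDuckBot']);

console.log(botPattern.test('Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)')); // true
console.log(botPattern.test('Mozilla/5.0 (iPad; CPU OS 17_0 like Mac OS X) Safari/604.1'));              // false
```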

If anyone has suggestions to improve it, please leave your comments below!

References

https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-how-to-choose-event.html