PR previews for your static site with CodeBuild and Lambda@Edge
January 01, 2019
Static site generators are awesome (particularly Gatsby), and they work really well with continuous deployment. Push to your git repo, your CI tests are run, your site builds, then it goes live. What’s not to like? Even better is when you can automatically build previews of all of your pull requests. Netlify does this, and it’s seriously cool. My blog is hosted with Netlify, and I’m a big fan. However, currently I’m working on a large Gatsby site with lots of images that need processing, and unfortunately I’ve run up against Netlify’s 15 minute build limit. This prompted me to look again at CodeBuild, which is the AWS build service. It can do a pretty good job of continuously deploying a static site from GitHub to S3 right out of the box. What it doesn’t do is build deploy previews. This struck me as a perfect use for Lambda@Edge, which lets you customise requests and responses in CloudFront.
The approach I’m going to take is:
- Set up CodeBuild to build GitHub pull requests
- Push these to S3 in directories that correspond to the pull request name
- Deploy these to CloudFront with a wildcard subdomain
- Use Lambda@Edge to emulate virtual hosts, with subdomains matching the pull request name
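To make this concrete, a request to a preview subdomain will end up being served from the matching directory in the bucket. The domain here is just an example – use whichever wildcard domain you point at CloudFront:

https://pr-22.example.com/about/ → s3://your-example-bucket/previews/pr-22/about/index.html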
Setting up CodeBuild
The first part is quite simple. Go to CodeBuild and create a new project, with GitHub as the source, and set it to build automatically on push. Once you’ve linked it with GitHub, it will install a webhook to trigger builds. By default this will be called on every push, which isn’t what we want: go into the repository settings in GitHub, and in the webhook section uncheck everything except pull requests.
You’ll need to choose a few settings for the build environment. I’m using the Ubuntu image with Node 8, which is what we want for building Gatsby. There are other images that you can choose from, or you can even specify your own Docker container. Be sure to enable caching, which we’ll set up below.
CodeBuild uses the file buildspec.yml to define how to build the project. In this file you can define the commands used at each stage of the build process. Our setup here is quite simple, but it’s very flexible.
I’m using yarn for this project, which isn’t included in the CodeBuild images, but it’s easy enough to install. I then run yarn build to build the site. This is what got us into this in the first place: on my site the initial build takes a long time – about 18 minutes on the default instance. However, because we’ve enabled caching, subsequent builds are quick.
By default, CodeBuild can deploy to S3 itself. However this isn’t as flexible as we need. For a start, the deploy runs whether or not the build fails, which isn’t what we want. It also runs right at the end of the process, after the build cache is uploaded – which takes ages – so you can be stuck waiting around for ten minutes for a one-minute build to deploy. By running the sync yourself in the build phase, you can deploy quickly before the slower cache upload, and you get full control over the process.
For the deploy preview, we want to upload the build into a directory that matches the name of the pull request. Luckily we have access to this in the $CODEBUILD_WEBHOOK_TRIGGER environment variable. If you look at older docs (or previous versions of this post), you may see this as $CODEBUILD_SOURCE_VERSION, but that now always contains the commit hash. For a pull request, the variable will be in the form “pr/22”. We can’t use that directly as a bucket key, because we need it to be a valid subdomain. Luckily we can use shell commands inside the YAML, so we can use tr to replace the / with -:
echo $CODEBUILD_WEBHOOK_TRIGGER | tr / -
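For example, for pull request 22:

$ echo "pr/22" | tr / -
pr-22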
The command we’re using to push the build to S3 is aws s3 sync. The --delete flag means that any files deleted locally will also be deleted from the bucket.
# buildspec.yml
version: 0.2
phases:
  install:
    commands:
      - npm i -g yarn
      - yarn
  build:
    commands:
      - yarn build && aws s3 sync --delete public "s3://your-example-bucket/previews/$(echo $CODEBUILD_WEBHOOK_TRIGGER | tr / -)"
cache:
  paths:
    - node_modules/**/*
    - public/**/*
    - /usr/local/lib/node_modules/**/*
If you deploy this now it will build, but it won’t be able to push to the bucket: you’ll need to go into IAM and grant the CodeBuild service role permission to write to it.
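What exactly you grant is up to you, but as a minimal sketch, an inline policy like this on the CodeBuild service role is enough for the sync (the bucket name matches the buildspec above – substitute your own):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::your-example-bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::your-example-bucket/*"
    }
  ]
}

ListBucket on the bucket itself is what lets sync diff the contents, and DeleteObject is needed for the --delete flag.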
CloudFront
You can run the build now and have it deploy to S3. You could serve the site directly from S3 using its website hosting features, but that’s not great. If you want to use the website features (which you do, as you need to specify index files) then you can’t use HTTPS. That’s a dealbreaker these days. You also can’t serve responses with compression. Basically, whenever you serve a site from S3, you should put CloudFront in front of it. Luckily it’s pretty simple to set up CloudFront with an S3 origin.
To set up HTTPS, you need a certificate from ACM that covers the wildcard subdomain (for CloudFront it must be requested in us-east-1), which means verifying that you control the domain. This is super simple if the zone is hosted in Route 53: choose DNS validation, then click on verify, and there will be a button that adds the record to Route 53 for you.
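If your zone lives elsewhere, you’ll need to add the record by hand. The validation record ACM asks for is a CNAME; the tokens below are placeholders, and yours will contain unique values:

_<token>.example.com.  CNAME  _<token>.acm-validations.aws.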
Lambda@Edge
This is a great new service that lets you run Lambda functions on the CloudFront edge servers. When I first heard of this I couldn’t really think of a use for it, but this is a perfect example. We’re going to use the subdomain to rewrite the requests, mapping them to the directories that hold our build previews. The code is pretty simple. We extract the subdomain from the request’s Host header, then rewrite the request URI. We’ll also add a trailing slash to requests for directories to avoid a redirect.
"use strict";
/*
* This extracts the subdomain from the request host, and rewrites the request uri
* to use it as a folder. e.g https://pr-4.example.com/under_construction.gif is rewritten
* to https://pr-4.example.com/preview/pr-4/under_construction.gif
*/
exports.handler = (event, context, callback) => {
const { request } = event.Records[0].cf;
const { host, accept } = request.headers;
if (
accept &&
accept.length &&
accept[0].value.includes("/html") &&
!(request.uri.endsWith("/") || request.uri.endsWith(".html"))
) {
request.uri += "/";
}
if (host && host.length) {
// Destructured assignment extracts the first item in an array
const [subdomain] = host[0].value.split(".");
if (subdomain) {
request.uri = `/public/${subdomain}${request.uri}`;
return callback(null, request);
}
}
callback("Missing Host header");
};
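Before publishing, it’s worth a quick local sanity check with a stubbed event. This is a minimal sketch – the index.js filename and the pr-22 subdomain are just assumptions for illustration:

// test-request.js – run with: node test-request.js
const { handler } = require("./index");

// A cut-down version of the event CloudFront sends to a viewer request trigger
const event = {
  Records: [
    {
      cf: {
        request: {
          uri: "/under_construction.gif",
          headers: {
            host: [{ key: "Host", value: "pr-22.example.com" }],
            accept: [{ key: "Accept", value: "image/webp,*/*" }],
          },
        },
      },
    },
  ],
};

handler(event, {}, (err, request) => {
  if (err) throw err;
  console.log(request.uri); // => /previews/pr-22/under_construction.gif
});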
Create a new Lambda function using the Node.js 8.10 runtime and paste in the code above (note that Lambda@Edge functions must be created in the us-east-1 region). Save it, then choose Actions and Publish: CloudFront requires that you use a published version of a Lambda, so any changes you make will need to be published again and the trigger in CloudFront updated to the new version. Once this is done, go to the list of triggers on the left and choose CloudFront. In the configuration section, choose the CloudFront distribution that you created before, and then “viewer request”.
There are four stages where you can run a Lambda@Edge function:
- Viewer Request, when the request arrives at CloudFront
- Origin Request, just before CloudFront makes the request to the origin server
- Origin Response, when the response arrives from the origin server
- Viewer Response, just before the response is sent to the viewer
We need to run at the Viewer Request stage, because we need access to the request Host header.
Fixing the cache headers
These instructions are specific to Gatsby, but you’ll need to do something similar for your static site generator. By default, S3 doesn’t add any cache control headers to its responses. This is a bad thing, as it means the browser won’t cache anything, losing one of the great benefits of a static site. You could add the cache-control attributes when you upload to S3, but as we’re already using Lambda@Edge, we can add them more flexibly here instead. This time you need to add them at the Origin Response stage, where the response from S3 arrives at CloudFront bereft of cache headers.
With Gatsby, it’s recommended that you don’t cache HTML files, but because all other assets use filename hashes they should be cached forever. We can add the appropriate headers on the fly.
We’re also rewriting any redirects, removing the directories that we added before.
Create a new Lambda with the following content:
"use strict";
exports.handler = (event, context, callback) => {
const { response } = event.Records[0].cf;
if (response.status === 302) {
const { location } = response.headers;
if (location && location.length) {
// Strip the leading 2 directories from redirects
response.headers.location[0].value = location.replace(
/\/([^\/]+)\/([^\/]+)(\/.*)/,
"$3"
);
}
}
const type = response.headers["content-type"];
let cache;
if (type && type.length && type[0].value === "text/html") {
cache = "public,max-age=0,must-revalidate";
} else {
cache = "public,max-age=31536000,immutable";
}
response.headers["cache-control"] = [{ key: "Cache-Control", value: cache }];
callback(null, response);
};
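As with the request handler, you can exercise this locally with a stubbed event before publishing; a minimal sketch, again assuming a preview from pull request 22:

// test-response.js – run with: node test-response.js
const { handler } = require("./index");

// A cut-down version of the event CloudFront sends to an origin response trigger
const event = {
  Records: [
    {
      cf: {
        response: {
          status: "302",
          headers: {
            location: [{ key: "Location", value: "/previews/pr-22/about/" }],
            "content-type": [{ key: "Content-Type", value: "text/html" }],
          },
        },
      },
    },
  ],
};

handler(event, {}, (err, response) => {
  if (err) throw err;
  console.log(response.headers.location[0].value); // => /about/
  console.log(response.headers["cache-control"][0].value); // => public,max-age=0,must-revalidate
});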
Fixing the mime types
By default, the AWS CLI tries to guess the MIME type of uploaded files. This is good, because S3 doesn’t use filename rules when serving content. Unfortunately the Python library that it uses reads the OS’s mime.types file, and the images used by CodeBuild ship with a seriously outdated version, missing such obscure types as .json. Fortunately, this is quite easy to fix: just replace the default mime.types file with the latest version from Apache. We can add this as a command in our buildspec:
# buildspec.yml
version: 0.2
phases:
  install:
    commands:
      - curl -s -o mime.types "https://svn.apache.org/viewvc/httpd/httpd/trunk/docs/conf/mime.types?view=co"
      - sudo mv mime.types /etc/
      - npm i -g yarn
      - yarn
  build:
    commands:
      - yarn build && aws s3 sync --delete public "s3://your-example-bucket/previews/$(echo $CODEBUILD_WEBHOOK_TRIGGER | tr / -)"
cache:
  paths:
    - node_modules/**/*
    - public/**/*
    - /usr/local/lib/node_modules/**/*
Now when you sync your files, even your .woff2, .webp and .json files will be served with the correct MIME types, meaning they will be compressed correctly.
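You can spot-check this after a deploy by requesting a file and looking at the Content-Type header; the domain and filename here are just placeholders for your own:

$ curl -sI "https://pr-22.example.com/example.json" | grep -i '^content-type'
content-type: application/json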
You should now be able to open a pull request, or push to an existing one, and have CodeBuild build and deploy the preview to S3 and CloudFront.
Originally posted to mk.gg. The next part will be about deploying the live site when you merge to master.