Static Site Deploy Previews on AWS

Colin Jones

May 04, 2021

There are tons of options out there for deploying websites, but my default approach for years has been: dump it in an S3 bucket with CloudFront in front.

There's a big assumption and scope narrowing here: that the website needs to be basically static, perhaps connecting to external APIs for dynamic behavior when needed. For many low-write-volume websites, it doesn't take a ton of effort to make that assumption a good one. You might be saying "Hey! You're just describing Jamstack in a needlessly roundabout way!"... and I think you're right.

My experience with this kind of architecture has been that scalability concerns, along with many performance concerns, basically go away in this scenario. There can certainly be issues with site performance from the end user's perspective due to any number of JavaScript, image loading, or API connectivity issues—but in terms of backend server health, we just don't worry about it. Of course there are lots of projects for which this sort of architecture is more trouble than it's worth, but that's why this is a "default" instead of a "thing I always do."

After seeing a few projects with fancy tooling like Heroku's review apps and Netlify's deploy previews, I found myself disillusioned with local build/deploy cycles for trying out new features that teammates have built. I wanted to be able to try out those features straight from the Github PR review page: click a link, and right away I'm seeing a preview of the changes.

What would it take to wire up a feature like that ourselves? In this post, we'll take a look at the required bits and pieces that let us create static site deploy previews on AWS.

Caveats

I want to say up front that I don't think any part of this approach is original: lots of other folks have blogged about similar setups.

It's also not even my recommendation that you choose this mechanism for your own static site deployments: for many use cases, dedicated services like the ones above (and others!) will work great. So you don't actually have to do this yourself.

That said, we learn a lot by digging into the details of useful features, so if nothing else this has been a great learning process for me.

And depending on the tradeoffs and requirements in your unique situation, this might turn out to be a decent approach for you.

So, caveats aside, what does it take to set up deploy previews for static sites?

Requirements

The overall goal here is that when a PR is opened, we can see a preview site to evaluate before it goes live. And then when we merge the PR to the main branch, the site gets deployed to production.

In service of that bigger goal, we needed to make sure that:

  • When PR #13 comes along, it doesn't overwrite a preview for PR #12—i.e. multiple simultaneous preview sites are available.
  • When a successful deployment of a preview site is complete, we can see a link to a preview site on the PR.

We had an additional wrinkle in our setup, wiring up previews to be built when CMS changes were published. (We have a CMS, but we pull from it at build time, not as a runtime API.) We'll save the story here for another time, but suffice to say that once the right build triggers were in place for PRs, we didn't need massive effort to wire in the CMS.

Multiple simultaneous previews

One natural approach might be to deploy each new site preview into a totally new and distinct S3 bucket, and point CloudFront at that new bucket. We did something like this for a previous version of the site, without a "preview site" feature: each new deployment would go in a new bucket. We had a couple of annoyances with this approach.

First, there's an AWS limit on the number of S3 buckets per account. The number of buckets grows over time as new deploys are made, so after enough deploys, we'd run out of available buckets on the AWS account. We could contact AWS support and get that limit raised, but it didn't seem worth it. Regardless of the number, it's a limit we'd hit eventually and need to deal with. For that version of the site, our workaround was to manually delete old buckets once we hit the limit. This wasn't super-painful or super-frequent, but it was an annoyance we wanted to avoid.

Also, it took CloudFront like 20 minutes to update when we pointed it at the new S3 bucket. Since then, that wait time has gone way down, impressively so.

Those two issues led us down a path of exploration: could we always deploy to the same S3 bucket for this use case? To do that, what would need to be in place? An immediate idea here was to use "directory" prefixes, e.g. a preview version lives in a bucket under /previews/00d9efab663e1aa435936a3431821f7e7aa1d55e/. This was appealing because all preview sites would live in the same bucket, but problematic because of the way S3 static sites work: the URLs we'd visit in the browser would need that long, noisy path appended to them, and we didn't want that noise in the URL.

Because we're using CloudFront in front of the bucket, we had a nifty solution available to us. We could use custom subdomains to access the preview site (e.g. SHA_HERE.preview.example.com), and use Lambda@Edge to rewrite incoming requests to move that SHA_HERE portion from the domain to the path the way S3 expects it. Something like this:

"use strict";

exports.handler = (event, context, callback) => {
		const { request } = event.Records[0].cf;
		const { host } = request.headers;

		if (host && host.length) {
				// require 4 segments, e.g. SHA_HERE.preview.example.com
				const [subdomain, _preview, _domain, tld] = host[0].value.split(".");
				if (subdomain && tld) {
						console.log("subdomain:", subdomain);
						request.uri = `/previews/${subdomain}${request.uri}`;
						return callback(null, request);
				} else {
						return callback("Missing subdomain");
				}
		}

		callback("Missing Host header");
}
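
If you want to convince yourself the rewrite behaves as expected, you can poke at the handler locally with a hand-built event. This is just a quick sanity check rather than anything we deploy, and the filename in the require is a placeholder for wherever the handler happens to live:

// Minimal local check: feed the handler a fake viewer-request event and log
// the rewritten URI. The module name below is made up for this example.
const { handler } = require("./rewrite-preview-host");

const fakeEvent = {
  Records: [
    {
      cf: {
        request: {
          uri: "/contact/",
          headers: {
            host: [{ key: "Host", value: "00d9efab.preview.example.com" }],
          },
        },
      },
    },
  ],
};

handler(fakeEvent, {}, (err, request) => {
  if (err) {
    console.error("error:", err);
  } else {
    console.log("rewritten uri:", request.uri);
    // expected: /previews/00d9efab/contact/
  }
});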

And we can use wildcard DNS to point *.preview.example.com at our CloudFront distribution, allowing all current and future preview domains to point to the right place. Our Terraform for this looks something like:

resource "dnsimple_record" "preview_example_com" {
		domain = "example.com"
		name = "*.preview"
		type = "CNAME"
		value = module.s3_website_preview.cloudfront_domain_name
		ttl = "3600"
}

This means a request for 12345.preview.example.com gets rewritten and sent to the S3 bucket website as preview.example.com/previews/12345/. Note that this is not a redirect, which would round-trip back to the client with a 3xx-level response. Instead, the request comes into this Lambda@Edge function, and is passed along to the S3 bucket after that rewrite.

Lambda@Edge is kind of nifty—my mental model is effectively that it lets us run custom code on CloudFront. The pricing is higher than normal Lambda (as you'd expect), and the language choices are more limited, but it's a pretty neat tool. For this use case, we weren't too worried about this incrementally higher cost (about 3x higher than regular Lambda, as of today), because these preview sites get visited so rarely.

One gotcha with these Lambda@Edge functions is making sure you know when this code ought to be running—should it process the request or the response? Should it always execute, or only conditionally depending on whether or not the request is in CloudFront's cache? If you're going down this road, I definitely recommend reading up on the various CloudFront events that you'll need to pick from for your function. We used "Viewer Request" for the one above, which is configurable in Terraform as part of the aws_cloudfront_distribution resource. The Lambda itself we also deployed with Terraform, which was straightforward after making sure that the runtime was allowed for Lambda@Edge. The Terraform for these is noisier than I think would be helpful to show in code here, but hopefully that gives you enough flavor for what's involved.

Redirects

There's one remaining problem with this URL-rewriting idea—have you spotted it already? What happens when the S3 website responds with a redirect? For example, we make a request for /previews/12345/contact, and S3 wants to redirect us to a similar URL with a trailing slash, /previews/12345/contact/?

Well, because CloudFront has done that rewrite to use a path instead of a subdomain, as far as S3 knows the requester is already using those /previews/... paths, so its redirect Location includes the /previews/12345/ prefix. The user in their browser, on the other hand, gets redirected from their original request, 12345.preview.example.com/contact, to 12345.preview.example.com/previews/12345/contact/, and when that request hits CloudFront, there's going to be pandemonium as the viewer-request function adds /previews/12345 a second time.

We could make the Lambda function smarter and only modify the path if there's no /previews/SHA_HERE in it already (and I think this was the first thing I tried); but then we're left with an issue where the user's browser suddenly shows /previews/12345 in the URL, with no indication of why it's there.

Instead, we can use another one of the CloudFront events, "Origin Response", and write another Lambda@Edge function to handle it:

"use strict";

exports.handler = (event, _context, callback) => {
		const { response } = event.Records[0].cf;
		if (response.status === "302" || response.status === "301") {
				const { location } = response.headers;

				if (location && location.length) {
						// Strip the leading 2 directories from redirects
						response.headers.location[0].value = location[0].value.replace(
								/\/([^\/]+)\/([^\/]+)(\/.*)/,
								"$3"
						);
				}
		}

		callback(null, response);
};
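
A similar quick check works for this one: given a Location of /previews/12345/contact/, we should get back /contact/. Again, the filename in the require is just a placeholder:

// Minimal local check for the origin-response handler. The module name below
// is made up for this example.
const { handler } = require("./strip-preview-prefix");

const fakeEvent = {
  Records: [
    {
      cf: {
        response: {
          status: "302",
          headers: {
            location: [{ key: "Location", value: "/previews/12345/contact/" }],
          },
        },
      },
    },
  ],
};

handler(fakeEvent, {}, (_err, response) => {
  console.log("rewritten location:", response.headers.location[0].value);
  // expected: /contact/
});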

Great! Now redirects work just fine for our preview sites.

Tradeoffs

Because of this Lambda@Edge machinery, we've got a situation where our preview environments do look different from the production environment. The production site doesn't have the Lambda@Edge rewrites, so a bug could sneak in that only surfaces in the absence of those rewrites, in other words only in production. I can't think of a way this could happen, but I also can't rule it out. By using separate buckets, we could have ensured 100% sameness by "promoting" an S3 bucket from preview to production.

Alternatively, we could have used some configuration specific to our static site generator to prefix all of the site's URLs with /previews/SHA_HERE/. This wouldn't have been too bad to implement—likely some build-time script that took the code SHA and injected it into the site's configuration file somehow. That would've presented the tradeoff that, as we previewed the site, we'd be looking at different paths than the production site. I'm not convinced either of these options is better than the other, but having this rewriting live purely in infrastructure made me think there might be less that could go wrong when it came time to go to production.

I'm satisfied that these tradeoffs are worth it, but I think it's important to be clear-eyed about the downsides you're accepting with big tech choices.

Deploying to the preview bucket

There's not too much that's fancy about deploying to an S3 bucket. AWS's CLI tools, e.g. aws s3 sync, can do the work of transferring files quite capably.

But there's a caveat worth noting. Gary Bernhardt gave a great summary of a race condition during deployment (remember, web applications are distributed systems!): in short, if a deploy removes previously deployed hashed assets, users who loaded the old HTML just before (or during) the deploy can end up requesting assets that no longer exist.

The solution we went with is very similar to Gary's: use the CI provider's cache. Like Gary, I like the minimal infrastructure this adds (basically none). But I should point out that there's an edge case where, if the CI provider's cache goes missing (normally a totally fine thing to happen to a cache), some previously existing assets will go missing in the new deploy. For our use case this is totally fine, but I can imagine scenarios where it would be more annoying.

At any rate, after making sure to preserve pre-existing assets, we perform two AWS S3 sync operations for two different kinds of files. Our static site generator appends SHA hashes to many of the files it builds (e.g. CSS, JavaScript, and image assets) so that, if any of the content of those files were to be updated, the newly generated files would include a different hash as part of the filename at build time. And any files that reference the asset files (most notably HTML, but often CSS as well) are generated to include those hashes when referencing the assets.

In this way, we can pretty easily identify which files in the build are immutable, because they have this pattern in the filename. It's not foolproof, since someone could theoretically name a non-hashed file with this pattern, but it's correct by convention and works for our purposes.
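
As a rough sketch of how that split could look (purely illustrative, not our actual build step), a small script can shuffle fingerprinted files into their own directory before syncing. The hash pattern here is an assumption about what a generator might produce, and the directory names line up with the from: paths in the config below:

"use strict";

// Rough sketch only: move fingerprinted files (e.g. main.00d9efab.css) into a
// separate directory so they can be synced with a long Cache-Control max-age.
// The hash pattern and directory names here are assumptions, not our exact setup.
const fs = require("fs");
const path = require("path");

const HASHED = /\.[0-9a-f]{8,}\./;

function splitHashedAssets(srcDir, hashedDir) {
  for (const entry of fs.readdirSync(srcDir, { withFileTypes: true })) {
    const srcPath = path.join(srcDir, entry.name);
    if (entry.isDirectory()) {
      splitHashedAssets(srcPath, path.join(hashedDir, entry.name));
    } else if (HASHED.test(entry.name)) {
      const destPath = path.join(hashedDir, entry.name);
      fs.mkdirSync(path.dirname(destPath), { recursive: true });
      fs.renameSync(srcPath, destPath);
    }
  }
}

splitHashedAssets("public", "hashed_assets/public");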

The two S3 operations, then, are:

  • one for hashed assets—for these, we tune the Cache-Control header pretty high, via an S3 API option, since those files are effectively immutable
  • another for non-hashed files, such as index.html, which will have the same names from one deploy to the next

Something like this:

commands:
  deploy_hashed_assets:
    description: "Deploy hashed assets"
    parameters:
      to:
        type: string
    steps:
      - aws-s3/sync:
          arguments: |
            --acl public-read \
            --cache-control "max-age=31536000"
          from: hashed_assets/public/
          overwrite: true
          to: << parameters.to >>

  deploy_remaining_application_files:
    description: "Deploy remaining application files"
    parameters:
      to:
        type: string
    steps:
      - aws-s3/sync:
          arguments: |
            --acl public-read \
            --cache-control "max-age=0"
          from: public/
          to: << parameters.to >>

One gotcha here is that when deploying to the preview bucket, we need to deploy to a given prefix, the Git SHA of our codebase. In other words, we don't want to deploy to the top level, s3://preview.example.com, but instead to s3://preview.example.com/previews/${GIT_SHA_HERE}. And with all the Lambda@Edge rewrites in place from the previous section, that gives us a working site at https://${GIT_SHA_HERE}.preview.example.com.
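
To make that concrete, here's a rough sketch of the same two syncs as a plain script (our actual deploy uses the aws-s3 orb commands above, so treat this as illustrative only), targeting the /previews/<git sha> prefix and printing the resulting preview URL:

"use strict";

// Illustrative sketch, not our actual CI step: run the two syncs shown above
// against the preview bucket, under the /previews/<git sha> prefix.
const { execFileSync } = require("child_process");

const sha = process.env.CIRCLE_SHA1;
const dest = `s3://preview.example.com/previews/${sha}`;

function sync(from, to, cacheControl) {
  execFileSync(
    "aws",
    ["s3", "sync", from, to, "--acl", "public-read", "--cache-control", cacheControl],
    { stdio: "inherit" }
  );
}

// Hashed, effectively immutable assets: cache for a year.
sync("hashed_assets/public/", dest, "max-age=31536000");
// Everything else (index.html and friends): always revalidate.
sync("public/", dest, "max-age=0");

console.log(`Preview available at https://${sha}.preview.example.com`);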

Deploying on PR open / update

In order to automate any given feature, it's nice to be able to do the thing manually first. So once we have a script that can perform a deploy, then we can worry about triggering that thing.

In our case, we have our CI provider (in this case it's CircleCI) do that work. So CircleCI runs the deploys, using its aws-s3 orb, after running through most of our tests. (Some other tests need to wait until after the deployment, so we can run them against the deployed code.)

There's a bit of a gotcha here: we'd like to only deploy for pull requests, and we want preview deploys to go out when pull requests are opened.

The first part of that is easy: add a step like this to the CircleCI build config, before the aws-s3/sync step:

- run: |
    if [ "${CIRCLE_PULL_REQUEST}" = "" ]; then
      circleci step halt
    else
      true
    fi

This way, any given build we run on CircleCI that isn't connected to a pull request will stop before trying to deploy.

The second part is a bit trickier, surprisingly (to me, at least). Our experience was that CircleCI would kick off a build from our Github repo on git push, but that a new build wouldn't be triggered when a PR was opened. This means that if we push a branch up, the circleci step halt above would prevent a deploy preview from going out. And then when we open a PR for that same branch, there's no rebuild as far as CircleCI is concerned, because the push event has already been processed.

It might look like a good solution here to just go ahead and do deploy previews for every git push, but a sticking point there is that we want the PR to have a link displayed.

In order to solve all these issues, we've ended up with a bit of a roundabout solution that works reasonably well:

  1. Pull request is opened, and we have Github fire a webhook off to a new Lambda that has API Gateway in front, since webhooks need an HTTP API.
  2. That Lambda (after validating the Github webhook HMAC and ensuring that it's a pull request opened event) makes a POST request to trigger a CircleCI build (a sketch of one possible version of this Lambda follows below, after these steps).
  3. CircleCI does the build, and since it now sees that the code does correspond to a pull request (CIRCLE_PULL_REQUEST), the deploy will go out.
  4. Upon successful completion of a preview deploy, the CircleCI job leaves a comment on the pull request with a link to the preview site, something along these lines:

- run: |
    preview_url="https://${CIRCLE_SHA1}.preview.example.com"
    user_creds="${GH_COMMENTER_USERNAME}:${GH_COMMENTER_AUTH_TOKEN}"
    # Turn the PR's web URL (github.com/ORG/REPO/pull/N) into the corresponding
    # comments API URL (api.github.com/repos/ORG/REPO/issues/N/comments)
    pr_url="$(echo $CIRCLE_PULL_REQUEST | sed -e 's/\/github.com/\/api.github.com\/repos/' -e 's/\/pull/\/issues/')/comments"
    comment_text="The latest deploy preview for this PR is now available at $preview_url"
    echo "pr_url: $pr_url"
    echo "comment_text: $comment_text"
    echo "{\"body\": \"$comment_text\"}" |
      curl -i --data @- --user $user_creds $pr_url
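
We haven't shown the webhook-receiving Lambda itself, so here's a rough sketch of one way such a function could look, assuming API Gateway's Lambda proxy integration and CircleCI's v2 pipeline-trigger API. The environment variable names and project slug are placeholders, not our real configuration:

"use strict";

// Hypothetical sketch of the webhook-receiving Lambda: verify the GitHub
// webhook signature, check that it's a pull request "opened" event, and
// trigger a CircleCI pipeline for the PR's branch via CircleCI's v2 API.
// The env var names and project slug below are made up for this example.
const crypto = require("crypto");
const https = require("https");

const WEBHOOK_SECRET = process.env.GITHUB_WEBHOOK_SECRET;
const CIRCLE_TOKEN = process.env.CIRCLE_API_TOKEN;
const PROJECT_SLUG = "gh/example-org/example-site";

function validSignature(body, signatureHeader) {
  const expected =
    "sha256=" +
    crypto.createHmac("sha256", WEBHOOK_SECRET).update(body).digest("hex");
  return (
    typeof signatureHeader === "string" &&
    signatureHeader.length === expected.length &&
    crypto.timingSafeEqual(Buffer.from(signatureHeader), Buffer.from(expected))
  );
}

function triggerPipeline(branch) {
  return new Promise((resolve, reject) => {
    const req = https.request(
      {
        method: "POST",
        hostname: "circleci.com",
        path: `/api/v2/project/${PROJECT_SLUG}/pipeline`,
        headers: {
          "Content-Type": "application/json",
          "Circle-Token": CIRCLE_TOKEN,
        },
      },
      (res) => {
        res.resume();
        res.on("end", () => resolve(res.statusCode));
      }
    );
    req.on("error", reject);
    req.end(JSON.stringify({ branch }));
  });
}

exports.handler = async (event) => {
  // Header-name casing varies by API Gateway configuration, so normalize first.
  const headers = {};
  for (const [name, value] of Object.entries(event.headers || {})) {
    headers[name.toLowerCase()] = value;
  }

  if (!validSignature(event.body, headers["x-hub-signature-256"])) {
    return { statusCode: 401, body: "invalid signature" };
  }

  const payload = JSON.parse(event.body);
  if (headers["x-github-event"] !== "pull_request" || payload.action !== "opened") {
    return { statusCode: 200, body: "ignored" };
  }

  await triggerPipeline(payload.pull_request.head.ref);
  return { statusCode: 200, body: "build triggered" };
};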

It'd be nice if this Lambda and API Gateway weren't required, because it feels like indirection that shouldn't be necessary. I'm sure there are ways around this, using other CI providers or perhaps a Github app. But regardless of the ugliness, this seems to work reasonably well for us. Besides, we ended up with another use case for which a webhook triggering a CircleCI build was useful: publishing deploys from our headless CMS. But that's a story for another day.

Known issues

This stuff works well enough for its context, but it's in no way perfect. As I've mentioned, there's a decent amount of complexity here that we had to work through, which in many situations would be better farmed out to a service that provides preview deploys out of the box.

In addition to the caveats I've already mentioned, we don't currently do any cleanup of old previews. Ideally we'd delete preview deployments at some point after the corresponding changes have gone live in production. As it stands, every new build just adds another preview deployment to S3. But ultimately, S3 space is cheap, and the proliferation of objects hasn't hurt us.

We could probably put lifecycle policies on this bucket to expire objects after a few months (just to be safe—ideally a PR doesn't stay open more than a couple days).

Or we could do something fancier with another Lambda that processes Github webhooks when PRs are merged or closed. There's some complexity here around what should happen if a PR is closed and then re-opened—nothing unsolvable, but the work-to-payoff ratio for this option seems less appealing.

Summed up

Implementing our own version of preview deploys for a CloudFront / S3 website on AWS has been a great learning experience for me, and a super useful tool for our team.

I hope this has been useful if you're considering doing something similar, or even if you're just curious about how these kinds of tooling features can work. Don't forget there are plenty of great services out there that'll do all this work for you. And of course we're always happy to chat, so please feel free to reach out if you've got needs in these areas!