Migrating Existing WordPress Assets to Amazon S3

May 22, 2015 · Ryan Burnette

I've been using the Amazon S3 and CloudFront WordPress plugin to store WordPress assets in an S3 bucket and serve them through CloudFront. At first it was an experiment, but it quickly became clear that it was a better way to handle WordPress assets.

I had been using Amazon S3 and CloudFront for a while before I noticed that it was made by the folks at Delicious Brains. They're also the makers of WP Migrate DB Pro, a fabulous plugin that saves me at least an hour or two per week migrating WordPress databases. If you haven't already, check it out.

The Problem

I ran into a situation where I had moved an existing site over to Amazon S3 and CloudFront for assets. The plugin takes care of uploads from that point on, but the site already had a lot of assets that I wanted to serve from S3/CloudFront to get them off the web server.

Potential Solutions

I considered a few potential solutions.

First, I thought about rewriting the asset URLs stored in the WordPress database. This is a legitimate approach, but the site in question had a huge database and something told me there was a simpler way.
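
For reference, the database-rewrite option might look something like the sketch below. It assumes WP-CLI (the `wp` command) is installed on the server, which the original setup doesn't mention; the function name `rewrite_upload_urls` and both base URLs are placeholders. Running it with `--dry-run` first reports what would change without writing anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite upload URLs stored in the WordPress database
# using WP-CLI's search-replace command. Assumes WP-CLI ("wp") is installed
# and the function is run from the WordPress root. Extra flags (such as
# --dry-run) are passed straight through to wp.
rewrite_upload_urls() {
  local old_base="$1"   # placeholder, e.g. https://mysite.com/wp-content/uploads
  local new_base="$2"   # placeholder, e.g. https://static.mysite.com/wp-content/uploads
  shift 2
  # --skip-columns=guid leaves post GUIDs alone; WordPress advises against changing them.
  wp search-replace "$old_base" "$new_base" --skip-columns=guid "$@"
}

# Preview first, then rerun without --dry-run once the counts look right:
# rewrite_upload_urls https://mysite.com/wp-content/uploads \
#   https://static.mysite.com/wp-content/uploads --dry-run
```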

Next, I thought about using redirects. The downside is that every asset request would still reach the web server, which would have to answer it with a 301 or 302 redirect. The upside is that redirects are easy to reverse.

Another consideration was existing references to these files. A lot of traffic was still hitting the images' old URLs, and I needed to keep serving them. So it looked like the redirects would be necessary either way.
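
One rough way to gauge how much traffic is still hitting the old upload URLs is to count matching requests in the web server's access log. The sketch below assumes Nginx's default combined log format, where the request path is the seventh whitespace-separated field; the function name and the log path in the example are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count requests for files under /wp-content/uploads/
# in an access log. Assumes Nginx's default "combined" log format, in which
# field 7 is the request path from "GET /path HTTP/1.1".
count_upload_hits() {
  local log_file="$1"
  # "hits + 0" makes awk print 0 instead of an empty line when nothing matches.
  awk '$7 ~ /^\/wp-content\/uploads\// { hits++ } END { print hits + 0 }' "$log_file"
}

# Example (log path is an assumption):
# count_upload_hits /var/log/nginx/access.log
```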

Lastly, I learned from a blog post that a solution to this problem would be available in the Pro version of the plugin, but I needed to act now.

The Solution I Chose

I decided to go with the redirects. I could just as easily have done both, rewriting the URLs and adding the redirects, but I could live with the small page-load overhead of the extra redirects. Given the amount of traffic the site was getting, moving the assets to the CDN was still likely to be a net gain in performance.

Getting the Assets Into S3

In this example, a lot of assets live on the web server and we're going to sync them to an S3 bucket. The AWS Command Line Interface is perfect for the job. I found pip to be the easiest way to install the CLI tools in my development environment; installation instructions are available in the AWS documentation.

The web server I was working with was running Ubuntu 14.04.1 with the usual stack: Nginx, PHP5 and MySQL. The assets had been building up in the uploads directory since 2008 or so. There were over 20 gigabytes to upload. The most expeditious way to migrate the assets was to use the AWS Command Line Interface to move the assets right from the web server into the S3 bucket.

There's a lot of detail I could get into about setting up AWS in the console, creating API keys and setting up the S3 bucket, but I won't here. AWS is well documented. If you haven't already, that's the place to start.

I installed the AWS CLI with pip, then used aws s3 sync to copy the assets straight from the web server into S3. The sync command only copies files that are new or have changed, and running it on the server is a ton faster than downloading the assets to your local machine and then uploading them back into a bucket.

My sync command looked something like this; yours may vary depending on your URL structure. Note that it's critical to pass --acl public-read so the uploaded files are publicly readable. Missing this step is a pain, because it's not as easy to set the public-read ACL on assets that are already in the bucket.

I ran several tests on just a few files to make sure the command behaved the way I expected, and it didn't always. I use rsync almost by second nature, so my expectations may have been off.

cd /srv/mysite.com/wordpress
aws s3 sync wp-content/uploads s3://my-bucket/wp-content/uploads --acl public-read
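
To repeat small-scale tests like these before syncing all 20+ gigabytes, aws s3 sync can be limited to a subset of files and run with --dryrun, which prints the operations it would perform without uploading anything. The helper below is a hypothetical sketch; the function name, the bucket name and the 2015/05 subdirectory are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: preview an "aws s3 sync" of a single uploads subdirectory.
# --exclude "*" --include "<subdir>/*" limits the sync to that subdirectory
# (filters that appear later take precedence), and --dryrun only prints what
# would be copied. Run it from the WordPress root.
preview_upload_sync() {
  local subdir="$1"   # placeholder, e.g. 2015/05
  local bucket="$2"   # placeholder, e.g. my-bucket
  aws s3 sync wp-content/uploads "s3://${bucket}/wp-content/uploads" \
    --exclude "*" --include "${subdir}/*" \
    --acl public-read --dryrun
}

# Example:
# preview_upload_sync 2015/05 my-bucket
```

Once the dry-run output lists exactly the files expected, dropping --dryrun (and the filters) gives the full sync shown above.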

Nginx Redirects

A disclaimer: I use Nginx, and the configuration below is specific to it. Someone familiar with Apache could probably adapt it for that web server.

The old location for any given asset looked something like this:

https://mysite.com/wp-content/uploads/YYYY/MM/filename.jpg

My S3 bucket was configured to distribute through CloudFront, so the new location looked something like this:

https://static.mysite.com/wp-content/uploads/YYYY/MM/filename.jpg

Next, I needed a redirect pattern that would first check for the file locally, then redirect to the CDN if the local file wasn't present.

Nginx has a really useful try_files directive that made this easy. First, I created a location block that isolates requests for uploads. Inside it, I passed two arguments to try_files: $uri, which checks for the local file, and a named location, @missing, which handles files that aren't there.

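location ~ "^/wp-content/uploads/(.*)$" {
  # Serve the file from disk if it still exists; otherwise hand off to @missing.
  try_files $uri @missing;
}
location @missing {
  # "redirect" issues a 302; use "permanent" instead for a 301.
  rewrite "^/wp-content/uploads/(.*)$" "https://static.mysite.com/wp-content/uploads/$1" redirect;
}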
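
To double-check which CDN URL a given upload path will be sent to, the rewrite rule's regex can be mirrored in a few lines of Bash. This is only an illustration of the mapping, not part of the Nginx setup; the function name cdn_url_for is hypothetical and static.mysite.com is the CDN hostname from the rule.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the @missing rewrite rule: map a request path
# under /wp-content/uploads/ to its CDN URL, or return 1 for any other path.
cdn_url_for() {
  local path="$1"
  local re='^/wp-content/uploads/(.*)$'
  if [[ "$path" =~ $re ]]; then
    # BASH_REMATCH[1] holds the same capture that Nginx exposes as $1.
    printf 'https://static.mysite.com/wp-content/uploads/%s\n' "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

# Example:
# cdn_url_for /wp-content/uploads/2014/05/photo.jpg
# prints https://static.mysite.com/wp-content/uploads/2014/05/photo.jpg
```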

Once that was in place, I first moved the old uploads out of the public directory to make sure everything was working as expected, and eventually deleted the assets from the web server once I was satisfied.
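
A quick way to confirm that an old upload URL now redirects to the CDN is to ask curl for the status code and redirect target without following the redirect. The sketch below assumes curl is installed; the function name check_cdn_redirect and the example URLs are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that an old upload URL answers with a 301/302
# redirect pointing at the CDN. Without -L, curl does not follow the redirect,
# so %{http_code} is the redirect status and %{redirect_url} is its target.
check_cdn_redirect() {
  local url="$1"
  local cdn_prefix="$2"   # placeholder, e.g. https://static.mysite.com/wp-content/uploads/
  local status target
  read -r status target < <(curl -s -o /dev/null -w '%{http_code} %{redirect_url}\n' "$url")
  if [[ "$status" == 301 || "$status" == 302 ]] && [[ "$target" == "$cdn_prefix"* ]]; then
    echo "OK $status -> $target"
  else
    echo "FAIL $status ${target:-<no redirect>}"
    return 1
  fi
}

# Example:
# check_cdn_redirect https://mysite.com/wp-content/uploads/2014/05/photo.jpg \
#   https://static.mysite.com/wp-content/uploads/
```

Running a check like this while the old uploads are merely moved aside, rather than deleted, keeps the change easy to undo.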

Final Thoughts

Once that was done I had all the old assets being loaded from the CDN via redirect, and the Amazon S3 and CloudFront plugin was taking care of all the assets moving forward. It's a great setup for performance and reliability.

It's also exciting because hosting the assets on their own platform is one step toward an elastic environment for WordPress hosting, but that's a story for another day.
