OpCache and symlink-based deployments

PHP 5.5 introduced an alternative opcode cache, OpCache, which replaces the APC extension. It's great - for one thing it only tries to be an opcode cache rather than a key-value store as well, and it's apparently more stable and easier to maintain. It does, however, cause a hiccup if you use symlinks to facilitate zero-downtime deploys.

Background

You can use a symlink to avoid users getting a 500 error while you deploy a new version of your site. A deploy may involve uploading newer files, installing Composer dependencies and generating assets - none of which are instantaneous. If a page is requested mid-deploy, files may not be available and the user gets a 500. A way around this is to store multiple versions of your site, with a symlink pointing to the current version, and to tell your webserver to serve from that symlink. What makes this safe for users is that the symlink is swapped atomically - there's no point at which it doesn't exist, so the user can't receive a 500 error due to files being unavailable. Note that ln -sf on its own isn't atomic (it unlinks the old symlink before creating the new one); to swap atomically, create the new link under a temporary name and rename it over the old one, as sketched below.
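Here's a minimal sketch of that swap in PHP, assuming the example layout below (/releases/1, /releases/2 and a current symlink); a few lines of shell achieve the same thing:

<?php
// Atomic symlink swap - example paths, adjust to your own layout.
$release = '/releases/2';
$current = '/current';
$tmp     = $current . '_tmp';

// Clean up any leftover temporary link from an earlier failed deploy.
if (is_link($tmp)) {
    unlink($tmp);
}

// Create the new link under a temporary name, then rename it over the old
// one. rename() maps to rename(2), which replaces the destination
// atomically, so "current" always points at a complete release.
symlink($release, $tmp);
rename($tmp, $current);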

The reason this breaks when switching from APC to OpCache is the way the two systems cache files. APC keys its cache on filesystem inodes: when the symlink is repointed, the same path resolves to a different inode, so APC treats the file as new and picks up the new release. In effect, in the following example APC remembers /current/index.php rather than /releases/1/index.php:

current -> /releases/1
releases/
    1/
        index.php
    2/
        index.php

OpCache, however, is not inode-based: it resolves symlinks to find the real file path before caching, so it remembers /releases/1/index.php instead. This means that when you update your symlink, APC switches over to the newer version while OpCache carries on serving the old one.
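You can see the distinction from PHP itself - realpath() returns the fully resolved path, which is what OpCache keys its cache on:

<?php
// Paths match the hypothetical layout above.
$path = '/current/index.php';

echo $path . "\n";           // /current/index.php - the path the request uses
echo realpath($path) . "\n"; // /releases/1/index.php - what OpCache caches against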

Solutions

There are a few ways around this.

  1. Directly updating your webserver configuration
  2. Resetting the OpCache
  3. Restarting PHP-FPM

Of course, you can modify your deploy script to update your VirtualHost with the real path of the new release - and if you're already using Puppet or Ansible for deployment, that might actually be an easier option.

I haven't used Apache in a while, so I'm not entirely sure whether this affects mod_php. Rasmus mentions that at Etsy they have a custom Apache module that calls realpath() on the document root. Presumably that resolves the symlink on each request, so PHP (and OpCache) always see the real release path without the webserver configuration having to change at all.

However, if you'd like to continue using symlinks, you can get around the issue by flushing the OpCache after each deploy. That might appear to be as simple as calling opcache_reset(); the snag is that OpCache keeps a separate cache for each SAPI, so calling opcache_reset() from a console command only flushes the CLI cache, not FPM's.

Here's how I worked around this:

  1. Create an endpoint that calls opcache_reset(). Protect it somehow (by IP and a shared token, for instance).
  2. Create a console command that makes that request via the FPM SAPI.
    a. e.g. use cURL or Httpful to POST the token to http://example.com/_token.
  3. Call that command from your deployment scripts after the release is ready.

Not hugely elegant, but it works. For example, in Laravel the web hook looks something like this, with the console side sketched after it:

Route::filter('deployment', function() {
    // Placeholder values - swap in your deploy hosts and shared secret.
    $allowedIps    = ['127.0.0.1'];
    $allowedTokens = ['my-deploy-token'];

    if (! in_array(Request::getClientIp(), $allowedIps) ||
        ! in_array(Request::header('authorization'), $allowedTokens)) {
        return Response::make('Forbidden', 403);
    }
});

Route::group(['before' => 'deployment'], function() {
    Route::post('/_token', function() {
        Log::info('Resetting opcache for ' . php_sapi_name());
        opcache_reset();

        return 'OK';
    });
});
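The console side can then be as simple as a cURL call from your deployment script. Here's a rough sketch - the URL and token are placeholders matching the example above:

<?php
// Ask PHP-FPM to reset its own OpCache. Because this request is served by
// the FPM SAPI, the opcache_reset() in the route above clears the web
// cache rather than the CLI cache this script runs under.
$ch = curl_init('http://example.com/_token');
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER     => ['Authorization: my-deploy-token'],
]);
$response = curl_exec($ch);

if ($response === false) {
    fwrite(STDERR, 'OpCache reset failed: ' . curl_error($ch) . "\n");
    exit(1);
}

curl_close($ch);
echo "OpCache reset requested via FPM\n";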

You could argue that restarting PHP-FPM also flushes the cache, but that opens a brief window of downtime while it restarts - if you have a reasonable amount of traffic (or, on internal systems, luck as bad as mine) you'll end up with someone hitting a 502 Bad Gateway or 504 Gateway Timeout (or a phone call).

As Linux containers grow in number and usage this problem will become less relevant, as you'll never redeploy your application in the traditional sense - deploying becomes a case of pointing Nginx at the new container, which is handled at the networking level rather than on the filesystem. But until then, this is a useful hack.

References