The 12 Factor PHP App – Part 3

2018-12-24 19.54.37

This is Part 3 of a 3-part series.

This series takes the development guidelines specified in the 12 Factor App manifesto and examines their relevance within the context of PHP applications.

In this Part of the Series

In the final part of this series we’ll cover factors 9 through 12:

  • Disposability: maximize robustness with fast startup and graceful shutdown
  • Dev/prod parity: keep development, staging, and production as similar as possible
  • Logs: treat logs as event streams
  • Admin processes: run admin/management tasks as one-off processes

IX: Disposability

Maximize robustness with fast startup and graceful shutdown.

A 12 factor application should be highly “disposable”. This means that application processes can be quickly and easily started, stopped and restarted without significant user impact (extended downtime or lost data).

Disposability is important for two reasons:

  1. It means code changes (deployments) can be made frequently.
  2. It allows new processes to be added easily (horizontal scaling).

How to Apply?

In web applications we need to consider disposability along two different axes:

  • Web processes. These are the processes that handle individual HTTP requests as they are received from the web server (e.g: Apache or nginx).
  • Background jobs (worker processes). These are the long-running processes which typically run as daemons, and are maintained by a process manager (e.g: Circus, SupervisorD).

Web processes

For web processes, we achieve disposability through PHP’s FastCGI Process Manager (PHP-FPM).

PHP-FPM is responsible for starting and stopping processes to handle requests from the web server. The number of processes is controlled via the “pm” configuration value (dynamic, ondemand, static), and we simply start/stop/restart the parent PHP-FPM process when we want to start/stop/restart the PHP processes. On a Debian-style system this will typically look like this:

# start the processes
service php7.1-fpm start

# restart the processes
service php7.1-fpm restart

# reload the processes (graceful "restart")
service php7.1-fpm reload

# stop the processes entirely
service php7.1-fpm stop
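The “pm” setting itself lives in the pool configuration. A minimal sketch, with illustrative paths and values:

```ini
; /etc/php/7.1/fpm/pool.d/www.conf (path and values are illustrative)
pm = dynamic
pm.max_children = 20
pm.start_servers = 5
pm.min_spare_servers = 3
pm.max_spare_servers = 10
```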

One key consideration for disposability is that we must restart the parent PHP-FPM process anytime the codebase changes (e.g: during deployments) in order to rebuild the opcode cache. This can be done gracefully (without dropping requests/transactions) using the reload command or USR2 control signal.

Background jobs

Disposability for background jobs is largely a product of technology choice and queue implementation.

For technology choice, we must use a data store that is persistent (e.g: Redis, Beanstalkd) so the state of the queue survives process failures/restarts.

In terms of queue implementation, we must support retry on failure so background jobs that fail to run to completion are returned to the queue for later retry. The easiest way to manage retry on failure is to leverage an open source queue implementation that includes this functionality out-of-the-box (such as PHP Resque or Laravel’s native Queue driver).
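To illustrate the behaviour these libraries give us, here is a minimal sketch of retry-on-failure in plain PHP. The function name and job shape are hypothetical, not a real library API:

```php
<?php

// Minimal sketch of retry-on-failure for a queue consumer. Real
// implementations (e.g: PHP Resque, Laravel's queue workers) provide
// this out-of-the-box; the function and job shape are illustrative.
function processWithRetry(array &$queue, callable $handler, int $maxAttempts = 3): array
{
    $completed = [];
    while (($job = array_shift($queue)) !== null) {
        try {
            $handler($job['payload']);
            $completed[] = $job['payload'];
        } catch (Throwable $e) {
            $job['attempts'] = ($job['attempts'] ?? 0) + 1;
            if ($job['attempts'] < $maxAttempts) {
                // Return the failed job to the queue for a later retry.
                $queue[] = $job;
            }
            // Otherwise the job is dropped (a real implementation would
            // move it to a failed-jobs table or dead-letter queue).
        }
    }
    return $completed;
}
```

A persistent backing store (Redis, Beanstalkd) would replace the in-memory array, but the requeue-on-failure logic is the same.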

When to Apply?

By using PHP-FPM and implementing some specific behaviours in your queueing system, it’s possible to achieve a reasonable level of process disposability. This removes a number of problems related to deployments and scaling and does not incur a significant amount of work or any performance degradation.

Unless there are specific technology or project constraints preventing it, this should be implemented for all projects.


X: Dev/prod parity

Keep development, staging, and production as similar as possible

Ensuring that the development (as well as staging, pre-production, etc) environment matches production as closely as possible greatly reduces the likelihood of bugs introduced because of incompatibilities between environments.

Within the context of a PHP application, we want to make sure that the following dependencies are as closely aligned between environments as possible:

  • Web server (e.g nginx).
  • PHP version and process manager.
  • Data store(s) vendor(s), including version numbers.
  • Queue data source.

How to Apply?

Installing and managing environment dependencies directly on each individual developer’s machine is problematic:

  • Manually installing dependencies is time consuming and error prone.
  • Upgrades are difficult to manage. Each developer will need to perform upgrades differently, based on source/target version numbers of dependencies.
  • The developer might need to support applications running on different versions of key dependencies, with no easy way to switch contexts when working on different projects (e.g: PHP 5.3 vs 7.3).
  • The developer might be using a different operating system from the production environment (e.g: macOS vs a Linux variant).

With this in mind, it makes much more sense to “abstract” the development environment away from the target machine using virtualization.

The simplest implementation is to host the development environment in a virtual machine, run inside a hypervisor such as VirtualBox. Using this strategy, we build a single VM image which includes dependencies that match production, and share it with our team. Tools such as Vagrant can be used to simplify the process (for Laravel, see also Homestead).

There are some problems with this approach, however. VM images are typically quite large (in terms of filesize), and there is a significant performance impact. It also takes quite some time to start, stop and restart virtual machines.

Another approach is to use Docker, which relies on the concept of Linux containers instead of virtual machines. With Docker the development environment runs inside a “container” which shares the operating system kernel with the host machine, but virtualizes the filesystem and networking layers. Docker images are typically lighter-weight in terms of both storage space and performance overhead when compared to VMs.

To use Docker, we define Dockerfiles: declarative configuration files which tell Docker how to build the images for the individual processes which make up our application. When defining and running multiple containers (e.g: web server, database server) we can simplify the process using Docker Compose, a tool built on top of Docker for service composition.
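As a sketch, the Dockerfile for the PHP-FPM service might look something like this (base image, extensions and paths are illustrative):

```dockerfile
# Dockerfile for the PHP-FPM service (illustrative)
FROM php:7.3-fpm

# Install the extensions the application needs
RUN docker-php-ext-install pdo_mysql opcache

# Copy the application code into the image
COPY . /var/www/html
WORKDIR /var/www/html
```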

An example Docker Compose configuration might look as such:
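A minimal sketch, with illustrative image tags, ports and credentials:

```yaml
# docker-compose.yml (image tags and credentials are illustrative)
version: "3"
services:
  web:
    image: nginx:1.15
    ports:
      - "8080:80"
    volumes:
      - ./:/var/www/html
    depends_on:
      - php
  php:
    image: php:7.3-fpm
    volumes:
      - ./:/var/www/html
  db:
    image: mysql:5.7
    environment:
      MYSQL_ROOT_PASSWORD: secret
```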

For Laravel development, we can use Laradock to generate a Docker configuration which matches our specific environment requirements.

When to Apply?

Achieving dev/prod parity is an important goal for any development project as it will increase the speed at which we are able to build, test and deploy code, and increase the reliability of our deployments.

For most small to medium sized projects, we can (and should) leverage virtualized development environments so the gap between dev and prod is reduced as much as possible. It is only when the hardware or software requirements of a web application cannot feasibly be replicated on a development machine that it becomes reasonable to differentiate between the environments.


XI: Logs

Treat logs as event streams

Logs are the output generated during the normal operation (or error states) of a web application.

Factor XI states that an application should not be concerned with the management of logs. Instead, the application should simply write to STDOUT and STDERR (also known as the standard streams in Unix), and the environment should be responsible for routing the messages to the correct location.

This is important in a web application because the handling of logging will change based on the execution environment. For example, in a local development environment (e.g: running inside a Docker container) the log output will be routed to a terminal session running on the developer’s machine (so she can see error messages and application logs in realtime). In a multi-server cloud-based deployment with tens or hundreds of servers, the logs will be routed to an aggregation engine (e.g Hadoop).

How to Apply?

This is a difficult factor to implement in PHP because of the process model and limitations within PHP-FPM.

Because the lifecycle and operation of our PHP processes is controlled by PHP-FPM, we also depend on it to aggregate output from multiple concurrent PHP processes. Before PHP 7.3, a severe limitation (effectively a bug) in PHP-FPM made it impossible to route application logs to STDOUT without data truncation.

With that said, as of PHP >= 7.3 it is possible to route application logs through standard streams.

The standard PHP error log can be configured to write to the standard error stream (file descriptor 2) by modifying the PHP-FPM configuration as such:

# /etc/php/7.3/fpm/php-fpm.conf
error_log = /proc/self/fd/2

In PHP we will also typically leverage a third-party library for application-level logging (Monolog is the most ubiquitous library for this). If we want to push these log messages to STDOUT, we can use the php://stdout stream which is provided to us in the PHP execution environment.

For example, it can be configured in a Laravel application as such:
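A minimal sketch of a channel in config/logging.php, based on Laravel’s documented “monolog” channel driver (the channel name is illustrative):

```php
<?php
// config/logging.php (excerpt; channel name is illustrative)

use Monolog\Handler\StreamHandler;

return [
    'default' => 'stdout',

    'channels' => [
        // Route all application log messages to the standard output stream.
        'stdout' => [
            'driver' => 'monolog',
            'handler' => StreamHandler::class,
            'with' => [
                'stream' => 'php://stdout',
            ],
        ],
    ],
];
```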

When to Apply?

In my opinion this is one of the most flexible factors (if not the most flexible), given the execution context of PHP.

In local/development environments, and in most small to medium use-cases (i.e: applications that are run on a handful of servers, dealing with up to hundreds or thousands of requests per second) there is no need for complicated logging configurations.

Most MVC frameworks have log management (including log rotation and archiving) built-in, so the easiest solution is to start with a simple file-based log, and then switch to a Logging-as-a-Service (LaaS) product such as Papertrail when the volume of logging information justifies it.

With that said, if you are managing large numbers of servers or handling a lot of requests per second (and as a result a lot of log messages), it may simplify your configuration management and deployment processes to use standard streams for logging and manage routing via the environment.


XII: Admin processes

Run admin/management tasks as one-off processes

In a 12 Factor App, any administrative or maintenance actions that need to be completed in the production execution environment (e.g: database changes) should be run as one-off processes, executed in the same environment as the processes handling web requests.

This means admin processes are repeatable, testable and bundled via source control with our application.

How to Apply?

Typically we do this by implementing admin processes as scripts which are run from the command line and execute within the same context as our web application.

We can do this using the Symfony Console library, which enables us to write CLI scripts integrated into our application’s codebase:
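A minimal sketch of such a command, assuming symfony/console has been installed via Composer (the command name and task are illustrative):

```php
<?php

use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

// Illustrative one-off admin task: prune expired sessions.
class PruneSessionsCommand extends Command
{
    protected function configure(): void
    {
        $this->setName('sessions:prune')
             ->setDescription('Delete expired sessions from the data store');
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        // ... perform the maintenance work using the application's services ...
        $output->writeln('Expired sessions pruned.');

        return 0;
    }
}
```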

The Laravel framework also provides the ability to implement console commands. Behind the scenes Laravel actually uses the Symfony Console library, but it wraps this functionality in its own interface:
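A sketch of the equivalent Artisan command (the signature and task are illustrative):

```php
<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;

// Illustrative one-off admin task as an Artisan command.
class PruneSessions extends Command
{
    protected $signature = 'sessions:prune';

    protected $description = 'Delete expired sessions from the data store';

    public function handle()
    {
        // ... perform the maintenance work ...
        $this->info('Expired sessions pruned.');
    }
}
```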

When to Apply?

Implementing admin processes as console commands is best practice, and should be our default pattern when implementing any tasks or maintenance actions.

This is particularly relevant for database changes, which we can implement using “migrations”. Migrations are sequential, programmatically defined database modification scripts which are bundled with the application and executed via console commands. 
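As a sketch, a Laravel migration might look like this (the table and columns are illustrative):

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

// Illustrative migration: create (and roll back) an audit_log table.
class CreateAuditLogTable extends Migration
{
    public function up()
    {
        Schema::create('audit_log', function (Blueprint $table) {
            $table->increments('id');
            $table->string('action');
            $table->timestamps();
        });
    }

    public function down()
    {
        Schema::drop('audit_log');
    }
}
```

Running `php artisan migrate` then executes, in sequence, any migrations that have not yet been applied to the target database.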


Conclusion

In this instalment of The 12 Factor PHP App we reviewed factors 9 through 12 and, with varying success, saw how we can implement them in a PHP application.

We explored disposability and how by using PHP-FPM and implementing our background jobs with some specific constraints, we can build application processes that can be quickly started and stopped, allowing us to deploy frequently and scale easily.

We also discussed how we might achieve dev/prod parity using Docker, reducing the number of bugs caused by environment mismatches and making it quick and easy for developers to get up and running on our project.

After that we looked at logs and how we can treat them as event streams, making it easy to handle them differently based on the execution environment and aggregate them when running in a multi-process cloud-based context.

Finally we investigated how we can implement one-off administrative and maintenance tasks as admin processes, making them testable, repeatable and version controlled.