The 12 Factor PHP App – Part 1

This is Part 1 of a 3-part series.

In this series we’re going to look at ways of building scalable, secure and maintainable web applications using the 12 Factor App as a guide.

What is the “12 Factor App?”

Created and maintained by Adam Wiggins, one of the co-founders of Platform-as-a-Service company Heroku, the 12 Factor App is a document that outlines 12 “best practice” design recommendations that can be utilised when developing web-based software. The practices were derived from real-world observations from Wiggins and the Heroku team during the development and deployment of hundreds of applications.

A description from the document itself:

… This document synthesizes all of our experience and observations on a wide variety of software-as-a-service apps in the wild. It is a triangulation on ideal practices for app development, paying particular attention to the dynamics of the organic growth of an app over time, the dynamics of collaboration between developers working on the app’s codebase, and avoiding the cost of software erosion.

In other words, engineering your web applications with regard to the 12 factors helps keep your software long-lived and makes it easier for you and your team to adapt it to change, which is the only certainty when it comes to software development.

What this series aims to do

The 12 Factor App document outlines high-level design decisions, but doesn’t get down to the level of implementation details. This is by necessity, as it aims to discuss the factors in a language- and platform-agnostic way. This series aims to take the precepts laid out in the original document and provide a discussion of how they can be implemented specifically within the context of PHP applications.

In this series we’ll also look at why certain factors are important, and at what stage of the application development process you should consider incorporating them.

In this Part of the Series

In the first part of this 3-part series we’ll look at the first four factors. As described in the original document, these factors include:

  • Codebase: one codebase tracked in revision control, many deploys.
  • Dependencies: explicitly declare and isolate dependencies.
  • Config: store config in the environment.
  • Backing Services: treat backing services as attached resources.

I. Codebase

One codebase tracked in revision control, many deploys

This factor implies that all code eventually ends up in a single, centralised location. As a result we have a single “source of truth” for what’s going to end up in production. If we need to deploy the code into a different environment (for example, a staging, testing or development environment), we deploy the same code as we do in production.

Implementing this factor also means that we use version control system (VCS) software to manage changes to the central repository. This way we can track changes, coordinate developers working on the same files and easily roll back code in the event of regressions (bugs).

Note that although we only have a single codebase per application, we may compose applications using disparate components by using libraries. Using this approach, we have a central codebase that contains the core application code and use a dependency management tool (see Factor II – Dependencies) to pull in additional libraries.

When to apply?

Every project you implement should have a single source of truth repository, and that repository should be managed by one of the source control management tools. Period.

Whether you’re developing a hackathon project by yourself over the weekend or working on a 10 year old legacy codebase with hundreds of other developers, you should be following these recommendations. Everyone can benefit from code centralisation and source control management.

How to apply?

The best solution for managing your codebase will depend on a number of factors, including the number of developers on the project, the licensing for the software (public open source, for-profit closed source), the development platform and the deployment process. The best general answer for this question is to use GitHub. For open source projects it’s a no-brainer; for closed-source projects it might take some more thought, but this is still feasible using GitHub private repositories or (for large projects) GitHub Enterprise.

In terms of VCS, Git is the best choice for distributed teams (and if you’re using GitHub this decision has already been made for you). Git supports workflows that are highly conducive to effective team-based work and is pretty much the industry standard for VCS.

II. Dependencies

Explicitly declare and isolate dependencies

Modern web applications are rarely written as monolithic, one-purpose components. More often they are composed of core application code that leverages supporting libraries (quite regularly written by third parties). Unfortunately, including external libraries in our projects can introduce significant problems:

  • If we bundle the library with our application, it can significantly increase the size of our project; it’s not uncommon for the amount of library code to dwarf the actual core application code.
  • It’s difficult to apply an upgrade to a single library without affecting other libraries or our core application.
  • If we don’t bundle the library with our application, it can be difficult to deploy our code to different environments. For example, where should the library be sourced from and which version do we need to use?

We can solve most of these problems by utilising dependency management. Dependency management aims to make the composition of applications from smaller libraries significantly easier by fetching and managing libraries for us. We use a dependency manifest file to define which libraries and which versions of those libraries our application requires, and then the management tool does the rest for us. This enables us to keep non-core application code separate from our codebase.

When to apply?

As soon as you need to include any kind of library into your application (even if it’s internally developed) you should be using a dependency management tool. It may seem easy enough to manually include a third-party library in your codebase, but it quickly becomes unmanageable.

In fact, there’s a case for using a dependency management tool from the beginning, even if you don’t require the inclusion of third-party libraries. This is because there is very little overhead involved in including a dependency management tool in your project, and often it will provide you with immediate benefits (e.g. autoloading) and make it easier for you to include libraries later on as the project grows.

How to apply?

Dependency management can be implemented in PHP using Composer. A comprehensive guide on setting it up and using it is available here: Dependency Management with Composer.

In short, in order to use Composer we need to complete the following:

  • Install the Composer command line tool.
  • Define our dependencies using the composer.json dependency manifest file.
  • Include the vendor/autoload.php file in our application’s bootstrap process.
  • Install our dependencies using the $ composer install command.
  • Update libraries to the latest compatible versions using the $ composer update command.
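
As a minimal sketch of the bootstrap step, the application can check that Composer’s autoloader exists before requiring it. The helper name loadComposerAutoloader and its return convention are assumptions made for illustration; only the vendor/autoload.php path comes from Composer itself.

```php
<?php
/**
 * Require Composer's autoloader if it has been generated.
 * Hypothetical helper: the name and return convention are illustrative.
 *
 * @return bool true when the autoloader was loaded, false when it is missing
 */
function loadComposerAutoloader(string $projectRoot): bool
{
    $autoloader = rtrim($projectRoot, '/') . '/vendor/autoload.php';

    if (!is_file($autoloader)) {
        // Dependencies have not been installed yet ("composer install" not run).
        return false;
    }

    require_once $autoloader;
    return true;
}

// Typical use at the top of a front controller such as index.php:
if (!loadComposerAutoloader(__DIR__)) {
    error_log('vendor/autoload.php not found; run "composer install" first.');
}
```

Once the autoloader is loaded, classes from the libraries declared in composer.json can be used without any further require statements.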

III. Configuration

Store config in the environment

Web applications commonly require some method of configuration. Whether it’s for specifying the location of attached resources (databases, API endpoints, etc.), setting application preferences or changing the mode of operation (e.g. production, test or development), more often than not we need to supply parameters to our application for it to function.

Our initial intuition about supplying these settings is usually to provide a configuration file: a simple script that sets some globally defined variables. This approach is not ideal for a number of reasons:

  • Often we want to use different settings based on the environment. We usually don’t want our development web servers accessing production databases. Using the approach of file-based configuration, we would need multiple files and then a way of selecting which one to use.
  • Configuration files often contain sensitive information such as credentials for establishing database connections. Having these details committed to our codebase for all to see is not ideal.
  • We may have multiple web applications running on the same server requiring the same resources; using the file-based approach we would need multiple configuration files containing the same details.

A better approach is to use environment variables. Environment variables are configured at the level of the web server or operating system, and so are specific to the server the application is running on.

Quoting from the 12 Factor App document:

Apps sometimes store config as constants in the code. This is a violation of twelve-factor, which requires strict separation of config from code. Config varies substantially across deploys, code does not.

… the twelve-factor app stores config in environment variables. [Environment] vars are easy to change between deploys without changing any code; unlike config files, there is little chance of them being checked into the code repo accidentally; and unlike custom config files, or other config mechanisms […] they are a language- and OS-agnostic standard.

When to apply?

Unlike the previous factors, the best time to apply this one is less clear-cut. Especially when first starting out it may be difficult for you to control environment variables. Some shared-hosting arrangements might even make it impossible for you to use environment variables for configuration.

As a general rule of thumb, you should use environment variables for configuration if you have an easy way of managing them in the environments that you are deploying to; otherwise configuration files are fine as a stop-gap solution. As a side note, if you’re deploying to servers that don’t allow easy control of environment variables, you should probably look at migrating to a better platform provider.

How to apply?

If you must use a configuration file (see discussion above), the preferred method is:

  • Have multiple configuration files, one for each environment (usually development, test and production).
  • Each configuration file sets globally accessible variables. The application code assumes that these variables are available and populated with the relevant values at runtime.
  • Do not check the configuration files into the core repository.
  • During the deployment step, pull the relevant configuration file into the expected location (for example: config/config.php).
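
The stop-gap approach above might look like the following minimal sketch. It assumes the deployed config/config.php returns an array rather than setting global variables (a small variation that keeps the values out of the global scope). The function name loadConfig and the keys mentioned in the comments are hypothetical.

```php
<?php
/**
 * Load the configuration file that the deployment step put in place.
 * Hypothetical helper; assumes the file returns an array, for example:
 *   return ['environment' => 'test', 'db_host' => 'localhost'];
 */
function loadConfig(string $path): array
{
    if (!is_file($path)) {
        throw new RuntimeException("Configuration file not found: {$path}");
    }

    // "require" evaluates the file and yields whatever it returns.
    $config = require $path;

    if (!is_array($config)) {
        throw new RuntimeException("Configuration file must return an array: {$path}");
    }

    return $config;
}
```

The application would then call something like loadConfig(__DIR__ . '/config/config.php') once during bootstrap and pass the resulting array to the code that needs it.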

As previously mentioned, the preferred method of handling configuration is to use environment variables instead of the approach described above.

The way we set and manage our environment variables will depend on our environment; typical approaches are outlined below.

Setting Environment vars via Webserver Configuration

If we’re using Apache, we can use the mod_env module to enable environment variable configuration via the master configuration file, virtual host configuration file, or the .htaccess file.

# In the main or virtual host configuration file
SetEnv ENVIRONMENT test
SetEnv DATABASE_URL mysql://user:PASSWORD@HOST/db
# In .htaccess files (same syntax)
SetEnv ENVIRONMENT test
SetEnv DATABASE_URL mysql://user:PASSWORD@HOST/db

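Nginx, another popular PHP-compatible web server, provides a related mechanism out of the box. The env directive, placed in the main context of the configuration file, sets variables in the environment of the Nginx process itself:

env ENVIRONMENT=test;
env DATABASE_URL=mysql://user:PASSWORD@HOST/db;

Note that PHP usually runs behind Nginx as a separate PHP-FPM process, which does not inherit these values. In that setup, pass each value along with the request using the fastcgi_param directive (for example, fastcgi_param ENVIRONMENT test;) in the location block that handles PHP requests.

More information is available in the Nginx documentation.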

Setting Environment vars using PaaS tools

If you’re using a Platform-as-a-Service provider, quite often the vendor will offer tools for setting and managing environment variables. For example, Heroku provides the following (newer versions of the Heroku CLI call this command config:set):

$ heroku config:add ENVIRONMENT=production 
Adding config vars and restarting myapp... done, v12
ENVIRONMENT: production

Setting Environment vars via the Command line

In *nix variants (Unix, Linux, Mac OS X) we can inject environment variables directly via the command line. We can add these to our ~/.bashrc file (or equivalent) to load them for every terminal session.

$ export DB_HOST=localhost

Note that this approach is useful for development and testing, but is not recommended for production environments.

Using Environment Variables

After we’ve set the environment variables, we can access them in our PHP application using the getenv() function.
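
As a minimal sketch, a small wrapper around getenv() can make a missing variable fail loudly instead of silently yielding false. The helper name readEnv is hypothetical, and the variable name and default are only examples.

```php
<?php
/**
 * Read an environment variable, falling back to a default when one is given.
 * Hypothetical helper: readEnv() is illustrative, not a PHP built-in.
 */
function readEnv(string $name, ?string $default = null): string
{
    $value = getenv($name);

    // getenv() returns false when the variable is not set.
    if ($value === false) {
        if ($default === null) {
            throw new RuntimeException("Missing environment variable: {$name}");
        }
        return $default;
    }

    return $value;
}

// Use "development" when ENVIRONMENT has not been set on this machine.
$environment = readEnv('ENVIRONMENT', 'development');
```

Throwing for required values means a misconfigured deploy is noticed at startup rather than at the first database query.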

IV. Backing Services

Treat backing services as attached resources

According to Factor IV, the application should not care whether resources it accesses are local (on the same server) or remote. We shouldn’t have to make any code changes to make this work. Instead, following Factor III: Configuration above, switching between local and remote resources should simply be a matter of changing the environment variable used to specify the location of the resource.

As a result, it becomes trivial for us to switch out data sources. It also implies that the application should interact with backing services in precisely the same way, whether the interaction is occurring in a production, test or development environment.

Implementing this factor necessitates that each resource can be entirely defined using a string-based “handle” that contains the protocol, host, endpoint and any necessary credentials. An example of this in the case of a database handle could be formulated as such:

mysql://admin:secret@remotehost.com/db

Where:

  • mysql:// is the protocol (database type).
  • admin:secret are the access credentials (username: admin, password: secret).
  • remotehost.com is the location of the host on which the resource resides; in this case a remote server.
  • db is the resource name; for this example, a MySQL database called “db.”

When to apply?

This factor is actually quite easy to implement and provides significant benefits in terms of simplifying the way you define instances of data sources within your application. As a result, it’s recommended that you use this pattern whenever you integrate with an external service (database, third-party API, etc.).

Even if you know the service you’re integrating with is going to be on the same server, you should try to design your application in such a way that if the service were to be moved to a remote server your application could still interact with it with minimal changes. This makes it much easier to scale your application horizontally in future.

How to apply?

The key to making your application independent of data source location is a consistent resource definition format that captures all the details required to locate and negotiate a connection with a remote data source (even if it happens to be locally accessible). In most cases, using fully-qualified URLs is sufficient.

For example, consider the situation where we need to connect to a database. Using Factors III and IV in tandem, we can derive a solution similar to the following:

# Define the datasource in the web server configuration
SetEnv DATABASE_URL mysql://dbuser:dbpass@localhost:4567/application_db

As an aside, note that passing this URL to PHP’s parse_url() function and dumping the result with var_dump() produces the following output:

array(6) {
  ["scheme"]=>
  string(5) "mysql"
  ["host"]=>
  string(9) "localhost"
  ["port"]=>
  int(4567)
  ["user"]=>
  string(6) "dbuser"
  ["pass"]=>
  string(6) "dbpass"
  ["path"]=>
  string(15) "/application_db"
}
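
Output of that shape is what PHP’s built-in parse_url() function returns for such a URL. As a minimal sketch, assuming DATABASE_URL has been set as described earlier (the fallback literal is only there so the snippet runs on its own):

```php
<?php
// Fall back to the example URL when DATABASE_URL is not set in this environment.
$databaseUrl = getenv('DATABASE_URL')
    ?: 'mysql://dbuser:dbpass@localhost:4567/application_db';

// parse_url() splits the URL into its components (scheme, host, port, user, pass, path).
$parts = parse_url($databaseUrl);

var_dump($parts);
```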

With the structure of this array in mind, we can put everything together to connect to a database as such:
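
A minimal sketch of one way to do this, assuming PDO with the MySQL driver. The helper name pdoArgumentsFromUrl is hypothetical, and the connection itself is left commented out because it needs the pdo_mysql extension and a reachable database server.

```php
<?php
/**
 * Turn a database URL into the arguments PDO expects.
 * Hypothetical helper; assumes a URL with at least a scheme, host and path.
 *
 * @return array{0: string, 1: string, 2: string} [dsn, username, password]
 */
function pdoArgumentsFromUrl(string $url): array
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'], $parts['path'])) {
        throw new InvalidArgumentException('Malformed database URL.');
    }

    // The path component is "/application_db"; the database name is everything after the slash.
    $dsn = sprintf(
        '%s:host=%s;dbname=%s',
        $parts['scheme'],
        $parts['host'],
        ltrim($parts['path'], '/')
    );
    if (isset($parts['port'])) {
        $dsn .= ';port=' . $parts['port'];
    }

    return [$dsn, $parts['user'] ?? '', $parts['pass'] ?? ''];
}

[$dsn, $user, $pass] = pdoArgumentsFromUrl(
    getenv('DATABASE_URL') ?: 'mysql://dbuser:dbpass@localhost:4567/application_db'
);

// Connecting requires the pdo_mysql extension and a running server:
// $pdo = new PDO($dsn, $user, $pass, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
```

Because the only input is the DATABASE_URL value, pointing the application at a different database server is a matter of changing that one environment variable.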

Conclusion

Now that we’ve completed the first part of this series, we know how to use the first 4 factors of the 12 Factor App to create web applications that have a single codebase, have managed dependencies, use environment variables for configuration and treat resources in a location-independent fashion.

By implementing these factors, we can create software that is easily maintainable and able to be scaled across a distributed architecture.

In the next instalment we’ll dig into the next 4 factors which will extend and expand on the benefits already covered.

Until then!