Overview
Web applications typically require very specific server environments in which to run. Most of us build applications from a variety of disparate components at different layers of abstraction. (We do not, for example, re-implement an operating system every time we wish to build a web app – to do so would be madness.)
Assuming you’re not using a Platform-as-a-Service provider (e.g. Heroku, PHPFog), at some point you’re going to need to spin up a production server and install the relevant bits of software to make your app available to the world.
Traditionally, this has been a task done by someone with some sysadmin skills, who configures the server prior to launch – your code is then deployed to that environment. Our intrepid sysadmin sits at a command line and runs various commands to install software and edit configuration files.
This presents notable difficulties, however:
- Server configuration is rarely documented in its entirety
- Server configuration is not an easily repeatable task
- There is little scope for automation
There is a better way.
Automating Configuration
Software systems like Chef and Puppet define domain-specific languages (DSLs) and architecture for automating server configuration.
The sysadmin devises a series of scripts (in the case of Chef, we call them Recipes) and high-level configuration files specifying which recipes to run for which servers.
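For illustration, here is a minimal sketch of what a recipe and a role might look like (the cookbook, package, and role names here are hypothetical, not from any particular production setup):

```ruby
# cookbooks/webserver/recipes/default.rb
# A minimal recipe: install nginx and keep it running.
package "nginx"

service "nginx" do
  action [:enable, :start]
end

# roles/database_server.rb
# A role ties recipes to a class of server; default_attributes can
# override a cookbook's defaults for every node holding the role.
name "database_server"
run_list "recipe[mysql::server]"
default_attributes "mysql" => { "port" => 3306 }
```

The role file is what makes the high-level mapping of “which recipes run on which servers” explicit and versionable.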
They then test these scripts – in full, from start to finish – to determine a correct server configuration.
With one command, we can configure a server to run Chef:
knife bootstrap 192.168.1.2 -x someuser -P "somepassword" --sudo -r "role[database_server]"
On paper, this sounds tedious: we add another layer of abstraction, introducing templating languages for configuration files and logic for handling interdependencies. In practice, the sysadmin was doing all of this in their head anyway – and much of it went undocumented.
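To make the templating point concrete, here’s a standalone Ruby sketch of the idea: an ERB template whose placeholders are filled from a plain hash standing in for Chef’s node attributes (in Chef proper, the template resource does this for you; the template content below is purely illustrative).

```ruby
require "erb"

# A stand-in for a Chef config template: placeholders are filled from
# attribute values. In Chef, these would come from node attributes and
# the template resource; here we use a plain hash for illustration.
NGINX_TEMPLATE = <<~CONF
  server {
    listen <%= port %>;
    server_name <%= host %>;
  }
CONF

def render_config(port:, host:)
  ERB.new(NGINX_TEMPLATE).result_with_hash(port: port, host: host)
end

puts render_config(port: 8080, host: "example.com")
```

The same template can now produce a correct config file for any node, which is exactly what makes the configuration repeatable.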
Configuration has now become code.
Continual Deployment
By default, Chef runs a “client” program on each server it configures. This client periodically (I believe hourly, by default) confers with a server that manages the configuration.
If changes are detected on the server, Chef overwrites them, reverting configuration to a known state. This can be confusing when you first encounter it (Me: “OMG my configuration just changed what is happening!?”)
The correct way to configure servers once Chef is employed is to treat the recipes and cookbooks as the One True Configuration. All changes must be made to these files and then pushed to your Configuration Management Server.
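In practice, the push looks something like this (the cookbook and role names are placeholders):

```shell
# Upload an edited cookbook to the configuration management server
knife cookbook upload webserver

# Upload a changed role definition
knife role from file roles/database_server.rb
```

From that point, every node with that cookbook or role in its run list will converge to the new configuration on its next chef-client run.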
This confers a powerful advantage: we can now adjust the configuration of an entire server cluster in one place, without needing to make changes across many servers (and potentially making mistakes in the process).
While we are testing and developing our server configuration, we can bootstrap the target machine once, and then simply run the chef-client program on that device any time we subsequently make a change (and push those changes up to our config server). Neat.
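Triggering that convergence on demand is a one-liner (host and user here are placeholders):

```shell
# Converge an already-bootstrapped machine against the latest pushed config
ssh someuser@192.168.1.2 "sudo chef-client"
```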
Managed Configuration Server
Chef can run in a ‘solo’ mode, where configuration is only stored locally and scripts are used to push that config up to the target machine without involving a Managed Configuration Server.
When working on the Pygg server cluster, I found this approach tedious. It was error-prone and overly complex, and I rapidly switched to using a Hosted Chef server provided by Opscode. Opscode provide a reasonable free plan for small projects, but do charge once a project grows beyond 5 servers.
There is an open-source alternative, Chef Server, which lets you control the whole process yourself. I’ve yet to try it, so YMMV.
Advantages over the traditional approach
What does turning configuration into code get us?
- Code is traceable: With our configuration logic contained in text files, we can track changes to the configuration over time using tools like git and GitHub.
- Code is shareable: Opscode maintains a repository of community-contributed cookbooks for common server-side packages. These can be used to very easily and quickly configure a server. The community will often add new recipes for new software as it becomes useful, and contributions from the world are welcomed.
- Configuration is automated and repeatable: Once you’ve tested and proven your configuration, it can be re-run at any time to configure a new server. If you’re feeling particularly clever, you can write software that interacts with your server configuration process – it becomes a fairly trivial exercise to implement dynamic server provisioning.
This is hugely powerful. Your application can react to load events – the software can determine that a spike in activity has occurred and scale up to react, and scale back down again when the storm has passed.
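As a sketch of what that provisioning step can look like: with the knife-ec2 plugin, a new, fully-configured node is a single command (the AMI ID, flavour, and role name below are illustrative and depend entirely on your setup):

```shell
# Spin up a new EC2 instance and bootstrap it with the web_server role
# (requires the knife-ec2 plugin and AWS credentials in your knife config)
knife ec2 server create -r "role[web_server]" -I ami-1234abcd -f m1.small
```

Wrap that in a script triggered by your monitoring, and you have the beginnings of auto-scaling.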
In Practice
At Pollenizer, we scaled Pygg up to a multi-server cluster using Chef. The process took about two weeks of learning and experimentation (we also switched from Apache2 to Nginx, and implemented some more comprehensive caching solutions.)
Once the database server and application server configurations were stabilised, scaling up to three web server nodes communicating with that database was a trivial exercise – a matter of about 10 minutes and one command.
Several months later, we needed to do the same thing for Wooboard. Building on the work we did for Pygg, we were able to deploy a new Wooboard cluster in a matter of days (the application required some wrangling, and some configuration elements needed changing for Wooboard specifically).
The process was relatively painless – what would have been a multi-week operation was accomplished in a fraction of the time.
Start Learning
As you may have surmised by now, this is not an instructional article (more of a white-paper overview). If you want to learn how to use Chef, I highly recommend this tutorial.
The section on data bags is especially worth paying attention to, as they allow you to move passwords out of your version control system and into your Managed Configuration Server. This reduces their potential exposure: developers may not need these passwords, and if your repository is compromised your servers don’t get owned (unless your configuration server gets owned, of course…)
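For example, a data bag item holding database credentials might be read from a recipe like so (the “passwords” bag and “mysql” item are hypothetical names – you’d create them first with `knife data bag create passwords mysql`):

```ruby
# In a recipe: fetch credentials from a data bag instead of hard-coding
# them in version control. Bag and item names here are illustrative.
creds = data_bag_item("passwords", "mysql")

template "/etc/myapp/database.yml" do
  variables :password => creds["password"]
end
```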
The Opscode Wiki is the definitive source of information, but can be a little impenetrable to a newcomer in places (I highly recommend the tutorial first).
Brad asked: so why not just stick to using a service like Heroku and save all this work?
I quite agree with you, Brad – PaaS solutions are a very good option for startups. However, we have a lot of PHP legacy behind us and some very generous free hosting options for our startups, which makes it prudent for us to roll our own in many cases.
There’s also something to be said for the flexibility of running your own servers. Heroku is pure gold, up to a point – but it falls over when you need to do a lot of background work.
It may also present an additional due diligence hurdle on exit, in some cases (IANAL).