I’ve been doing some initial experimentation at work on using virtual machines to host web applications, rather than running a bunch of vhosts out of the same system-wide Apache install. The big potential win is the ability to sandbox an application so that bugs and security holes don’t have the potential to take down other apps running on the same box, along with the (nice, but not essential) ability to migrate an app from one machine to another by simply sending a snapshot of the VM over the network.
We’ve been running our high-volume apps behind [`mod_proxy_balancer`](http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html) for a while now, and have had pretty good results using it to mask the actual hostname running an application, as well as doing SSL on the load balancer instead of the application server. However, the backend servers each run a single instance of Apache (albeit listening on a handful of high-numbered ports) managed by the usual Debian administrative infrastructure, so any syntax errors, high-CPU-load pages, or security holes have the potential to break every app on the server.
One option to get around this would be to run multiple instances of Apache, each listening on their own port, with separate effective uids and document roots. However, we’d lose all the Debian goodness that makes dealing with Apache, PHP, and their respective dependencies so much less painful. I’ve managed non-trivial services using a full built-from-source LAMP stack before, and lemme tell you, updating for a minor security fix is decidedly non-trivial.
So, I’ve been trying out [KVM](http://kvm.qumranet.com/kvmwiki), which is a virtualization tool similar to [VMware](http://vmware.com/) but fully open-source and integrated into the mainline Linux kernel. (The other major open source option, [Xen](http://xensource.com), is interesting, but requires custom patches to the guest OS to run, and doesn’t have the support of the kernel developers.) With it, I can run a handful of lightweight virtual machines on each web app server box, each with its own basic Debian system environment and Apache install.
Within each sandbox, those systems are then free to do whatever bone-headed crap they like, and modulo the unavoidable resource-contention issues (CPU and IO bandwidth being the biggies) they can’t really take down any other services. As a bonus, I can easily clone a virtual machine at pretty much any time, move it to another physical host, and more or less double the throughput of my app, at least for those applications that are written with a truly “share nothing” architecture.
My initial naive benchmarks showed something like a 3x slowdown, though, for running a basic “Hello, world!” PHP app under virtualization vs. natively. I was pretty much ready to give up on the idea until I decided to test the system behavior assuming that an SSL-protected, native instance of Apache would be forwarding requests via the aforementioned `mod_proxy_balancer` model, and lo and behold, the difference dropped to within 10%. At this point, I’m willing to seriously look at using the VM model, since it brings so many potential security and manageability benefits to the table.
Of course, this also highlights the importance of looking for bottlenecks in your *entire* stack, not just the component you’re considering changing. I’m not convinced that the delta will be quite that small once I’ve placed a more representative load on the virtualized web server, but if I had rejected the idea out of hand due to the initial bad numbers, I never would have gotten to the point of properly evaluating the architecture.
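To make the arithmetic behind that result concrete, here is a rough sketch in Python. The millisecond figures are made up for illustration (I'm not quoting my actual measurements), and `relative_slowdown` is just a hypothetical helper name. The point is only that a fixed cost paid on both paths, such as SSL handling and proxying on the front-end, shrinks the relative gap between the native and virtualized backends.

```python
def relative_slowdown(native_ms, vm_ms, shared_overhead_ms=0.0):
    """Fractional slowdown of the VM path relative to the native path.

    shared_overhead_ms is a fixed per-request cost paid on BOTH paths
    (for example SSL termination and proxying on the front-end).
    """
    native_total = native_ms + shared_overhead_ms
    vm_total = vm_ms + shared_overhead_ms
    return (vm_total - native_total) / native_total


# Hypothetical numbers, chosen only to illustrate the effect:
# the backend alone takes 1 ms natively and 3 ms in a VM (a 3x slowdown),
# and the SSL + proxy front-end adds an assumed 20 ms to every request.
backend_only = relative_slowdown(1.0, 3.0)
behind_proxy = relative_slowdown(1.0, 3.0, shared_overhead_ms=20.0)

print(f"backend only:  {backend_only:.0%} slower")   # 200% slower, i.e. 3x
print(f"behind proxy:  {behind_proxy:.1%} slower")   # 9.5% slower
```

With those assumed numbers, the same 2 ms of virtualization overhead goes from tripling the response time to adding under 10%, which is the shape of what I saw. A heavier, more realistic page would change the backend figures and could widen the gap again.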