This week I had the opportunity to help a client upgrade one of their legacy systems to a new webserver. The site was using Mongrel as its Ruby handler and we’d noticed an unsatisfactory number of 502 errors. These errors occurred during periods of high load, when the Mongrel workers were overloaded and could not keep up with incoming requests. I first tried solving the problem by increasing the number of workers from 5 to 8. Anyone who’s done this before will likely recognize that this is a double-edged sword that may or may not work. Increasing the number of workers grows the number of handlers in your pool, but each extra worker consumes more RAM, and that can end up slowing down the whole system.
After several days of observing performance, we noted that the increase in workers had a net negative effect, so it was back to the drawing board. As with most legacy systems, where the idea of upgrading a key component of an old system is about as inviting as a dinner invitation from Jeffrey Dahmer, I wasn’t thrilled about the idea of ripping out the old Mongrel webserver. However, having done several upgrades from Mongrel to Passenger in the past, I’ve seen the performance improvements that can be experienced by replacing Mongrel.
So I set out to see what kind of performance gain I could achieve switching from Mongrel to Unicorn. I chose Unicorn because of its speed advantages (forking, UNIX sockets), the positive feedback it has received from the Rails community, and because I like Unicorns (yes, I’ve been known to purchase a bottle of wine based on the label).
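For anyone who hasn’t set up Unicorn behind a UNIX socket before, a minimal config looks roughly like the sketch below. This is not the client’s actual configuration: the socket path, worker count, timeout, and backlog are placeholder values I picked for illustration, and the fork hooks assume a Rails app using ActiveRecord.

```ruby
# config/unicorn.rb
# Illustrative sketch only: every value here is an assumption, not the
# configuration used in the tests described in this post.

worker_processes 5          # assumed: same count as the 5 Mongrels

# Listen on a UNIX domain socket instead of a TCP port.
# The path is hypothetical; point your front-end proxy at the same file.
listen "/tmp/unicorn.example.sock", :backlog => 64

timeout 30                  # kill workers stuck longer than 30 seconds
preload_app true            # load the app once in the master, then fork

before_fork do |server, worker|
  # Drop the master's database connection so workers don't share it.
  defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!
end

after_fork do |server, worker|
  # Each forked worker opens its own database connection.
  defined?(ActiveRecord::Base) && ActiveRecord::Base.establish_connection
end
```

With `preload_app true` the master loads the application once and the workers are forked from it, which is where the forking advantage comes from. I used the older `:backlog => 64` hash syntax on purpose, since legacy apps are often stuck on an old Ruby.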
My hypothesis was that a switch to Unicorn should result in lower average response times and fewer 502 Bad Gateway errors. What I found left me scratching my head and re-running performance tests again and again until I was sure I wasn’t crazy.
To test my hypothesis, I wrote a small test plan using Apache JMeter that accessed the most frequently hit pages in my web application. Almost all of these pages are read-heavy and cached (although less than I would like). Next, I ran two 30,000-sample tests against the site: one against 5 Mongrels, and one against Unicorn using a UNIX socket.
Here’s what I found testing 50 simultaneous users for a total of 30k samples per run:
||Average response time
|Mongrel (5 instances)
WHAT?? The stark difference in standard deviation absolutely perplexed me. It baffled me enough that I ran the trials 5 more times, but each time I got similar results. What is so surprising is that Mongrel, a server known for stuck workers, performed more consistently than its modern counterpart. What troubled me, however, was that I was unable to reproduce the pesky 502 errors experienced in the production environment. This tells me one of two things: either my test is not representative of real-life traffic, or my staging server is not plagued by the same memory constraints imposed by a cycle-sharing hosting provider like Slicehost.
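If you want to sanity-check numbers like these outside of JMeter, the average and standard deviation are easy to compute by hand. Here’s a small Ruby sketch. The two sample arrays are made up purely to show how two runs can share the same average while having wildly different spreads; they are not my measurements, and the helper names are my own.

```ruby
# Summarize response times (in milliseconds) from a load-test run.
# The sample data below is invented for illustration only; it is NOT
# taken from the Mongrel or Unicorn tests described in this post.

# Arithmetic mean of the samples.
def mean(samples)
  samples.sum.to_f / samples.size
end

# Population standard deviation: the square root of the average
# squared distance from the mean.
def std_dev(samples)
  avg = mean(samples)
  variance = samples.sum { |s| (s - avg)**2 } / samples.size
  Math.sqrt(variance)
end

steady = [100, 102, 98, 101, 99]   # hypothetical run with consistent timings
spiky  = [40, 45, 50, 42, 323]     # hypothetical run with one slow outlier

[["steady", steady], ["spiky", spiky]].each do |name, samples|
  puts format("%-6s mean=%.1f ms  std dev=%.1f ms",
              name, mean(samples), std_dev(samples))
end
```

Both made-up runs average out to the same response time, but the second has a far larger standard deviation because of a single slow request. That’s why the average alone can hide exactly the kind of inconsistency that surprised me here.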
There is no golden nugget of truth in this blog post other than genuine surprise that an older piece of technology still seems to hold up so well. It’s no wonder that Thin, another popular web server, uses Mongrel’s HTTP parser. I came out of this experience with a greater respect for my old, old friend Mongrel.