I’m helping “my friend Jeff”:http://jeffjlin.typepad.com/ with some of the preparations for the release of the next “Harvey Danger album”:http://harveydanger.com. The current site runs on a shared hosting account with “a reputable provider”:http://www.pair.com, but the band will need a big upgrade to support a promotional strategy that calls for distributing some big (50MB+) media files. This is where I come in.
To make a long story short, thanks to Moore’s law and Ebbers’ fraud, servers and bandwidth are pretty cheap these days. Even so, it makes sense to make the most of what you have. In this regard, “lighttpd”:http://www.lighttpd.net/ looks like it might be a better bet than Apache for serving big files that take a while to download.
Hosting packages in the $150/month range come with allowances of as much as 2TB of data transfer, and the machines typically specced are, by my estimates, capable of dishing out that much data and more.
2TB would allow over 30K downloads/month, averaging a little over 6Mbps and about 1 download/minute. Of course, nothing is average, and my initial assumption was that peak traffic would be 10x the average, or as much as 60Mbps and 10 downloads/minute. Further consideration made me doubt that assumption.
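The back-of-the-envelope traffic numbers are easy to rerun with different assumptions. This is a minimal Python sketch, assuming decimal units (1TB = 10^12 bytes), 50MB files, a 30-day month and the same 10x peak factor; the function name is just for illustration.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def traffic_estimate(monthly_bytes, file_bytes, peak_factor=10):
    """Average and peak load implied by a monthly transfer allowance,
    assuming the whole allowance goes to full-file downloads."""
    downloads = monthly_bytes / file_bytes
    minutes = SECONDS_PER_MONTH / 60
    avg_mbps = monthly_bytes * 8 / SECONDS_PER_MONTH / 1e6  # bytes -> megabits/s
    return {
        "downloads_per_month": downloads,
        "avg_downloads_per_min": downloads / minutes,
        "avg_mbps": avg_mbps,
        "peak_downloads_per_min": peak_factor * downloads / minutes,
        "peak_mbps": peak_factor * avg_mbps,
    }


est = traffic_estimate(monthly_bytes=2e12, file_bytes=50e6)
for key, value in est.items():
    print(f"{key}: {value:,.1f}")
```

With these inputs it reports 40,000 downloads a month, just under one per minute, and an average of about 6.2Mbps. Note that the average bandwidth depends only on the monthly allowance, not on the file size.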
Normal traffic patterns on harveydanger.com have a ~10x difference between average and peak hours, but with promotion, those numbers could change quickly. A link or two on a well-read site might bring thousands of eager downloaders in the space of an hour.
Rather than trying to build for (and pay for) such big spikes in demand, we decided that slow downloads were acceptable when a flood of requests overtaxed the server’s 100Mbps network connection. Lots of failed downloads, on the other hand, would be completely unacceptable. People might leave empty-handed and never come back. Even worse, they might keep retrying, causing the situation to snowball further.
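To put “slow” in concrete terms: if the 100Mbps link were shared evenly among all concurrent downloads (an idealization; real TCP sharing is messier), each client’s rate and the time to fetch a 50MB file would work out as in this sketch. The client counts are arbitrary examples, not measurements.

```python
def fair_share(link_mbps, concurrent, file_mb=50):
    """Per-client rate (Mbps) and download time (minutes) when the link is
    split evenly among all concurrent downloads. Ignores each client's own
    line speed, which may cap it below this share."""
    per_client_mbps = link_mbps / concurrent
    seconds = file_mb * 8 / per_client_mbps  # megabits / (megabits per second)
    return per_client_mbps, seconds / 60


for clients in (50, 200, 1000):
    rate, minutes = fair_share(100, clients)
    print(f"{clients:>5} clients: {rate:6.2f} Mbps each, {minutes:5.1f} min per 50MB file")
```

At 1,000 concurrent downloads each client gets about 100kbps and a 50MB file takes a bit over an hour: slow, but it completes, which is the trade-off described above.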
So, today I set out to see what we could expect from big spikes in traffic by setting up a series of tests.
I figured that as few as 50 simultaneous downloaders on a mix of DSL and cable modem connections would be enough to saturate the network connection on a server. Once that happened, pending downloads would start stacking up, which would really start putting a strain on server resources. The question was, how big a strain?
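The stacking-up effect can be illustrated with a toy simulation: new downloads arrive at a steady rate, and each second the link’s capacity is divided evenly among whatever is in flight. This is a simplified fluid model, not a measurement, and the arrival rates are arbitrary examples. Once arrival rate times file size exceeds link capacity, the in-flight count grows without bound instead of settling.

```python
def simulate_backlog(arrivals_per_min, file_mb=50, link_mbps=100, minutes=10):
    """Toy processor-sharing model of a saturated link.

    Each second, downloads arrive at a steady rate and the link's capacity
    is divided evenly among all in-flight downloads. Returns the number of
    downloads still in flight at the end of each minute.
    Simplification: capacity left over when a download finishes mid-second
    is not redistributed to the others.
    """
    capacity = link_mbps * 1e6 / 8      # bytes per second
    size = file_mb * 1e6                # bytes per download
    remaining = []                      # bytes left for each in-flight download
    arrival_credit = 0.0
    history = []
    for second in range(minutes * 60):
        arrival_credit += arrivals_per_min / 60
        while arrival_credit >= 1:
            remaining.append(size)
            arrival_credit -= 1
        if remaining:
            share = capacity / len(remaining)
            remaining = [r - share for r in remaining if r - share > 0]
        if second % 60 == 59:
            history.append(len(remaining))
    return history


# 10 downloads/min needs ~67Mbps: fits in 100Mbps, so the backlog stays tiny.
print(simulate_backlog(arrivals_per_min=10))
# 30 downloads/min needs ~200Mbps: the backlog grows every minute.
print(simulate_backlog(arrivals_per_min=30))
```

In the first case the in-flight count stays at zero or one. In the second, at most about 15 downloads a minute can complete, so more than 150 are still pending after ten minutes, each getting a thinner slice of the link.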
For the web server, I used my home file server, a modest box with a ~500MHz processor and 512MB of memory hooked to a 100Mbps switch. The servers we’d be contracting for would come with 2x as much memory and a ~4x faster CPU, but I figured it would be a good way to get a baseline for things like memory consumption and CPU load.
To simulate the client connections, I relied on my desktop machine, an Athlon 64 with 1GB of memory hooked to the same switch.
First off, I installed “Apache2”:http://httpd.apache.org/docs/2.0/ using the “mpm_worker”:http://httpd.apache.org/docs/2.0/mod/worker.html module, which spawns a limited number of long-lived processes and serves each connection with a thread from one of the processes in the pool. It should be the most memory-efficient way to deploy Apache.
Getting the server going was easy, since there was a prebuilt package for “Ubuntu”:http://www.ubuntulinux.org/. Getting the load-testing client nailed down wasn’t as straightforward. I started out using “JMeter”:http://jakarta.apache.org/jmeter/, but quickly ran into problems.
JMeter has worked well for me in the past, but the 50MB+ file I was downloading was too much for it. Even with only one or two concurrent users, it would run out of memory and crash after just a few minutes. I bumped up the memory available to the JVM running JMeter, but even with 1GB allocated, it couldn’t sustain even a modest load, never mind 50+ simultaneous requests.
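Whatever JMeter was doing internally, the usual way to keep a download tester’s memory flat is to stream each response in fixed-size chunks and throw them away, keeping only a byte count. A minimal Python sketch using only the standard library; the helper names are hypothetical, and a thread pool is just one simple way to get concurrency.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_and_discard(url, chunk_size=64 * 1024):
    """Download url in fixed-size chunks, keeping only a byte count.

    Memory use stays at roughly one chunk per request no matter how big
    the file is.
    """
    total = 0
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def run_clients(url, clients):
    """Run `clients` simultaneous downloads; return bytes received by each."""
    with ThreadPoolExecutor(max_workers=clients) as pool:
        return list(pool.map(fetch_and_discard, [url] * clients))
```

Something like run_clients("http://fileserver.local/album.zip", 50) (a hypothetical URL) would then approximate 50 simultaneous downloaders. Python threads are fine for this because the work is almost entirely waiting on the network.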
Next up, I tried compiling “Apache Flood”:http://httpd.apache.org/test/flood/, but it wouldn’t build properly, and I gave up. Then came “Siege”:http://www.joedog.org/siege/, which compiled without too much hassle on Ubuntu running in VirtualPC. It worked well enough to start running some tests, but then things went south: I shut the virtual machine down to give it more memory, and it wouldn’t boot back up again.
I decided to download “Cygwin”:http://www.cygwin.com/ and try building Siege to run directly on WinXP. After a few false starts and some missing packages, I got it to build. It even seemed to use a lot less CPU than when I was running it under Linux in VirtualPC.
Sure enough, the little server was able to saturate the 100Mbps Ethernet connection with 50 concurrent requests. I tried bumping it up further, but Siege started throwing all sorts of network errors and crashing. I was despairing of how I was going to do a real load test when I realized that the problem went away if I just started multiple copies of Siege and didn’t let any of them spawn more than 50 requests. Even this had its difficulties, though. I often had to kill Siege once requests started completing, because it seemed to make Windows XP lock up and stop passing TCP/IP traffic until I killed the process.
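The workaround of running several Siege copies, none above 50 users, is easy to script. A Python sketch: it relies only on Siege’s -c (concurrent users) option, the helper names are hypothetical, and any run-length or other options would need adding to suit your Siege setup.

```python
import subprocess

# Cap per copy, based on the experience above: Siege crashed beyond ~50 users.
MAX_PER_COPY = 50


def split_concurrency(total, per_copy=MAX_PER_COPY):
    """Break a target concurrency into chunks of at most per_copy users."""
    full, rest = divmod(total, per_copy)
    return [per_copy] * full + ([rest] if rest else [])


def siege_commands(url, total):
    """One siege command line per chunk; -c sets the concurrent users."""
    return [["siege", "-c", str(n), url] for n in split_concurrency(total)]


def launch(url, total):
    """Start every copy in parallel and return the Popen handles, so the
    caller can wait on them or terminate() them if the machine bogs down."""
    return [subprocess.Popen(cmd) for cmd in siege_commands(url, total)]
```

Keeping the Popen handles around makes it easy to kill every copy at once when the client machine starts to lock up.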
Apache did a decent job with memory utilization. With 300 concurrent requests, 25 threads per process and a 250-connection maximum, Apache had 10 processes running, each with about 600KB resident in RAM. With 200 concurrent requests limited to 150 connections, it had 6 processes of about 1.5MB each. 6-10MB isn’t bad at all. Even with 1000 concurrent requests you’d probably need less than 100MB for Apache, which would still leave lots of memory for caching the limited number of large media files.
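The process counts line up with how the worker MPM sizes its pool: each child hosts ThreadsPerChild threads, so the number of children needed is MaxClients divided by ThreadsPerChild, rounded up (ignoring spare-thread settings and the ServerLimit cap). A quick Python check, using the per-process resident sizes observed above as inputs; the function names are illustrative.

```python
import math


def worker_children(max_clients, threads_per_child):
    """Child processes the worker MPM needs to offer max_clients threads."""
    return math.ceil(max_clients / threads_per_child)


def worker_memory_mb(max_clients, threads_per_child, rss_mb_per_child):
    """Rough resident memory: number of children x observed RSS per child."""
    return worker_children(max_clients, threads_per_child) * rss_mb_per_child


print(worker_children(250, 25), worker_memory_mb(250, 25, 0.6))  # 10 children
print(worker_children(150, 25), worker_memory_mb(150, 25, 1.5))  # 6 children
print(worker_memory_mb(1000, 25, 1.5))  # extrapolation to 1000 connections
```

For 1,000 connections this comes to about 60MB, though the worker MPM’s default ServerLimit of 16 children would need raising to allow the 40 children that requires.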
CPU utilization was pretty bad, though. There was no idle time and the load average was 50+. Even basic operations at the command line felt slow.
Next up, I tried Apache2 with the prefork module, which uses a pool of processes in which each process serves only a single connection. This is very similar to the way the still widely deployed Apache 1.3 works. As expected, this consumes a lot of memory. When Apache was set to allow 200 active processes and I hit it with 250 concurrent requests, each process used about 840KB, or over 160MB in total. Serving more active requests would only push that number higher. CPU utilization and load averages were similar to those with the threaded version of Apache.
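Because prefork dedicates one process to each active connection, its memory grows linearly with concurrency. A short Python sketch of that scaling, using the 840KB-per-process figure measured above; summing resident sizes overcounts pages shared between forked children, so treat the results as an upper bound.

```python
def prefork_memory_mb(active_connections, rss_kb_per_process=840):
    """One process per active connection, so memory scales linearly
    with concurrency (upper bound: shared pages are counted repeatedly)."""
    return active_connections * rss_kb_per_process / 1024


for n in (200, 500, 1000):
    print(f"{n:>5} connections: ~{prefork_memory_mb(n):.0f}MB")
```

By 1,000 connections this passes 800MB, more than the 512MB the test box has in total.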
From here, I turned to the specialists and went looking for HTTP servers that prioritized resource efficiency for basic file serving over flexibility and dynamic content.
First up was “thttpd”:http://www.acme.com/software/thttpd/ from Acme Software. The “t” stands for throttling because thttpd lets you restrict the bandwidth available for serving all or some of the files on a website (a nice feature that could come in handy). I obtained the source code, compiled it and ran it.
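To illustrate what capping a file’s bandwidth involves, here is a generic token-bucket limiter in Python. It is not thttpd’s implementation, just a common way to build this kind of throttle; the class name, rates and injectable clock are illustrative assumptions.

```python
import time


class TokenBucket:
    """Generic token-bucket rate limiter (not thttpd's own code).

    Tokens are bytes: they refill at `rate` bytes/second up to `burst`,
    and a chunk may be sent only if enough tokens are available.
    """

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.clock = clock
        self.last = clock()

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, nbytes):
        """Spend nbytes of tokens if available; return whether the chunk may go out."""
        self._refill()
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

    def wait_time(self, nbytes):
        """Seconds until nbytes of tokens will be available (0 if they already are)."""
        self._refill()
        return max(0.0, (nbytes - self.tokens) / self.rate)
```

A server would call try_consume before writing each chunk of a throttled file and, when it returns False, come back to that connection after wait_time() seconds instead of blocking. Chunks must be no larger than burst, or they can never be sent.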
The results were impressive. Memory utilization was higher than with the threaded Apache, settling in at 41MB or so for a single process, but CPU utilization was much better: there was ~20% idle time even when serving 100-150 simultaneous requests, and the load average never rose above ~2.
Last stop was “lighttpd”:http://www.lighttpd.net/, which offers much of the resource efficiency of thttpd along with better support for dynamic content (not needed for this application, but not a liability either). I downloaded the latest release and got it compiled and running with minimal effort. I didn’t pay as much attention to its behavior under lower loads, but it quickly became clear that it was well suited to high concurrency.
With 250 simultaneous requests it used only 2.8MB of RAM. Load hovered at ~2 and the CPU was, as usual, pegged. Bumping up to 400 requests pushed RAM use to 4.4MB. Even with 500 users, it kept humming along, and the server remained relatively responsive.
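Two of the lighttpd data points are enough for a rough per-connection memory cost, which makes the comparison with prefork Apache concrete. A straight line through two measurements is a crude assumption, not a profile; the function name is illustrative.

```python
def marginal_kb_per_connection(conns_a, mb_a, conns_b, mb_b):
    """Slope between two (connections, resident MB) measurements, in KB."""
    return (mb_b - mb_a) * 1024 / (conns_b - conns_a)


lighttpd = marginal_kb_per_connection(250, 2.8, 400, 4.4)
print(f"lighttpd: ~{lighttpd:.0f}KB per extra connection")
print("prefork Apache: ~840KB per connection (one process each)")
```

That works out to about 11KB versus 840KB per connection, a factor of roughly 75.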
My conclusion is that Apache with mpm_worker would probably be suitable for this application, but I’m inclined to go with lighttpd since it makes the absolute best use of memory of all the options tested.
One thing I need to investigate further is the huge hit in throughput I noticed whenever some relatively large, CPU-hungry monitoring tasks written as Perl scripts fired up.