
Monday, December 08, 2008

More Varnish: Thoughts from the Author

Poul-Henning Kamp, the lead author of Varnish, was kind enough to respond to my blog post.

Hi Steve,

I read your comments about Varnish and thought I'd better explain myself. Feel free to post this if you want.

The reason why we don't multiplex threads in Varnish is that it has not become a priority for us yet, and I am not convinced that we will ever see it become a priority at the cost it bears.

A common misconception is that Varnish parks a thread waiting for the client to send the next request. We do not; that would be horribly stupid. Sessions are multiplexed when idle.

The Varnish worker thread takes over when the request is completely read and leaves the session again when the answer is delivered to the network stack (and no further input is pending).

For a cache hit, this takes less than 10 microseconds, and we are still shaving that number.

The problem with multiplexing is that it is horribly expensive, in system calls, cache pressure, and complexity.

If we had a system call that was called "write these bytes, but don't block longer than N seconds", multiplexing could be sensibly implemented.

But with POSIX mostly being a historical society, it will cost you around 7 system calls, some surprisingly expensive, to even find out if multiplexing is necessary in the first place.
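For illustration, here is a rough sketch (not Varnish source) of emulating such a call with what POSIX does give you; the fcntl/write/poll laps below are where those extra system calls go:

    /*
     * Illustrative only: a poor man's "write these bytes, but don't
     * block longer than timeout_ms".  For brevity the deadline resets
     * on every lap instead of being tracked overall.
     */
    #include <errno.h>
    #include <fcntl.h>
    #include <poll.h>
    #include <unistd.h>

    ssize_t
    write_deadline(int fd, const char *buf, size_t len, int timeout_ms)
    {
        size_t off = 0;

        /* Two system calls just to flip the fd non-blocking. */
        int flags = fcntl(fd, F_GETFL, 0);
        if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
            return (-1);

        while (off < len) {
            ssize_t n = write(fd, buf + off, len - off);
            if (n > 0) {
                off += n;
                continue;
            }
            if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
                return (-1);

            /* Send buffer full: yet another system call to ask the
             * kernel to wake us when it drains. */
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            if (poll(&pfd, 1, timeout_ms) <= 0)
                return (off > 0 ? (ssize_t)off : -1);
        }
        return ((ssize_t)off);
    }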

In comparison, today Varnish delivers a cache-hit in 7 system calls *total* of which a few are technically only there for debugging.

The next good reason is that we have not really fiddled with the sendbuffer sizes yet, but obviously, if your sendbuffer can swallow the write, the thread does not wait.
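For illustration, a sketch of what that fiddling might look like, not anything Varnish ships, and with an arbitrary 256 KB figure that the kernel is free to clamp:

    #include <sys/socket.h>

    /* Grow the kernel send buffer so a typical response fits in it
     * entirely, letting write(2) return at once even to a slow client. */
    static int
    grow_sndbuf(int fd)
    {
        int sndbuf = 256 * 1024;    /* arbitrary example size */
        return (setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
            &sndbuf, sizeof(sndbuf)));
    }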

And if that fails, a thread blocked in a write(2) system call is quite cheap to have around.

It uses a little RAM, but it's not that much, and we can probably tune it down quite a bit.

The scheduler does not bother with the thread more than it has to, and when it does, the VM hardware system is not kickstarted every time we cross the user/kernel barrier.

Without getting into spectroscopic comparisons between apples and oranges, a thread just lounging around, waiting for the network stack to do its thing, is much, much cheaper than a thread which does a lot of system calls and fd-table walking, only to perform a few small(-ish) writes every time the network stack wakes up.

And the final reason why it may never become necessary to multiplex threads is that servers are cheap.

But if we get to the point where we need multiplexing, we will do it.

But I like the old design principle from the X11 project: we will not do it until we have a server that doesn't work without it.

But if you are in the business of delivering ISOs to 56k modems, then yes, Varnish is probably not for you.

Poul-Henning

Saturday, December 06, 2008

Thoughts on Varnish

Varnish is getting a lot of attention around the internet these days, and with good reason: it's a nicely written and speedy cache with a flexible DSL for caching policy. It has great features like hot reloading of cache rules and ESI.

One thing that's really surprised me, though, is that Varnish uses one thread per connection. Most network programs designed for a high number of connections don't use one thread per connection anymore, as it has serious drawbacks.

With slow clients, many of your threads spend a lot of time doing nothing but blocking in write(). In consumer internet apps, I believe, slow clients make up the majority of your connections. But even though those threads are doing nothing, the OS still pays memory and scheduling overhead for them. You find yourself with an artificially low ceiling on the number of users you can service with a single machine.

What makes a client slow, though? Both low bandwidth and high latency. Cell phones, 56k modems, and users on high-speed links who aren't geographically close to your data center can all be classified as 'slow'.

One design that deals better with the slow-client problem uses a pool of worker threads or processes behind the scenes, with epoll / kqueue / event ports watching the slow clients and telling the pool, via a change notification, that a socket is ready. Your cost still grows with the number of connections, but at a much lower rate, and the number of users you can service dramatically increases.
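To sketch what I mean, here's a minimal event loop in the Linux epoll flavor. queue_to_worker() is a hypothetical stand-in for handing the socket to the worker pool, and a real server would switch its interest between read and write readiness rather than watching both all the time:

    #include <sys/epoll.h>

    #define MAX_EVENTS 64

    void queue_to_worker(int fd);   /* hypothetical worker-pool hand-off */

    /* Register a client socket with the event loop. */
    void
    watch_client(int epfd, int fd)
    {
        struct epoll_event ev = {
            .events = EPOLLIN | EPOLLOUT,
            .data.fd = fd,
        };
        epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
    }

    /* One thread watches every connection; workers only ever touch
     * sockets the kernel has said are ready. */
    void
    event_loop(int epfd)
    {
        struct epoll_event evs[MAX_EVENTS];

        for (;;) {
            int n = epoll_wait(epfd, evs, MAX_EVENTS, -1);
            for (int i = 0; i < n; i++)
                if (evs[i].events & (EPOLLIN | EPOLLOUT))
                    queue_to_worker(evs[i].data.fd);
        }
    }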

So why does Varnish use this older, more troublesome model? Probably because most services aren't going to notice the bottleneck; they simply don't have enough concurrent connections to worry about using a few extra machines. If you've never saturated a load balancer or firewall, you've probably never had to seriously consider the C10k issues involved.

Also, unfortunately, most people write load tests that only exercise the All Fast Clients scenario, not a mix of fast and slow clients. I'm guilty of this, too.
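One cheap fix is mixing in clients that deliberately drain their responses slowly. A rough sketch, where connect_to() is a hypothetical stand-in for the usual socket/connect boilerplate:

    #include <unistd.h>

    int connect_to(const char *host, int port);  /* hypothetical helper */

    /* Read the response ~64 bytes every 100 ms (roughly 640 bytes/sec,
     * modem territory), so the server's writes to us back up. */
    void
    slow_client(const char *host, int port, const char *req, size_t reqlen)
    {
        char buf[64];
        int fd = connect_to(host, port);

        write(fd, req, reqlen);
        while (read(fd, buf, sizeof(buf)) > 0)
            usleep(100 * 1000);
        close(fd);
    }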

My executive summary: Varnish is a nice piece of software, and I hope they spend the time to make it useful for larger sites as well as smaller ones.