enjoying salad since 1978.

Monday, July 20, 2009

Functional Refactoring #1: Replace conditional inside fold with filter.

Recently, while working on some code, I noticed that I have been making the same transform on lots of functional code in Scala. I think it's clearly a Refactoring and should have a name.

"Replace conditional inside fold with filter"

The idea is that if you're folding over items in a list and manipulating them based on a conditional, it might be clearer if you pull out the conditional and filter the list on that conditional first.

Here is a contrived example borrowed from a fake test suite. Let's say I had code with a conditional inside of a fold:

sampleUsers.values.foreach { u =>
  if (u.id != 0) {
    Authentication(u.id, u.hash).passwordToken mustEqual u.passwordToken
  }
}

Let's apply this refactoring to make the test case clearer.

sampleUsers.values.filter(u => u.id != 0).foreach { u =>
  Authentication(u.id, u.hash).passwordToken mustEqual u.passwordToken
}

This works with multiple inner conditionals as well. You can fit them into one filter or chain multiple filters together.

Even though this is a simple refactoring, it seems like a worthwhile place to start cataloging.

Monday, December 08, 2008

More Varnish: Thoughts from the Author

Poul-Henning Kamp, the lead author of Varnish, was kind enough to respond to my blog post.

Hi Steve,

I read your comments about Varnish and thought I'd better explain myself, feel free to post this if you want.

The reason why we don't multiplex threads in Varnish is that it has not become a priority for us yet, and I am not convinced that we will ever see it become a priority at the cost it bears.

A common misconception is that varnish parks a thread waiting for the client to send the next request. We do not, that would be horribly stupid. Sessions are multiplexed when idle.

The Varnish worker thread takes over when the request is completely read and leaves the session again when the answer is delivered to the network stack (and no further input is pending).

For a cache hit, this takes less than 10 microseconds, and we are still shaving that number.

The problem about multiplexing is that it is horribly expensive, both in system calls, cache pressure and complexity.

If we had a system call that was called "write these bytes, but don't block longer than N seconds", multiplexing could be sensibly implemented.

But with POSIX mostly being a historical society, it will cost you around 7 system calls, some surprisingly expensive, to even find out if multiplexing is necessary in the first place.

In comparison, today Varnish delivers a cache-hit in 7 system calls *total* of which a few are technically only there for debugging.

The next good reason is that we have no really fiddled with the sendbuffer sizes yet, but obviously, if you sendbuffer can swallow the read, the thread does not wait.

And if that fails, a thread blocked in a write(2) system call is quite cheap to have around.

It uses a little RAM, but it's not that much, and we can probably tune it down quite a bit.

The scheduler does not bother more with the thread than it has to, and when it does, the VM hardware system is not kickstarted every time we cross the user/kernel barrier.

Without getting into spectroscoping comparisons between apples and oranges, a thread just lounging around, waiting for the network stack to do its thing, is much, much cheaper than a thread which does a lot of system calls and fd-table walking, only to perform a few and small(-ish) writes every time the network stack wakes up.

And the final reason why it may never become necessary to multiplex threads, is that servers are cheap.

But if we get to the point where we need multiplexing, we will do it.

But I like the old design principles from the X11 project: we will not do it, until we have a server that doesn't work without it.

But if you are in the business of delivering ISOs to 56k modems then yes, Varnish is probably not for you.

Poul-Henning

Saturday, December 06, 2008

Thoughts on Varnish

Varnish is getting a lot of attention these days around the internet, and with good reason, it’s a nicely written and speedy cache, and has a nice DSL for caching. It has great features like hot reloading of cache rules and ESI.

One thing that’s really surprised me, though, is that Varnish uses one thread per connection. Most network programs designed for high number of connection don’t use one thread per connection anymore as it has serious drawbacks.

With slow clients, many of your threads are spending a lot of time doing nothing but blocking in write(). In all internet consumer apps, I believe, slow clients make up the majority of your connections. But even though the threads are doing nothing, the OS still has memory and scheduling overhead in dealing with them. You find yourself with an artificially low ceiling on the amount of users you can service with a single machine.

What makes a client slow, though? Both speed and latency. Cell phones, 56k modems, and users on high speed links but not geographically close to your data center can all be classified as ‘slow’.

One design that is more appropriate for dealing with the slow client problem uses a pool of worker threads or processes behind the scene and epoll / kqueue / event ports handling slow clients and telling the pool of workers that a socket is ready with a change notification. Your cost is still correlated with growth but at a much lower rate and the number of users you can service will dramatically increase.

So why does Varnish use this older, more troublesome model? Probably because most services aren’t going to notice the bottleneck; They simply don’t have enough concurrent connections to worry about using a few extra machines. If you’re never saturated a load balancer or firewall, you’ve probably never had to seriously consider the C10k issues involved.

Also, unfortunately, the way most people write load tests is that they are only testing the All Fast Clients scenario and not a mix of fast clients and slow clients. I’m guilty of this, too.

My executive summary: Varnish is a nice piece of software, and I hope they spend the time to make it useful for larger sites as well as smaller ones.

Monday, November 17, 2008

More RPMs means faster access times. No news there.

When I upgraded my home laptop from a 2-year old MacBook Pro to one of the newly released unibody models, I decided to upgrade from a 5400 RPM drive to a 7200 RPM drive. I ran some bonnie-64 benchmarks and noticed a 40% improvement in random seeks/sec and some other impressive numbers. It's helped make my weekend hacking much more pleasant.

Here are the old numbers:

and the new numbers:

Bottom line: Recommended

Saturday, September 13, 2008

Amazon as Political Pulpit

Check out the Amazon reviews for Spore, the new Will Wright game. 2,016 of the 2216 reviews gave it 1 star for the heavy-handed DRM. I use Amazon reviews heavily and I've never seen protest brought to the reviews at this scale before. Let's see if people take notice.

TorrentFreak points out the following:

DRM doesn’t stop people from pirating a game, on the contrary. It only hurts legitimate customers since the DRM is removed from the pirate version."

There are lots of philosophical reasons not to buy things with DRM. For me, the practical reason wins out. I'd rather deal with having to store music CDs and not play some potentially awesome games than deal with losing my data due to short-sighted DRM.

Friday, September 12, 2008

Alex's Scala Talk at C4[2]

al3x gave a talk on Scala recently at C4[2] and it hits a lot of the high points as to why we're using Scala at Twitter to build back-end services. I've been programming in Scala seriously for about a year and a half now and it's positively ruined me for plain Java.

Wednesday, July 30, 2008

So I don't forget it.

A cheesy generic tcp proxy I found cruising the webs built out of netcat, fifos, and tee:

$ mkfifo backpipe
$ sudo nc -l 80 0<backpipe | tee -a inflow | nc localhost 8080| tee -a outflow 1>backpipe

This way you can also look at inflow and outflow to see what the actual contents were of the transaction.

Sunday, June 29, 2008

Simulating Byzantine failure with SIGSTOP

If your service relies on connecting to an internal network server and that server isn't accepting connections, your client will obviously throw an error. This happens often enough that you probably already check for this and do the right thing in your various projects. But what if the server is accepting connections but never returning any data? This failure case is rare but very deadly. Chet mentioned that you could simulate this using SIGSTOP so I decided to whip up an experiment with memcached as my victim.

stevej@t42p:~$ ps auxww |grep memcache
stevej    3451  0.0  0.0   2928  1872 pts/0    T    01:21   0:00 memcached -vv -p 11211
stevej@t42p:~$ kill -stop 3451

In another terminal:

stevej@t42p:~$ irb
irb(main):001:0> require 'rubygems'
=> true
irb(main):002:0> require 'memcache'
=> true
irb(main):003:0> CACHE = MemCache.new "localhost:11211"
=> <MemCache: 1 servers, 1 buckets, ns: nil, ro: false>
irb(main):004:0> CACHE.get("foo")

The client library happily hung for several hours while I did other things. How can a process that's suspended not timeout incoming connections? Well, it's the kernel that services network requests and the process itself is only reading the buffers. If you want proof, look at this tcpdump output. Remember, the process has already been suspended by the time I ran tcpdump here.

stevej@t42p:~$ sudo tcpdump -i lo port 11211
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
18:02:40.576255 IP localhost.48124 > localhost.11211: F 2018798159:2018798159(0) ack 2012359105 win 257 <nop,nop,timestamp 15455978 14281170>
18:02:40.577037 IP localhost.11211 > localhost.48124: . ack 1 win 256 <nop,nop,timestamp 15455979 15455978>
18:03:19.037410 IP localhost.35662 > localhost.11211: S 2731273926:2731273926(0) win 32792 <mss 16396,sackOK,timestamp 15465593 0,nop,wscale 7>
18:03:19.037435 IP localhost.11211 > localhost.35662: S 2723119696:2723119696(0) ack 2731273927 win 32768 <mss 16396,sackOK,timestamp 15465593 15465593,nop,wscale 7>
18:03:19.037449 IP localhost.35662 > localhost.11211: . ack 1 win 257 <nop,nop,timestamp 15465593 15465593>
18:03:19.037768 IP localhost.35662 > localhost.11211: P 1:10(9) ack 1 win 257 <nop,nop,timestamp 15465593 15465593>
18:03:19.037776 IP localhost.11211 > localhost.35662: . ack 10 win 256 <nop,nop,timestamp 15465593 15465593>

So a connect timeout wouldn't help here, you need a recv timeout or something else. Restarting your client process won't help at all, it'll simply get stuck in the same place. In Ruby, the easiest thing to do is to use the Timeout module. Sadly, it only has second granularity but that's a lot better than hanging for several hours. You can also set use Socket#setsockopt with a recv timeout if you need finer grained timeout resolution.

stevej@t42p:~$ irb
irb(main):001:0> require 'rubygems'
=> true
irb(main):002:0> require 'memcache'
=> true
irb(main):003:0> CACHE = MemCache.new "localhost:11211"
=> <MemCache: 1 servers, 1 buckets, ns: nil, ro: false>
irb(main):004:0> require 'timeout'
=> false
irb(main):005:0> foo = Timeout::timeout(1) do
irb(main):006:1*   CACHE.get("foo")
irb(main):007:1> end
/usr/lib/ruby/1.8/timeout.rb:54:in `cache_get': execution expired (Timeout::Error)
        from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.5.0/lib/memcache.rb:209:in `get'
        from (irb):6:in `irb_binding'
        from /usr/lib/ruby/1.8/timeout.rb:56:in `timeout'
        from (irb):5:in `irb_binding'
        from /usr/lib/ruby/1.8/irb/workspace.rb:52:in `irb_binding'
        from /usr/lib/ruby/1.8/irb/workspace.rb:52

Do you want to guess what happened when I sent SIGCONT to memcache? My client processes, even the ones that had been hanging for hours, immediately returned with the expected data.

The obvious thing to do is to write a new MemCache subclass decorating all the calls to get, put, get_multi, etc with safer versions. Don't naively trust that the expected data made it to the cache.

Thursday, May 22, 2008

More on Twitter!

Al3x wrote up a nice blog post talking about the future of twitter.

Wednesday, May 21, 2008

How work's going?

Since I started working at Twitter last month, I put up a standard work disclaimer along the side. It always applies.

Jack posted on the company blog: I have this graph up on my screen all the time. It should be flat. This week has been rough.

So we have open job postings for something called a Systems Engineer, which is what I do at Twitter. Systems Engineering means building systems where graphs like that stay flat and where downtime means it was either planned or making sure that particular problem won't happen again (if it can be avoided: typical engineering trade-offs apply).

Our problems are really interesting, I think. Lots of users, lots of connections, lots of messages flowing through the system, lots of endpoints, and lots of details to keep straight. All of this needs to be turned into a cohesive system that's simple to reason about and to run in order for me to consider my job a success. It's a tall order but it's what I signed up to do. I've been watching Twitter for a long time (I'm user #150) so I walked into things with my eyes wide open.

If you've been reading this blog for a while, you know that I'm more interested in engineering than hacking together a site. Thinking and then doing. Measuring and then reasoning. Making guesses and then testing them. There's a natural tension between cowboying around and Analysis Paralysis and you have to learn to walk that tightrope if you want to succeed and I think at Twitter, we work pretty hard to Do the Right Thing.

I'm writing this quick post because we're looking for great people who are interested in engineering big systems and in helping to make Twitter the utility-class company we see ourselves as needing to be. If you think you either have the skills or can learn them, please send us your resume to jobs@twitter.com.

Monday, May 19, 2008

Fun DTrace script


#!/usr/sbin/dtrace -s

syscall:::entry
/pid == $1/
{
 @sys[probefunc, ustack()] = count();
}

END {
    trunc(@sys, 2);
}

Tells you the 2 most often called system call/stack trace pair. Running it against firefox 3 beta while using Google Reader shows:

$ sudo ./syscalldist.d 240
dtrace: script './syscalldist.d' matched 428 probes
^C
CPU     ID                    FUNCTION:NAME
  1      2                             :END 

  munmap                                            
              libSystem.B.dylib`munmap$UNIX2003+0xa
              libSystem.B.dylib`free+0x6a
              CoreGraphics`CGEventCreateFromDataAndSource+0xbce
              CoreGraphics`CGSDecodeEventRecord+0x6a
              CoreGraphics`CGSDispatchDatagramsFromStream+0x28f
              CoreGraphics`snarfEvents+0x12a
              CoreGraphics`CGSGetNextEventRecordInternal+0x9f
              CoreGraphics`CGEventCreateNextEvent+0x2c
              HIToolbox`PullEventsFromWindowServerOnConnection(unsigned int, unsigned char)+0x58
              CoreFoundation`__CFMachPortPerform+0x75
              CoreFoundation`CFRunLoopRunSpecific+0xf51
              CoreFoundation`CFRunLoopRunInMode+0x58
              HIToolbox`RunCurrentEventLoopInMode+0x11b
              HIToolbox`ReceiveNextEventCommon+0x176
              HIToolbox`BlockUntilNextEventMatchingListInMode+0x6a
              AppKit`_DPSNextEvent+0x291
              AppKit`-[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]+0x80
              AppKit`-[NSApplication run]+0x31b
              XUL`JSD_GetValueForObject+0xad6ce
              XUL`XRE_GetFileFromPath+0x61c563
              961
  mmap                                              
              libSystem.B.dylib`mmap+0xa
              libSystem.B.dylib`large_and_huge_malloc+0xcb
              libSystem.B.dylib`szone_malloc+0x1cf
              libSystem.B.dylib`malloc_zone_malloc+0x51
              libSystem.B.dylib`malloc+0x37
              CoreGraphics`CGEventCreateFromDataAndSource+0x15e
              CoreGraphics`CGSDecodeEventRecord+0x6a
              CoreGraphics`CGSDispatchDatagramsFromStream+0x28f
              CoreGraphics`snarfEvents+0x12a
              CoreGraphics`CGSGetNextEventRecordInternal+0x9f
              CoreGraphics`CGEventCreateNextEvent+0x2c
              HIToolbox`PullEventsFromWindowServerOnConnection(unsigned int, unsigned char)+0x58
              CoreFoundation`__CFMachPortPerform+0x75
              CoreFoundation`CFRunLoopRunSpecific+0xf51
              CoreFoundation`CFRunLoopRunInMode+0x58
              HIToolbox`RunCurrentEventLoopInMode+0x11b
              HIToolbox`ReceiveNextEventCommon+0x176
              HIToolbox`BlockUntilNextEventMatchingListInMode+0x6a
              AppKit`_DPSNextEvent+0x291
              AppKit`-[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]+0x80
              997
Thrilling, I know!

Labels:

Sunday, May 18, 2008

DTrace for Java 6 on Leopard

When Java 6 for Leopard was released a few weeks ago, one thing that nobody seemed to notice was that Java now had DTrace probes on par with Java on Solaris.

What you expect is there:

With one exception: jstack doesn't appear to work. ustack works fine.


$ sudo dtrace -x jstackstrsize=2048 -n 'syscall::read:entry /execname == "java"/ { jstack(); }' 

dtrace: description 'syscall::read:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  3  17600                       read:entry 

  2  17600                       read:entry 

  3  17600                       read:entry 

  3  17600                       read:entry 

  2  17600                       read:entry 

  2  17600                       read:entry 

  2  17600                       read:entry 

  2  17600                       read:entry

There should be java stack traces under each read:entry line. (This is true even with -XX:+ExtendedDTraceProbes enabled)

I used robey's scarling for my guinea pig and had a lot of fun poking around at it with dtrace.

Labels:

Sunday, May 11, 2008

John McCarthy has a good sense of humor

From an informal talk he gave at Stanford recently that was written up in Hacker News:
Q. Can computers know?

A. This is largely a question of definition. If a camera looked at a table, we could
say it "knows" that there are four containers of liquid on the table (which was true).

Q. Is there any definition of "know" in which computers cannot succeed?

A. Well, I suppose the biblical sense.

Q. Ha, well, what makes you think that?

A. They don't satisfy the necessary axioms (laughter)

Monday, April 21, 2008

What are you doing?

reading @biz out me as a Twitter employee.