saladwithsteve2015-02-10T18:09:20+00:00http://saladwithsteve.com/Steve Jensonstevej@fruitless.orgBuilding F# on Mac OS X2010-07-22T00:00:00+00:00http://saladwithsteve.com/2010/07/building-fsharp-on-the-mac.html<p>I’m attending the Emerging Languages Conference here in Portland,
OR, where I ran into Joe Pamer on the F# team. I’ve been following F#
for a while but have had trouble running it on the Mac.</p>
<p>Joe says his team at Microsoft runs into these issues often with users
and was kind enough to walk me through the steps to get F# running on
the mac.</p>
<p>First, install Mono; there’s a <a href="http://www.go-mono.com/mono-downloads/download.html">dmg</a>.</p>
<p>1. Download the <a href="http://fsxplat.codeplex.com/SourceControl/list/changesets">zip file</a><br />
2. Run <code>make_package.sh</code></p>
<p>This will complain about not finding <code>dpkg-buildpackage</code>, but the DLL will be signed with the Mono key and ready to use.</p>
<p><tt>
Assembly bin/FSharp.Core.dll signed. <br />
./make_package.sh: line 19: dpkg-buildpackage: command not found
</tt></p>
<p>Now run <code>gacutil</code>, which will install the assembly into Mono’s library path.</p>
<p><tt>
sudo gacutil -i ./fsharp/bin/FSharp.Core.dll
</tt></p>
<p>And now you’re ready to write some programs! There are some known
bugs around how it interacts with Mono WinForms (I think that’s a
thing; it’s Greek to me), so we’ll turn that off.</p>
<p>Now let’s fire up the repl, run an expression, and exit.</p>
<p><tt>
$ mono fsharp/bin/fsi.exe <del>-gui</del> <br />
Microsoft® F# 2.0 Interactive build 2.0.0.0<br />
Copyright © Microsoft Corporation. All Rights Reserved. <br />
For help type #help;; <br />
> printf "hi\n";; <br />
hi <br />
val it : unit = () <br />
> #q;;<br />
<br />
- Exit…<br />
</tt></p>
<p>Joe says that they are working on getting DMGs up for Mac users <span class="caps">ASAP</span> (within days).</p>Steve Jensonhttp://saladwithsteve.com/about.htmlHello Again!2010-05-21T00:00:00+00:00http://saladwithsteve.com/2010/05/hello-again.html<h1>Hello Again!</h1>
<p>I’ve changed some things around here.</p>
<p>This site is now generated with
<a href="http://github.com/stevej/jekyll">Jekyll</a></p>
<p>Switching from Blogger wasn’t an easy decision but I felt like it was
time to jump start things and make it fun to write and tinker here
again.</p>
<p>One thing that didn’t make the conversion was the old
comments. Jekyll doesn’t support comments. I considered porting them
over statically, but I had already sat on the conversion for 6 months
and was ready to move on. See, I actually switched back in December
but never updated this version of the site.</p>
<p>People left many wonderful comments. In the future, I’d like to
encourage you to simply email me. My email address is in the ‘About
Me’ page linked at the top. If you’d like me to post your response,
let me know.</p>
<p>Oh, you can also see a mirror of all my content on the blog’s <a href="http://github.com/stevej/saladwithsteve">github
repo</a></p>Steve Jensonhttp://saladwithsteve.com/about.htmlHigher Order Select in Ruby2009-12-13T00:00:00+00:00http://saladwithsteve.com/2009/12/higher-order-select-in-ruby.html<h1>Higher Order Select in Ruby</h1>
<p>When I was converting my blog to Jekyll, I had to write a nested
select in Ruby and found it to be a little painful.</p>
<p>In Ruby:</p>
<div class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span class="c1"># For a given feed, return all the entries that</span>
<span class="c1"># are blog posts rather than templates, etc.</span>
<span class="k">def</span> <span class="nf">posts</span><span class="p">(</span><span class="n">feed</span><span class="p">)</span>
<span class="c1"># The post must have a category of type #post</span>
<span class="n">feed</span><span class="o">.</span><span class="n">entries</span><span class="o">.</span><span class="n">select</span> <span class="k">do</span> <span class="o">|</span><span class="n">entry</span><span class="o">|</span>
<span class="o">!</span><span class="n">entry</span><span class="o">.</span><span class="n">categories</span><span class="o">.</span><span class="n">select</span> <span class="k">do</span> <span class="o">|</span><span class="n">c</span><span class="o">|</span>
<span class="n">c</span><span class="o">.</span><span class="n">term</span> <span class="o">==</span> <span class="s1">'http://schemas.google.com/blogger/2008/kind#post'</span>
<span class="k">end</span><span class="o">.</span><span class="n">empty?</span>
<span class="k">end</span>
<span class="k">end</span></code></pre></div><p>In Scala, things are a little simpler thanks to anaphora: the ability
to easily reference earlier state implicitly. That’s what the _ means
in this context.</p>
<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">feed</span><span class="o">.</span><span class="n">entries</span><span class="o">.</span><span class="n">filter</span><span class="o">(!</span><span class="k">_</span><span class="o">.</span><span class="n">categories</span><span class="o">.</span><span class="n">filter</span><span class="o">(</span><span class="k">_</span><span class="o">.</span><span class="n">term</span> <span class="o">==</span> <span class="s">"http://schemas.google.com/blogger/2008/kind#post"</span><span class="o">).</span><span class="n">isEmpty</span><span class="o">)</span></code></pre></div><p>Imagine the following sentence in English: “Steve used Steve’s keys to
start Steve’s car”. Most programming languages insist on that level of
verbosity.</p>
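<p>Back in Ruby, the nested select can at least be tightened with <code>Enumerable#any?</code>, which drops the negate-then-<code>empty?</code> dance. A minimal sketch, with stand-in <code>Entry</code>/<code>Category</code> structs rather than the real Blogger feed objects:</p>

```ruby
# Sketch only: Entry/Category are stand-ins for the real feed objects.
Category = Struct.new(:term)
Entry    = Struct.new(:categories)

POST_KIND = 'http://schemas.google.com/blogger/2008/kind#post'

# Same selection as above, but any? replaces the !select(...).empty? pattern.
def posts(entries)
  entries.select { |entry| entry.categories.any? { |c| c.term == POST_KIND } }
end

post     = Entry.new([Category.new(POST_KIND)])
template = Entry.new([Category.new('something-else')])
puts posts([post, template]).length  # => 1
```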
<p>Raganwald has a nice <a href="http://github.com/raganwald/homoiconic/blob/master/2009-09-22/anaphora.md#readme">piece</a> exploring Anaphora in Ruby.</p>Steve Jensonhttp://saladwithsteve.com/about.htmlFunctional Refactoring2009-07-21T00:00:00+00:00http://saladwithsteve.com/2009/07/functional-refactoring-1-replace.html<p>
Recently, while working on some code, I noticed that I have been making the same transform on lots of functional code in Scala. I think it's clearly a Refactoring and should have a name.
</p>
<p>
<h4>"Replace conditional inside fold with filter"</h4>
</p>
<p>
The idea is that if you're folding over items in a list and manipulating them based on a conditional, it might be clearer if you pull out the conditional and filter the list on that conditional first.
</p>
<p>
Here is a contrived example borrowed from a fake test suite. Let's say I had code with a conditional inside of a fold:
</p>
<p>
<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">sampleUsers</span><span class="o">.</span><span class="n">values</span><span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="n">u</span> <span class="k">=></span>
<span class="k">if</span> <span class="o">(</span><span class="n">u</span><span class="o">.</span><span class="n">id</span> <span class="o">!=</span> <span class="mi">0</span><span class="o">)</span> <span class="o">{</span>
<span class="nc">Authentication</span><span class="o">(</span><span class="n">u</span><span class="o">.</span><span class="n">id</span><span class="o">,</span> <span class="n">u</span><span class="o">.</span><span class="n">hash</span><span class="o">).</span><span class="n">passwordToken</span> <span class="n">mustEqual</span> <span class="n">u</span><span class="o">.</span><span class="n">passwordToken</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></div>
</p>
<p>
Let's apply this refactoring to make the test case clearer.
</p>
<p>
<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">sampleUsers</span><span class="o">.</span><span class="n">values</span><span class="o">.</span><span class="n">filter</span><span class="o">(</span><span class="n">u</span> <span class="k">=></span> <span class="n">u</span><span class="o">.</span><span class="n">id</span> <span class="o">!=</span> <span class="mi">0</span><span class="o">).</span><span class="n">foreach</span> <span class="o">{</span> <span class="n">u</span> <span class="k">=></span>
<span class="nc">Authentication</span><span class="o">(</span><span class="n">u</span><span class="o">.</span><span class="n">id</span><span class="o">,</span> <span class="n">u</span><span class="o">.</span><span class="n">hash</span><span class="o">).</span><span class="n">passwordToken</span> <span class="n">mustEqual</span> <span class="n">u</span><span class="o">.</span><span class="n">passwordToken</span>
<span class="o">}</span></code></pre></div>
</p>
<p>
This works with multiple inner conditionals as well. You can fit them into one filter or chain multiple filters together.
</p>
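<p>The same refactoring reads naturally in Ruby, too. A hedged sketch with hash stand-ins (not the test suite above), showing that one combined filter and chained filters select the same elements:</p>

```ruby
# Sketch only: hashes standing in for the sampleUsers above.
users = [
  { id: 0, name: 'default' },
  { id: 1, name: 'alice' },
  { id: 2, name: 'bob' }
]

# One combined filter...
combined = users.select { |u| u[:id] != 0 && u[:name].length > 3 }

# ...or multiple chained filters; either way the conditionals move out front.
chained = users.select { |u| u[:id] != 0 }
               .select { |u| u[:name].length > 3 }

puts combined == chained  # => true
```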
<p>
Even though this is a simple refactoring, it seems like a worthwhile place to start cataloging.
</p>
Steve Jensonhttp://saladwithsteve.com/about.htmlMore Varnish Thoughts from the Author2008-12-09T00:00:00+00:00http://saladwithsteve.com/2008/12/more-varnish-thoughts-from-author.html<a href="http://people.freebsd.org/~phk/">Poul-Henning Kamp</a>, the lead author of Varnish, was kind enough to respond to my blog post.
<tt>
<p>
Hi Steve,
</p>
<p>
I read your comments about Varnish and thought I'd better explain
myself, feel free to post this if you want.
</p>
<p>
The reason why we don't multiplex threads in Varnish is that it has
not become a priority for us yet, and I am not convinced that we
will ever see it become a priority at the cost it bears.
</p>
<p>
A common misconception is that varnish parks a thread waiting for
the client to send the next request. We do not, that would be
horribly stupid. Sessions are multiplexed when idle.
</p>
<p>
The Varnish worker thread takes over when the request is completely
read and leaves the session again when the answer is delivered to
the network stack (and no further input is pending).
</p>
<p>
For a cache hit, this takes less than 10 microseconds, and we are
still shaving that number.
</p>
<p>
The problem about multiplexing is that it is horribly expensive,
both in system calls, cache pressure and complexity.
</p>
<p>
If we had a system call that was called "write these bytes, but
don't block longer than N seconds", multiplexing could be sensibly
implemented.
</p>
<p>
But with POSIX mostly being a historical society, it will cost you
around 7 system calls, some surprisingly expensive, to even find
out if multiplexing is necessary in the first place.
</p>
<p>
In comparison, today Varnish delivers a cache-hit in 7 system calls
*total* of which a few are technically only there for debugging.
</p>
<p>
The next good reason is that we have not really fiddled with the
send-buffer sizes yet, but obviously, if your send buffer can swallow
the read, the thread does not wait.
</p>
<p>
And if that fails, a thread blocked in a write(2) system call is
quite cheap to have around.
</p>
<p>
It uses a little RAM, but it's not that much, and we can probably
tune it down quite a bit.
</p>
<p>
The scheduler does not bother more with the thread than it has to,
and when it does, the VM hardware system is not kickstarted every
time we cross the user/kernel barrier.
</p>
<p>
Without getting into spectroscopic comparisons between apples and
oranges, a thread just lounging around, waiting for the network
stack to do its thing, is much, much cheaper than a thread which
does a lot of system calls and fd-table walking, only to perform
a few and small(-ish) writes every time the network stack wakes up.
</p>
<p>
And the final reason why it may never become necessary to multiplex
threads, is that servers are cheap.
</p>
<p>
But if we get to the point where we need multiplexing, we will do
it.
</p>
<p>
But I like the old design principles from the X11 project: we will
not do it, until we have a server that doesn't work without it.
</p>
<p>
But if you are in the business of delivering ISOs to 56k modems
then yes, Varnish is probably not for you.
</p>
<p>
Poul-Henning
</p>
</tt>
Steve Jensonhttp://saladwithsteve.com/about.htmlThoughts on Varnish2008-12-07T00:00:00+00:00http://saladwithsteve.com/2008/12/thoughts-on-varnish.html<p><a href="http://varnish.projects.linpro.no/">Varnish</a> is getting a lot of attention these days around the internet, and with good reason, it’s a nicely written and speedy cache, and has a nice <span class="caps">DSL</span> for caching. It has great features like hot reloading of cache rules and <span class="caps">ESI</span>.</p>
<p>One thing that’s really surprised me, though, is that Varnish uses one thread per connection. Most network programs designed for a high number of connections don’t use one thread per connection anymore, as it has serious drawbacks.</p>
<p>With slow clients, many of your threads are spending a lot of time doing nothing but blocking in <tt>write()</tt>. In all internet consumer apps, I believe, slow clients make up the majority of your connections. But even though the threads are doing nothing, the OS still has memory and scheduling overhead in dealing with them. You find yourself with an artificially low ceiling on the number of users you can service with a single machine.</p>
<p>What makes a client slow, though? Both speed and latency. Cell phones, 56k modems, and users on high speed links but not geographically close to your data center can all be classified as ‘slow’.</p>
<p>One design that is more appropriate for dealing with the slow client problem uses a pool of worker
threads or processes behind the scenes and
<a href="http://www.xmailserver.org/linux-patches/nio-improve.html">epoll</a> /
<a href="http://people.freebsd.org/~jlemon/kqueue_slides/index.html">kqueue</a> / <a href="http://developers.sun.com/solaris/articles/event_completion.html">event
ports</a> handling slow clients and
telling the pool of workers that a socket is ready with a change notification. Your cost is still
correlated with growth but at a much lower rate and the number of users you can service will
dramatically increase.</p>
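<p>The readiness-notification half of that design can be sketched in a few lines of Ruby; <code>IO.select</code> stands in for epoll/kqueue here, and a pair of local sockets stands in for two client connections, one idle and one with data pending:</p>

```ruby
require 'socket'

# Sketch: IO.select (standing in for epoll/kqueue) reports which
# connections are readable, so no worker thread parks on an idle client.
idle_client, _idle_peer = Socket.pair(:UNIX, :STREAM)
busy_client, busy_peer  = Socket.pair(:UNIX, :STREAM)
busy_peer.write('hello')

ready, = IO.select([idle_client, busy_client], nil, nil, 1)
# Only the busy connection comes back ready; a worker pool would be
# handed just that socket instead of blocking on both.
ready.each { |sock| puts sock.recv(5) }  # => hello
```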
<p>So why does Varnish use this older, more troublesome model? Probably because most services aren’t
going to notice the bottleneck; they simply don’t have enough concurrent connections to worry about
using a few extra machines. If you’ve never saturated a load balancer or firewall, you’ve probably
never had to seriously consider the <a href="http://www.kegel.com/c10k.html">C10k</a> issues involved.</p>
<p>Also, unfortunately, most people write load tests that only exercise the All Fast
Clients scenario, not a mix of fast clients and slow clients. I’m guilty of this, too.</p>
<p>My executive summary: Varnish is a nice piece of software, and I hope they spend the time to make it
useful for larger sites as well as smaller ones.</p>
Steve Jensonhttp://saladwithsteve.com/about.htmlMore RPMs means faster access times. No news there.2008-11-18T00:00:00+00:00http://saladwithsteve.com/2008/11/more-rpms-means-faster-access-times-no.html<p>
When I upgraded my home laptop from a 2-year old MacBook Pro to one of the newly released unibody models, I decided to upgrade from a 5400 RPM drive to a 7200 RPM drive. I ran some <a href="http://code.google.com/p/bonnie-64/">bonnie-64</a> benchmarks and noticed a 40% improvement in random seeks/sec and some other impressive numbers. It's helped make my weekend hacking much more pleasant.
</p>
<p>
Here are the old numbers:
<script src="http://gist.github.com/25675.js"></script>
</p>
<p>
and the new numbers:
<script src="http://gist.github.com/25200.js"></script>
</p>
<p>
<strong>Bottom line: Recommended</strong>
</p>
Steve Jensonhttp://saladwithsteve.com/about.htmlAmazon as Political Pulpit2008-09-13T00:00:00+00:00http://saladwithsteve.com/2008/09/amazon-as-political-pulpit.html<p>
Check out the <a href="http://www.amazon.com/review/product/B000FKBCX4/ref=dp_db_cm_cr_acr_txt?_encoding=UTF8&showViewpoints=1">Amazon reviews for Spore</a>, the new Will Wright game. 2,016 of the 2,216 reviews gave it 1 star for the heavy-handed DRM. I use Amazon reviews heavily and I've never seen protest brought to the reviews at this scale before. Let's see if people take notice.
</p>
<p>
<a href="http://torrentfreak.com/spore-most-pirated-game-ever-thanks-to-drm-080913/">TorrentFreak</a> points out the following:
<blockquote>DRM doesn’t stop people from pirating a game, on the contrary. It only hurts legitimate customers since the DRM is removed from the pirate version.</blockquote>
</p>
<p>There are lots of philosophical reasons not to buy things with DRM. For me, the practical reason wins out. I'd rather deal with having to store music CDs and not play some potentially awesome games than deal with losing my data due to short-sighted DRM.
</p>
Steve Jensonhttp://saladwithsteve.com/about.htmlAlex's Scala Talk at C4[2]2008-09-12T00:00:00+00:00http://saladwithsteve.com/2008/09/alexs-scala-talk-at-c42.html<a href="http://al3x.net/">al3x</a> gave a <a href="http://www.slideshare.net/al3x/why-scala-presentation/">talk on Scala</a> recently at <a href="http://rentzsch.com/c4/twoOpen/">C4[2]</a> and it hits a lot of the high points as to why we're using Scala at Twitter to build back-end services. I've been programming in Scala seriously for about a year and a half now and it's positively ruined me for plain Java.
Steve Jensonhttp://saladwithsteve.com/about.htmlSo I don't forget it.2008-07-30T00:00:00+00:00http://saladwithsteve.com/2008/07/so-i-dont-forget-it.htmlA cheesy generic TCP proxy I found while cruising the webs, built out of <code>netcat</code>, <code>fifo</code>s, and <code>tee</code>:
<p>
<code>
$ mkfifo backpipe<br />
$ sudo nc -l 80 0<backpipe | tee -a inflow | nc localhost 8080| tee -a outflow 1>backpipe
</code>
</p>
This way you can also look at <code>inflow</code> and <code>outflow</code> to see what the actual contents of the transaction were.
Steve Jensonhttp://saladwithsteve.com/about.htmlSimulating Byzantine failure with SIGSTOP2008-06-30T00:00:00+00:00http://saladwithsteve.com/2008/06/simulating-byzantine-failure-with.html<p>
If your service relies on connecting to an internal network server and
that server isn't accepting connections, your client will obviously throw an
error. This happens often enough that you probably already check for this and do the right thing in your various projects. But what if the server is accepting connections but never
returning any data? This failure case is rare but very deadly. Chet
mentioned that you could simulate this using <tt>SIGSTOP</tt> so I
decided to whip up an experiment with <tt>memcached</tt> as my victim.
</p>
<p>
<pre>
stevej@t42p:~$ ps auxww |grep memcache
stevej 3451 0.0 0.0 2928 1872 pts/0 T 01:21 0:00 memcached -vv -p 11211
stevej@t42p:~$ kill -stop 3451
</pre>
</p>
<p>
In another terminal:
</p>
<p>
<pre>
stevej@t42p:~$ irb
irb(main):001:0> require 'rubygems'
=> true
irb(main):002:0> require 'memcache'
=> true
irb(main):003:0> CACHE = MemCache.new "localhost:11211"
=> <MemCache: 1 servers, 1 buckets, ns: nil, ro: false>
irb(main):004:0> CACHE.get("foo")
</pre>
</p>
<p>
The client library happily hung for several hours while I did other
things. How can a process that's suspended not time out incoming
connections? Well, it's the kernel that services network requests and
the process itself is only reading the buffers. If you want proof,
look at this tcpdump output. Remember, the process has already been
suspended by the time I ran tcpdump here.
</p>
<p>
<pre>
stevej@t42p:~$ sudo tcpdump -i lo port 11211
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
18:02:40.576255 IP localhost.48124 > localhost.11211: F 2018798159:2018798159(0) ack 2012359105 win 257 <nop,nop,timestamp 15455978 14281170>
18:02:40.577037 IP localhost.11211 > localhost.48124: . ack 1 win 256 <nop,nop,timestamp 15455979 15455978>
18:03:19.037410 IP localhost.35662 > localhost.11211: S 2731273926:2731273926(0) win 32792 <mss 16396,sackOK,timestamp 15465593 0,nop,wscale 7>
18:03:19.037435 IP localhost.11211 > localhost.35662: S 2723119696:2723119696(0) ack 2731273927 win 32768 <mss 16396,sackOK,timestamp 15465593 15465593,nop,wscale 7>
18:03:19.037449 IP localhost.35662 > localhost.11211: . ack 1 win 257 <nop,nop,timestamp 15465593 15465593>
18:03:19.037768 IP localhost.35662 > localhost.11211: P 1:10(9) ack 1 win 257 <nop,nop,timestamp 15465593 15465593>
18:03:19.037776 IP localhost.11211 > localhost.35662: . ack 10 win 256 <nop,nop,timestamp 15465593 15465593>
</pre>
</p>
<p>
So a connect timeout wouldn't help here; you need a recv timeout or
something else. Restarting your client process won't help at all,
it'll simply get stuck in the same place. In Ruby, the easiest thing
to do is to use the Timeout module. Sadly, it only has second
granularity, but that's a lot better than hanging for several
hours. You can also use Socket#setsockopt with a recv timeout if
you need finer-grained timeout resolution.
</p>
<p>
<pre>
stevej@t42p:~$ irb
irb(main):001:0> require 'rubygems'
=> true
irb(main):002:0> require 'memcache'
=> true
irb(main):003:0> CACHE = MemCache.new "localhost:11211"
=> <MemCache: 1 servers, 1 buckets, ns: nil, ro: false>
irb(main):004:0> require 'timeout'
=> false
irb(main):005:0> foo = Timeout::timeout(1) do
irb(main):006:1* CACHE.get("foo")
irb(main):007:1> end
/usr/lib/ruby/1.8/timeout.rb:54:in `cache_get': execution expired (Timeout::Error)
from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.5.0/lib/memcache.rb:209:in `get'
from (irb):6:in `irb_binding'
from /usr/lib/ruby/1.8/timeout.rb:56:in `timeout'
from (irb):5:in `irb_binding'
from /usr/lib/ruby/1.8/irb/workspace.rb:52:in `irb_binding'
from /usr/lib/ruby/1.8/irb/workspace.rb:52
</pre>
</p>
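<p>For reference, the <tt>Socket#setsockopt</tt> route mentioned above looks like this. A sketch only: on Ruby 1.8's blocking IO the kernel timeout makes <tt>recv</tt> fail instead of hanging, but newer Ruby versions manage blocking reads internally, so verify the behavior on your version.</p>

```ruby
require 'socket'

# Sketch of the setsockopt approach: ask the kernel for a 1-second
# receive timeout (SO_RCVTIMEO takes a struct timeval).
sock, _peer = Socket.pair(:UNIX, :STREAM)
timeval = [1, 0].pack('l_2')  # struct timeval: 1 sec, 0 usec
sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_RCVTIMEO, timeval)

# Confirm the kernel accepted the option.
secs, _usecs = sock.getsockopt(Socket::SOL_SOCKET, Socket::SO_RCVTIMEO)
                   .data.unpack('l_2')
puts "recv timeout: #{secs}s"  # => recv timeout: 1s
```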
<p>
Do you want to guess what happened when I sent <tt>SIGCONT</tt> to memcached? My client processes, even the ones that had been hanging for hours, immediately returned with the expected data.
</p>
<p>
The obvious thing to do is to write a new MemCache subclass decorating
all the calls to <tt>get</tt>, <tt>put</tt>, <tt>get_multi</tt>, etc.,
with safer versions. Don't naively trust that the expected data made
it to the cache.
</p>
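<p>A hedged sketch of that decorating idea; <code>FakeCache</code> is a stand-in here, where the real subclass would wrap MemCache itself:</p>

```ruby
require 'timeout'

# Sketch: wrap each cache call in a timeout and treat a hang as a miss.
# FakeCache stands in for the real MemCache client.
class FakeCache
  def get(key)
    "value-for-#{key}"
  end
end

class SafeCache
  def initialize(cache, seconds = 1)
    @cache = cache
    @seconds = seconds
  end

  def get(key)
    Timeout.timeout(@seconds) { @cache.get(key) }
  rescue Timeout::Error
    nil # a hung server looks like a cache miss, not a stuck process
  end
end

safe = SafeCache.new(FakeCache.new)
puts safe.get('foo')  # => value-for-foo
```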
Steve Jensonhttp://saladwithsteve.com/about.htmlMore on Twitter!2008-05-22T00:00:00+00:00http://saladwithsteve.com/2008/05/more-on-twitter.htmlAl3x wrote up a nice blog post <a href="http://dev.twitter.com/2008/05/twittering-about-architecture.html">talking about the future of twitter.</a>
Steve Jensonhttp://saladwithsteve.com/about.htmlHow work is going?2008-05-22T00:00:00+00:00http://saladwithsteve.com/2008/05/how-works-going.html<p>
Since I started working at Twitter last month, I put up a standard work disclaimer along the side. It <strong>always</strong> applies.
</p>
<p>
Jack posted on the company blog: <a href="http://blog.twitter.com/2008/05/i-have-this-graph-up-on-my-screen-all.html">I have this graph up on my screen all the time. It should be flat. This week has been rough.</a>
</p>
<p>
So we have open job postings for something called a <a href="http://twitter.com/help/jobs#syseng">Systems Engineer</a>, which is what I do at Twitter. Systems Engineering means building systems where graphs like that stay flat and where downtime means it was either planned or making sure that particular problem won't happen again (if it can be avoided: typical engineering trade-offs apply).
</p>
<p>
Our problems are really interesting, I think. Lots of users, lots of connections, lots of messages flowing through the system, lots of endpoints, and lots of details to keep straight. All of this needs to be turned into a cohesive system that's simple to reason about and to run in order for me to consider my job a success. It's a tall order but it's what I signed up to do. I've been watching Twitter for a long time (I'm user #150) so I walked into things with my eyes wide open.
</p>
<p>
If you've been reading this blog for a while, you know that I'm more interested in engineering than hacking together a site. Thinking and then doing. Measuring and then reasoning. Making guesses and then testing them. There's a natural tension between cowboying around and Analysis Paralysis; you have to learn to walk that tightrope if you want to succeed, and I think at Twitter, we work pretty hard to Do the Right Thing.
</p>
<p>
I'm writing this quick post because we're looking for great people who are interested in engineering big systems and in helping to make Twitter the utility-class company we see ourselves as needing to be. If you think you either have the skills or can learn them, please send us your resume to jobs@twitter.com.
</p>
Steve Jensonhttp://saladwithsteve.com/about.htmlFun DTrace script2008-05-20T00:00:00+00:00http://saladwithsteve.com/2008/05/fun-dtrace-script.html<pre>
<code>
#!/usr/sbin/dtrace -s
syscall:::entry
/pid == $1/
{
@sys[probefunc, ustack()] = count();
}
END {
trunc(@sys, 2);
}
</code>
</pre>
Tells you the two most frequently called system call/stack trace pairs. Running it against Firefox 3 beta while using Google Reader shows:
<pre>
<code>
$ sudo ./syscalldist.d 240
dtrace: script './syscalldist.d' matched 428 probes
^C
CPU ID FUNCTION:NAME
1 2 :END
munmap
libSystem.B.dylib`munmap$UNIX2003+0xa
libSystem.B.dylib`free+0x6a
CoreGraphics`CGEventCreateFromDataAndSource+0xbce
CoreGraphics`CGSDecodeEventRecord+0x6a
CoreGraphics`CGSDispatchDatagramsFromStream+0x28f
CoreGraphics`snarfEvents+0x12a
CoreGraphics`CGSGetNextEventRecordInternal+0x9f
CoreGraphics`CGEventCreateNextEvent+0x2c
HIToolbox`PullEventsFromWindowServerOnConnection(unsigned int, unsigned char)+0x58
CoreFoundation`__CFMachPortPerform+0x75
CoreFoundation`CFRunLoopRunSpecific+0xf51
CoreFoundation`CFRunLoopRunInMode+0x58
HIToolbox`RunCurrentEventLoopInMode+0x11b
HIToolbox`ReceiveNextEventCommon+0x176
HIToolbox`BlockUntilNextEventMatchingListInMode+0x6a
AppKit`_DPSNextEvent+0x291
AppKit`-[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]+0x80
AppKit`-[NSApplication run]+0x31b
XUL`JSD_GetValueForObject+0xad6ce
XUL`XRE_GetFileFromPath+0x61c563
961
mmap
libSystem.B.dylib`mmap+0xa
libSystem.B.dylib`large_and_huge_malloc+0xcb
libSystem.B.dylib`szone_malloc+0x1cf
libSystem.B.dylib`malloc_zone_malloc+0x51
libSystem.B.dylib`malloc+0x37
CoreGraphics`CGEventCreateFromDataAndSource+0x15e
CoreGraphics`CGSDecodeEventRecord+0x6a
CoreGraphics`CGSDispatchDatagramsFromStream+0x28f
CoreGraphics`snarfEvents+0x12a
CoreGraphics`CGSGetNextEventRecordInternal+0x9f
CoreGraphics`CGEventCreateNextEvent+0x2c
HIToolbox`PullEventsFromWindowServerOnConnection(unsigned int, unsigned char)+0x58
CoreFoundation`__CFMachPortPerform+0x75
CoreFoundation`CFRunLoopRunSpecific+0xf51
CoreFoundation`CFRunLoopRunInMode+0x58
HIToolbox`RunCurrentEventLoopInMode+0x11b
HIToolbox`ReceiveNextEventCommon+0x176
HIToolbox`BlockUntilNextEventMatchingListInMode+0x6a
AppKit`_DPSNextEvent+0x291
AppKit`-[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]+0x80
997
</code>
</pre>
Thrilling, I know!
Steve Jensonhttp://saladwithsteve.com/about.htmlDTrace for Java 6 on Leopard2008-05-19T00:00:00+00:00http://saladwithsteve.com/2008/05/dtrace-for-java-6-on-leopard.html<p>
When Java 6 for Leopard was released a few weeks ago, one thing that nobody seemed to notice was that Java now had DTrace probes on par with Java on Solaris.
</p>
<p>
What you expect is there:
<ul>
<li><a href="http://www.solarisinternals.com/wiki/index.php/DTrace_Topics_Java">DTrace Topics: Java</a></li>
<li><a href="http://java.sun.com/javase/6/docs/technotes/guides/vm/dtrace.html">DTrace Probes in HotSpot VM</a></li>
</ul>
</p>
<p>
With one exception: <code>jstack</code> doesn't appear to work. <code>ustack</code> works fine.
</p>
<p>
<pre>
<code>
$ sudo dtrace -x jstackstrsize=2048 -n 'syscall::read:entry /execname == "java"/ { jstack(); }'
dtrace: description 'syscall::read:entry ' matched 1 probe
CPU ID FUNCTION:NAME
3 17600 read:entry
2 17600 read:entry
3 17600 read:entry
3 17600 read:entry
2 17600 read:entry
2 17600 read:entry
2 17600 read:entry
2 17600 read:entry
</code>
</pre>
</p>
<p>
There should be Java stack traces under each read:entry line. (This is true even with <code>-XX:+ExtendedDTraceProbes</code> enabled.)
</p>
<p>
I used robey's scarling for my guinea pig and had a lot of fun poking around at it with dtrace.
</p>
Steve Jensonhttp://saladwithsteve.com/about.htmlJohn McCarthy has a good sense of humor2008-05-11T00:00:00+00:00http://saladwithsteve.com/2008/05/john-mccarthy-has-good-sense-of-humor.htmlFrom an informal talk he gave at Stanford recently that was written up in <a href="http://news.ycombinator.com/item?id=185348">Hacker News</a>:
<blockquote>
<pre>
Q. Can computers know?
A. This is largely a question of definition. If a camera looked at a table, we could
say it "knows" that there are four containers of liquid on the table (which was true).
Q. Is there any definition of "know" in which computers cannot succeed?
A. Well, I suppose the biblical sense.
Q. Ha, well, what makes you think that?
A. They don't satisfy the necessary axioms (laughter)
</pre>
</blockquote>
Steve Jensonhttp://saladwithsteve.com/about.htmlWhat are you doing?2008-04-22T00:00:00+00:00http://saladwithsteve.com/2008/04/what-are-you-doing.htmlreading @<a href="http://blog.twitter.com/2008/04/welcome-john-and-steve.html">biz</a> out me as a Twitter employee.
Steve Jensonhttp://saladwithsteve.com/about.htmlWhat are you doing?2008-04-22T00:00:00+00:00http://saladwithsteve.com/2008/04/what-are-you-doing.htmlreading @<a href="http://blog.twitter.com/2008/04/welcome-john-and-steve.html">biz</a> outing me as a Twitter employee.
Steve Jensonhttp://saladwithsteve.com/about.htmlMy first Thrift app2008-04-13T00:00:00+00:00http://saladwithsteve.com/2008/04/my-first-thrift-app.html<p>
When you find yourself working on big systems, a useful technique is
to decompose them into services. Moving from a big monolithic server to
a bunch of separate services can be a big challenge, but if you had
foresight, many of your services were already decoupled in your system
from day 1 even though you were deploying it monolithically.
</p>
<p>
A common technique for decomposing services is using RPC. At Google, we
used protocol buffers, which were briefly described in the <a href="http://labs.google.com/papers/sawzall-sciprog.pdf">Sawzall</a>
paper.
</p>
<p>
Basically, you describe your data and the interfaces that process the
data in a language-independent format (a DDL, essentially) and use code
generators to turn that DDL into a set of objects in your target
language that can create and send those structures over the wire. This
makes it easy to write servers in one language and clients in another;
the generated code deals with serialization.
</p>
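<p>The wire-format idea, in miniature: an i64 packed into a fixed big-endian byte layout that any language can read back (as Thrift's binary protocol does for i64 fields). A toy sketch, not the actual generated code:</p>

```ruby
# Toy sketch of the serialization the generated code handles for you:
# an i64 timestamp written as 8 big-endian bytes, then read back.
timestamp = 1_200_000_000_000

wire = [timestamp >> 32, timestamp & 0xFFFFFFFF].pack('NN')
hi, lo = wire.unpack('NN')
roundtrip = (hi << 32) | lo

puts wire.bytesize           # => 8
puts roundtrip == timestamp  # => true
```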
<p>
I found that using a DDL to describe your code and services was really
nice. When building a new service, you could simply reference your DDL
in the design doc and have a meaningful discussion about the service
without getting into the details of how it would be written until you had
the semantics nailed down.
</p>
<p>
Facebook, as they were growing, decided to move to a homegrown binary
RPC mechanism similar to protocol buffers called <a href="http://developers.facebook.com/thrift/">Thrift</a>.
</p>
<p>
Let's say I wanted to write a simple service that would tell the
client what time it was on the server. Here would be the DDL file
describing both the data and the service, plus a little extra to help
out the generated code.
</p>
<pre name="code" class="thrift">
# time.thrift
namespace java tserver.gen
namespace ruby TServer.Gen
typedef i64 Timestamp
service TimeServer {
// Simply returns the current time.
Timestamp time()
}
</pre>
<p>
After running <tt>thrift --gen java --gen rb time.thrift</tt> on the
file, I'd have an interface and server that I could implement in Java
and a client that I could use in Ruby.
</p>
<p>
Based on the generated Java code, I could write a short server in Scala:
</p>
<pre name="code" class="scala">
<code>
package tserver

import tserver.gen._
import com.facebook.thrift.TException
import com.facebook.thrift.TProcessor
import com.facebook.thrift.TProcessorFactory
import com.facebook.thrift.protocol.TProtocol
import com.facebook.thrift.protocol.TProtocolFactory
import com.facebook.thrift.transport.TServerTransport
import com.facebook.thrift.transport.TServerSocket
import com.facebook.thrift.transport.TTransport
import com.facebook.thrift.transport.TTransportFactory
import com.facebook.thrift.transport.TTransportException
import com.facebook.thrift.server.TServer
import com.facebook.thrift.server.TThreadPoolServer
import com.facebook.thrift.protocol.TBinaryProtocol

/**
 * TimeServerImpl.time returns the current time according to the server.
 * (Named TimeServerImpl to avoid clashing with the generated TimeServer class.)
 */
class TimeServerImpl extends TimeServer.Iface {
  override def time: Long = {
    val now = System.currentTimeMillis
    println("somebody just asked me what time it is: " + now)
    now
  }
}

object SimpleServer extends Application {
  try {
    val serverTransport = new TServerSocket(7911)
    val processor = new TimeServer.Processor(new TimeServerImpl())
    val protFactory = new TBinaryProtocol.Factory(true, true)
    val server = new TThreadPoolServer(processor, serverTransport, protFactory)
    println("starting server")
    server.serve()
  } catch {
    case x: Exception => x.printStackTrace()
  }
}
</code>
</pre>
<p>
(Geez, most of that space was taken up in my obsessive need to separate
out all my imports. You can thank Google for that bit of OCD.)
</p>
<p>
The client is even shorter:
</p>
<pre name="code" class="ruby">
<code>
#!/usr/bin/ruby
$:.push(File.expand_path('~/thrift/lib/rb/lib'))
$:.push('../gen-rb')
require 'thrift/transport/tsocket'
require 'thrift/protocol/tbinaryprotocol'
require 'TimeServer'
transport = TBufferedTransport.new(TSocket.new("localhost", 7911))
protocol = TBinaryProtocol.new(transport)
client = TimeServer::Client.new(protocol)
transport.open()
puts "I wonder what time it is. Let's ask!"
puts client.time()
</code>
</pre>
<p>
The Ruby client took about 20ms to get an answer from the Scala server.
</p>
<p>
Thrift advantages:<br />
<ul>
<li>Pipelined connections mean you spend less time on connection setup and teardown, and TCP performs better over longer-lived connections.</li>
<li>Asynchronous requests. Asynchronous replies would be nice too but would be trickier to use.</li>
<li>Binary representation is much more efficient to transmit and process than, say, XML.</li>
</ul>
</p>
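<p>
The binary-representation point is easy to see with a toy Ruby comparison.
This is just a sketch: Thrift's real binary protocol adds its own field
headers, and a full XML envelope would be larger still than the JSON-style
text I'm using here.
</p>
<pre name="code" class="ruby">
<code>
timestamp = 1_234_567_890

# The timestamp as 8 raw big-endian bytes.
binary = [timestamp].pack("q>")

# The same value in a JSON-style text envelope.
text = %Q({"timestamp": #{timestamp}})

puts binary.bytesize  # 8
puts text.bytesize    # 25
</code>
</pre>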
<p>
Thrift drawbacks: <br />
<ul>
<li>Integrating generated source into your build system can be tricky.
Typically, you rarely have to regenerate your stubs but debugging generated code can be a huge pain.</li>
<li>Its Java server should move away from ServerSocket to NIO for increased throughput.
That's probably not more than a week's work as long as the existing code isn't too tightly coupled.</li>
<li>Currently it doesn't build cleanly on the Mac. I did some work and got it working, but I don't think it's used
extensively on the Mac, so if that's your primary platform, you should be prepared to send them patches from time to time.</li>
</ul>
</p>
<p>
If you're looking to move towards decoupled services, Thrift is worth a hard look.
</p>
<p>
Here's a tarball with my <a href="http://saladwithsteve.com//code/timeserver.tar.gz">time
server</a>. It contains all the generated code as well as
libthrift.jar and a Makefile to run the example server.
</p>
Steve Jensonhttp://saladwithsteve.com/about.htmlGVN and gold2008-04-07T00:00:00+00:00http://saladwithsteve.com/2008/04/gvn-and-gold.html<p>
Two things popped up on my radar recently:
</p>
<p>
<a href="http://code.google.com/p/gvn/">gvn</a>, Google's wrappers around Subversion to help them work in their code-review heavy workflow. Even if you're not into code reviews, <code>tkdiff</code> integration is a nice improvement over <code>colordiff</code> or <code>FileMerge</code>.
</p>
<p>
<a href="http://www.airs.com/blog/archives/164">gold</a>, a new ELF linker built with giant binaries in mind. When you're building 900MB+ static binaries routinely, linking speed matters. gold claims to be at least 5x faster currently. Even if you have a massive distcc cluster, linking is still serial. One of gold's future design goals is to be concurrent and that would be pretty awesome. Imagine how fast I could link with a concurrent linker on my 8-core Mac Pro! Not that using an ELF linker under Leopard helps much since OS X uses Mach-O binaries but hey, there's always cross-compiling.
</p>
<p>
BTW, Ian Lance Taylor, the author of gold, has <a href="http://www.airs.com/blog/archives/38">an excellent series of blog articles on linkers.</a>
</p>
Steve Jensonhttp://saladwithsteve.com/about.html