enjoying salad since 1978.

Saturday, July 28, 2007

Oh no, Michael Vick!

Jonathan Lee Riches© is stepping up to the plate. What will he do with 63 billion billion dollars?

Thursday, July 26, 2007

using bzip2 with GNU tar

Because I can never remember the 'j':

tar jxvf archivefilename.tar.bz2
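
For the scripting-inclined, here's a rough Python sketch of the same extraction using the standard library's tarfile module, where the mode string "r:bz2" plays the role of the 'j'. The function name extract_bz2_tarball is made up for illustration; it isn't part of tar or of any library.

```python
import tarfile


def extract_bz2_tarball(archive_path, dest_dir="."):
    """Rough equivalent of running `tar jxvf archive_path` inside dest_dir.

    Hypothetical helper for illustration; returns the member names,
    which is roughly what the 'v' flag would have printed.
    """
    # "r:bz2" tells tarfile to read the archive through bzip2,
    # the same job the 'j' flag does for GNU tar.
    with tarfile.open(archive_path, "r:bz2") as archive:
        names = archive.getnames()
        # Newer Pythons ship a "data" extraction filter that rejects
        # unsafe members; use it when this interpreter has it.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(dest_dir, **extra)
    return names
```

Calling extract_bz2_tarball("archivefilename.tar.bz2") unpacks into the current directory, same as the command above.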

Also, this is my first post using e-blog.

Sunday, July 22, 2007

Wacky people are great for killing time.

Check out the warning on this eBay listing for various old tchotchkes. Warning: long, rambly, somewhat cat-obsessed craziness lurks behind that link.

[via crazybob.]

I guess RAID wasn't the answer for them.

Jonathan Shapiro, who works on Coyotos, wrote:

"The marketing droids would have you believe that RAID is an answer to all of your storage wrongs. Bullshit. The most common cause of drive failure in modern enclosures is heat. When multiple drives share an enclosure, they tend to fail about the same time. We experienced this on Friday night, when two drives died within about 3 hours late one night."

Thankfully, they were making incremental backups to a separate server via rsync. Read the link above for the details.

Saturday, July 21, 2007

Ship it, Paul.

Paul Graham on Arc Lessons:
"We want Arc to be good for writing programs, and one way to ensure that is to start writing programs while the language is still malleable. In the process we've learned a few lessons."

Then doesn't it stand to reason that putting up a pre-alpha tarball or giving read-only access to a source repo would allow you to learn more lessons about your fledgling language?

I can see two problems that Graham might be trying to avoid:

1) Arc is bound to aggravate some Lispers with whatever choices are made in the final syntax. It's an impossible group to please. Whatever arcane decisions he reverses will inevitably be disliked by some subset of the people paying attention. Perhaps he hopes to avoid this by not showing off the language too early, while it's still "malleable" enough that the unwashed masses think they have a shot at getting their favorite cruft shoved back into the language via noisy publicity campaigns. Think ugly language wars on Usenet meets Technorati.

2) Perhaps he thinks that he and his friends are the ones best suited to determine what should be in Arc and just isn't that interested in what the rest of us plebes think. That would tell us a lot about what kind of governance Arc will have: authoritarian. Think benevolent dictator for life without the benevolence.

Of course, it could be that neither of these is the reason. It's hard to divine from the tea leaves and goat entrails left in a few web posts over the course of six (!!) years. That said, I'm dying to give it a spin.

Thursday, July 19, 2007

Rock me, Amadeus

Stacy and I laugh and laugh:

Monday, July 16, 2007

Won't have Nixon to kick around any more.

Is it irony that Nixon thought a well-executed PR plan would change public opinion about him being an "efficient, crafty, cold, machine"?

This article from Slate is going to be the first of many fun trips down Nixonian Lane now that the Nixon Foundation has freed his remaining documents.

[via goldtoe who points out that our current president's memos might never see the light of day.]

Sunday, July 15, 2007

My awesome cheap pen.

This instructable gives you the writing quality of an expensive Montblanc pen without the pretentiousness. I skipped the step where you trim back the insert and instead simply didn't close the pen all the way. I was worried the pen would wobble but it was just fine.

Monday, July 09, 2007

I've been getting up early lately. Where early means before 8:30. It's leaving me on edge. I might do something crazy. Stacy took this home video of me. The music was not dubbed, I promise. It was just playing on the stereo. Unfortunate.

Friday, July 06, 2007

Disk vs. RAM. Round 1.

Kevin says:
I think more and more scalable compute infrastructures are going to cheat and ditch disk (or memory buffered disk) in favor of all in-memory data structures or SSD.

Putting all of your data in memory is not cheating; it's just another trade-off.

This is not a direct rebuttal to Kevin; it's just generic advice to the random engineer who might run across this. To any managers out there: I'm oversimplifying some issues. Your engineers will know which.

Let's look at what Hennessy and Patterson (4th Ed.) tell us about memory access times. I've ranked them fastest to slowest.

  1. Register: 250ps
  2. L1 Cache: 1ns
  3. RAM: 100ns
  4. Disk: 10ms

We used to think of memory hierarchy layers being about an order of magnitude slower than the preceding layer but that hasn't been true for some time. A disk access at 10ms is 100,000 times slower than reading from RAM at 100ns. If you really care about user request latency then you don't want to be hitting disk per-request if you can afford not to.
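
If you want to check the gaps yourself, here's a quick sketch that works out the slowdown between adjacent layers from the numbers listed above. The table name and helper function are my own invention for illustration, and the times are converted to picoseconds just to keep everything in integers.

```python
# Access times as listed above, converted to picoseconds so the
# arithmetic stays in whole numbers.
ACCESS_TIME_PS = [
    ("Register", 250),
    ("L1 Cache", 1_000),         # 1 ns
    ("RAM", 100_000),            # 100 ns
    ("Disk", 10_000_000_000),    # 10 ms
]


def slowdown_between_layers(layers):
    """Return (faster, slower, ratio) for each adjacent pair of layers."""
    return [
        (fast_name, slow_name, slow_ps // fast_ps)
        for (fast_name, fast_ps), (slow_name, slow_ps) in zip(layers, layers[1:])
    ]


for faster, slower, ratio in slowdown_between_layers(ACCESS_TIME_PS):
    print(f"{slower} is {ratio:,}x slower than {faster}")
```

The register-to-cache and cache-to-RAM steps come out at 4x and 100x, while the RAM-to-disk step is five orders of magnitude.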

So are you comfortable adding several hundred milliseconds to a request? You might be if you're a seriously resource-constrained startup with a very large amount of data, as a typical IR system such as Kevin's has. If you have a relatively small amount of data then keeping everything in RAM is a pretty amazing trade-off.

As a medium-sized startup, what if you could pay 30 grand and see that money have an immediate impact on those pesky, painful performance graphs you keep? Wouldn't you spend that?

But it's easy to think about those dollars and tell yourself: "well, disks also store bytes, why not stick our bytes in those instead and use this money to pay for an off-site to get away from our performance troubles?" I'm kidding, people don't actually say that out loud, but I've seen some eyes bug out of some sockets when looking at RAM prices. It's just natural to want to use disk for on-line storage since disks store so much data and seem so much cheaper if you don't think about the latency difference.

But let's talk concretely. I've been talking about "small" and "large" amounts of data. Let's talk about a specific scenario.

You have a popular restaurant review site. Each review is less than half the size of this blog post. That's about 2k. A page has 20 reviews but because you don't have that much RAM, you need to read those 20 reviews from disk. It takes a quarter-second to just read the reviews from disk even with that fancy-pants storage array you bought.

You're very lucky: your site has 10 million reviews. Continuing in my oversimplification, that's like 20 gigabytes. ECC FB-DIMM RAM is going for about $100/gig, so that is 2k plus the extra 10k for the machines to hold your new RAM. 12k to automatically drop a quarter of a second from each request seems pretty cheap to me. Not only is your site faster but it now has more capacity, since it's not wasting a quarter-second per request just moving a disk head around and reading data into memory where it should have been to begin with.
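
Here's the same back-of-the-envelope math as a sketch you can fiddle with. The constants are the figures quoted in this post; treating "2k" and a "gig" as decimal units, and assuming one uncached disk access per review, are my simplifications. The function names are made up for illustration.

```python
# Figures quoted in the post. Decimal units are an assumption made to
# keep the arithmetic round.
REVIEW_COUNT = 10_000_000
BYTES_PER_REVIEW = 2_000          # "about 2k"
BYTES_PER_GB = 1_000_000_000
RAM_DOLLARS_PER_GB = 100
MACHINE_DOLLARS = 10_000          # machines to hold the new RAM
REVIEWS_PER_PAGE = 20
DISK_ACCESS_US = 10_000           # 10 ms, in microseconds
RAM_ACCESS_NS = 100


def dataset_gb():
    """Size of every review held in memory, in gigabytes."""
    return REVIEW_COUNT * BYTES_PER_REVIEW // BYTES_PER_GB


def upgrade_cost_dollars():
    """RAM for the whole dataset plus the machines to hold it."""
    return dataset_gb() * RAM_DOLLARS_PER_GB + MACHINE_DOLLARS


def page_read_from_disk_ms():
    """Assumes one uncached disk access per review (a simplification)."""
    return REVIEWS_PER_PAGE * DISK_ACCESS_US // 1_000


def page_read_from_ram_us():
    """Same page, with every review already sitting in RAM."""
    return REVIEWS_PER_PAGE * RAM_ACCESS_NS / 1_000


print(f"dataset: {dataset_gb()} GB")
print(f"upgrade: ${upgrade_cost_dollars():,}")
print(f"page from disk: {page_read_from_disk_ms()} ms")
print(f"page from RAM: {page_read_from_ram_us()} microseconds")
```

With these inputs the RAM itself comes to about $2,000 and the total to about $12,000, and the 20 disk reads add up to 200ms, which is in the same ballpark as the quarter-second above.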

One last point: It only feels like cheating because you have less to worry about and it costs more. Unfortunately, there's a certain class of naive engineer who will think that they can build a system without the expense and with 99.9% of the gains. (Naive engineers love to throw 99.9% around!) They'll try to convince you that you just need another layer of LRU caches or some more memcached installations. They'll be a hero! They'll get a Founder's Award! Their company's VCs will take notice and promise to fund anything they start! Fire that guy. Use his salary to buy more RAM.