Scaling data on the cheap
Don't shard by hash: with hash-mod-N placement, going from N to N+1 shards remaps nearly every key, so resharding means moving nearly ALL of your data. Ideally, you can add shards without rebalancing the data you already have.
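To make the cost concrete, here is a minimal sketch assuming plain hash-mod-N placement. The function names are hypothetical, and SHA-256 is just an arbitrary stable hash. It counts how many keys change shards when a single shard is added. A key stays put only when h mod N equals h mod (N+1), which for uniformly distributed hashes happens about 1/(N+1) of the time, so roughly N/(N+1) of the data moves, and the fraction gets worse as the cluster grows.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Hash-mod-N placement: a stable 64-bit hash of the key, modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose shard changes when resharding from old_n to new_n shards."""
    moved = sum(1 for k in keys if shard_for(k, old_n) != shard_for(k, new_n))
    return moved / len(keys)


keys = [f"user:{i}" for i in range(100_000)]
for n in (4, 10, 50):
    # Expect roughly n / (n + 1) of the keys to land on a different shard.
    print(f"{n} -> {n + 1} shards: {fraction_moved(keys, n, n + 1):.1%} of keys move")
```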
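Use Bloom filters to tell which shard holds your data without hitting the shard: a shard's filter either rules a key out for certain or says it might be there, so you only query the shards that answer "maybe." The same technique is used in BigTable (see Section 6 of the recently published BigTable paper).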
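A minimal sketch combining both ideas, assuming a hypothetical router that keeps one in-memory Bloom filter per shard. `ShardedStore`, `BloomFilter`, the shard capacity, and the dicts standing in for shard servers are all illustrative, not taken from BigTable or any real system. New writes go to the newest shard, so adding a shard moves no existing data. A lookup skips every shard whose filter rules the key out and queries only those that might hold it.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: m bits, k bit positions per key via double hashing."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Textbook sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hash positions.
        n = max(1, expected_items)
        self.m = max(8, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key: str):
        # Derive k positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


class ShardedStore:
    """Hypothetical router: new writes go to the newest shard, old data never moves."""

    def __init__(self, shard_capacity: int):
        self.shard_capacity = shard_capacity
        self.shards = []   # dicts standing in for remote shard servers
        self.filters = []  # one Bloom filter per shard, held in the router's memory
        self.probes = 0    # how many times a shard was actually queried
        self.add_shard()

    def add_shard(self) -> None:
        # Growing the cluster appends an empty shard; nothing is rebalanced.
        self.shards.append({})
        self.filters.append(BloomFilter(self.shard_capacity))

    def put(self, key: str, value) -> None:
        if len(self.shards[-1]) >= self.shard_capacity:
            self.add_shard()
        self.shards[-1][key] = value
        self.filters[-1].add(key)

    def get(self, key: str):
        # Search newest to oldest so a rewritten key returns its latest value.
        for shard, bloom in zip(reversed(self.shards), reversed(self.filters)):
            if not bloom.might_contain(key):
                continue  # skip this shard without contacting it
            self.probes += 1  # a false positive costs one wasted query, never a wrong answer
            if key in shard:
                return shard[key]
        return None


store = ShardedStore(shard_capacity=1000)
for i in range(5000):
    store.put(f"user:{i}", i)
print(len(store.shards), store.get("user:4321"), store.get("user:no-such-key"))
# prints: 5 4321 None
```

The trade-off: at a 1% false-positive rate each filter costs about 9.6 bits per key of router memory, and a false positive occasionally sends a lookup to one extra shard, but adding capacity never forces a rebalance. A plain Bloom filter also cannot forget keys, so deletes would need a periodic filter rebuild or a counting variant.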
10 September, 2006