False Continuums
People like to sort things into continuums, I've noticed. Take databases: embedded (Berkeley DB, Mnesia, Metakit, SQLite) vs. distributed (SQL Server, Kdb+, Matisse, FramerD). This distinction exists only because the authors have tended to specialize in one or the other, not because there is a fundamental reason that embedded and distributed can't be treated as independent axes, with both implemented in a single product.
In a moment we'll discuss what that would be like, but first let's talk about the present: SQL + Java. In this world, mismatches rule. You access your data through specific patterns (Active Record, so masterfully woven into Rails, comes to mind). You put application-agnostic caching layers (Memcached, a poor man's Linda) and application-specific ones (everybody who's written a buggy LRU implementation to cache something in an app, please raise your hand) between your massively expensive relational databases and your cheap appservers. You rely on a thin, flimsy pipe between the host language and your data, drinking the ocean through a leaky straw. Ever have to recompile your Java app because you missed a semicolon or a parenthesis in your SQL? Ever wonder at night if your code is vulnerable to a SQL injection attack? Both are results of the mismatch I'm talking about.
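To make the "leaky straw" concrete, here is a minimal sketch of both problems. It uses Python and an in-memory SQLite database rather than Java, purely for brevity; the table, the column names, and the helper functions are all made up for illustration. The point is only that the host language sees SQL as an opaque string, so a syntax slip or an injected quote is not caught until the query runs.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # The query is glued together as text, so whatever is in `name`
    # becomes part of the SQL itself. This is the injection hole.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def find_safe(conn: sqlite3.Connection, name: str) -> list:
    # A bound parameter keeps `name` as data, never as SQL.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


def typo_is_found_late(conn: sqlite3.Connection) -> bool:
    # The missing ')' is invisible to the host language; the mistake
    # only surfaces when the database parses the string at run time.
    try:
        conn.execute("SELECT secret FROM users WHERE (name = 'alice'")
    except sqlite3.Error:
        return True
    return False
```

Calling `find_unsafe(conn, "x' OR '1'='1")` hands back every secret in the table, while `find_safe` with the same argument matches nothing. Bound parameters patch the injection hole, but the query is still a string the compiler knows nothing about.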
Object databases like ZODB and embedded databases like Mnesia offer a type system unified with the host language, but in doing so they are tied to that specific language (Python and Erlang, respectively) and in some cases to a specific machine.
I want a database where I can store my objects on the network, with locality defined appropriately to the data's usage: in my main memory if I need it, or stored on a disk on a random machine somewhere otherwise. (Why do you think God invented the memory hierarchy? That reminds me: the network should replace the tape drive in the traditional memory hierarchy chart.) I want a unified type system so I can use the data structures already available to me: dictionaries in Python, structs/objects in Common Lisp, Collections in Java, all simultaneously on the same database. I want to be able to define query mechanisms appropriate to my problem space and to work over the data with similarly appropriate means: whether through Prolog-style backtracking, declarative statements like in SQL, vectors like in K, dataflow variables like in Mozart or E, frames in FramerD, iterators in Java/C++, or even XQuery or RDF, it should be my choice since it's my problem. I should be allowed to make the tradeoffs since it's my butt on the line.
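As a toy illustration of "native data structures, several query styles," here is a sketch in Python. `TinyStore` and its methods are hypothetical names invented for this example, and it is nowhere near the database described above (no network, no persistence, one language). It only shows the same plain dictionaries being reached through an iterator, a declarative filter, and an arbitrary predicate.

```python
from typing import Any, Callable, Dict, Iterator, List

Record = Dict[str, Any]


class TinyStore:
    """Hypothetical in-memory store that holds ordinary Python dicts."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def add(self, record: Record) -> None:
        self._records.append(record)

    def scan(self) -> Iterator[Record]:
        """Iterator style: walk every record and decide for yourself."""
        return iter(self._records)

    def where(self, **equals: Any) -> List[Record]:
        """Declarative style: state the field values you want."""
        return [
            r for r in self._records
            if all(r.get(field) == value for field, value in equals.items())
        ]

    def select(self, predicate: Callable[[Record], bool]) -> List[Record]:
        """Escape hatch: any predicate the problem space calls for."""
        return [r for r in self._records if predicate(r)]


store = TinyStore()
store.add({"name": "ZODB", "language": "Python", "kind": "object"})
store.add({"name": "Mnesia", "language": "Erlang", "kind": "embedded"})
store.add({"name": "SQLite", "language": "C", "kind": "embedded"})

embedded = store.where(kind="embedded")
short_names = store.select(lambda r: len(r["name"]) <= 4)
all_names = [r["name"] for r in store.scan()]
```

Each query style is just another method over the same records, which is the property I'm after: the caller picks the mechanism, and the data never has to be translated to fit it.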
This is totally buildable. You can write the prototype in Lisp, where you would already have a unified type system (code is data and all that jazz), easy ways to define new query mechanisms (hint: just use macros), fast I/O (most modern Common Lisp implementations compile to native code), and competitors too scared to follow you into battle (Blub paradox and whatnot).
I guess you can tell what I spent my one-day vacation working on. I wish I had more to show than just a buggy B+-tree written in Lisp.


1 Comment:
WRT memory hierarchy. The distributed flat filesystem that I'm working on supports an LRU interface with two implementations. One is local (just an in-memory cache) and the other is remote (memcached or whatever you want).
This way you get the best of both worlds, as you can get much better performance for in-memory data than you can for remote data.
So you basically have four levels in the hierarchy:
local in-memory
local on-disk
remote in-memory
remote on-disk
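The four levels above can be sketched as a read-through cache. This is a minimal illustration, not the actual code of the filesystem being described: `LocalLRU`, `FakeRemoteCache`, `TieredCache`, and the `load_from_disk` callback are hypothetical names, the remote tier is a plain dictionary standing in for memcached, and the two on-disk levels are collapsed into a single loader function.

```python
from collections import OrderedDict
from typing import Callable, Dict, Optional


class LocalLRU:
    """In-process LRU cache: the 'local in-memory' level."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


class FakeRemoteCache:
    """Stand-in for a remote cache such as memcached (no network here)."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._items.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value


class TieredCache:
    """Try local memory, then the remote cache, then fall back to disk."""

    def __init__(self, local, remote,
                 load_from_disk: Callable[[str], bytes]) -> None:
        self.local = local
        self.remote = remote
        self.load_from_disk = load_from_disk
        self.hits = {"local": 0, "remote": 0, "disk": 0}

    def get(self, key: str) -> bytes:
        value = self.local.get(key)
        if value is not None:
            self.hits["local"] += 1
            return value
        value = self.remote.get(key)
        if value is not None:
            self.hits["remote"] += 1
            self.local.put(key, value)  # promote to the faster level
            return value
        value = self.load_from_disk(key)
        self.hits["disk"] += 1
        self.remote.put(key, value)
        self.local.put(key, value)
        return value
```

Both cache classes expose the same `get`/`put` pair, so either can sit behind the same interface, and a key evicted from the small local cache can still be served from the remote one without touching disk.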
There could be a slight duplication between local in-memory and remote in-memory if you configure a node to run both memcached and a storage node, but there's no solid way to connect to memcached via shared memory right now (which would be SWEET).
Of course implementing a memcached in Java wouldn't be too hard.
I'm getting ahead of myself though.
4:20 PM