Steve Jenson's blog

False Continuums

People like to categorize things into continuums, I've noticed. Take databases: embedded (Berkeley DB, Mnesia, Metakit, SQLite) vs. distributed (SQL Server, Kdb+, Matisse, FramerD). This distinction exists only because the authors have tended to specialize in one or the other, not because there is a fundamental reason that embedded and distributed can't be treated as independent axes, with both implemented in a single product.

In a moment we'll discuss what that would be like, but first let's talk about the present: SQL + Java. In this world, mismatches rule. You access your data through specific patterns (Active Record, so masterfully woven into Rails, comes to mind). You put caching layers, both application-agnostic (Memcached, a poor man's Linda) and application-specific (everybody who's written a buggy LRU implementation to cache something in an app you've written, please raise your hand), between your massively expensive relational databases and your cheap appservers. You rely on a thin, flimsy pipe between the host language and your data, drinking the ocean through a leaky straw. Ever have to recompile your Java app because you missed a semicolon or a parenthesis in your SQL? Ever wonder at night if your code is vulnerable to a SQL injection attack? Both are results of the mismatch I'm talking about.
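To make the leaky straw concrete, here's a minimal sketch in Python using the standard `sqlite3` module (a stand-in for the SQL + Java setup; the `users` table, its rows, and every function name are made up for illustration). It shows both complaints: SQL glued together as a string lets input rewrite the query, and a missing parenthesis in the SQL goes unnoticed until the statement actually runs.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database (table and rows are made up)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_unsafe(conn, name):
    # The query is assembled as a string, so the input becomes part of the SQL.
    sql = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(sql)]


def find_safe(conn, name):
    # A bound parameter keeps the input as data, never as SQL text.
    sql = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(sql, (name,))]


def typo_surfaces_at_runtime(conn):
    # The missing closing parenthesis is invisible to the host language;
    # only the database notices, and only when the statement is executed.
    try:
        conn.execute("SELECT secret FROM users WHERE (name = 'alice'")
    except sqlite3.Error:
        return True
    return False


conn = make_db()
payload = "nobody' OR '1'='1"
leaked = find_unsafe(conn, payload)   # the injected OR clause matches every row
nothing = find_safe(conn, payload)    # no user is literally named that
```

Bound parameters patch the injection hole, but the query is still an opaque string as far as the host language is concerned, which is the mismatch in a nutshell.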

Object databases like ZODB and embedded databases like Mnesia offer a unified type system with the host language, but in doing so they are tied to that specific language (Python and Erlang, respectively) and in some cases to a specific machine.

I want a database where I can store my objects on the network, with locality defined appropriately to the data's usage (in my main memory if I need it, or stored on a disk on a random machine somewhere otherwise: why do you think God invented the memory hierarchy? That reminds me: the network should replace the tape drive in the traditional memory hierarchy chart), and with a unified type system so I can use the data structures already available to me: dictionaries in Python, structs/objects in Common Lisp, Collections in Java, all simultaneously on the same database. I want to be able to define query mechanisms appropriate to my problem space and to work over the data with similarly appropriate means. Whether that's Prolog-style backtracking, declarative statements like in SQL, vectors like in K, dataflow variables like in Mozart or E, frames in FramerD, Iterators in Java/C++, or even XQuery or RDF, it should be my choice since it's my problem. I should be allowed to make the tradeoffs since it's my butt on the line.
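As a toy illustration of "several query mechanisms over the same data", here's a rough Python sketch. The `Store` class and its `where` and `column` methods are hypothetical names invented for this example, and it deliberately ignores the hard parts (the network, locality, sharing one type system across languages). It only shows plain dictionaries being queried three ways: iterator-style, declaratively, and as a vector.

```python
class Store:
    """Hypothetical in-memory store; no network, disk, or locality handling."""

    def __init__(self, records):
        # Records are ordinary Python dictionaries, copied defensively.
        self._records = [dict(r) for r in records]

    def __iter__(self):
        # Iterator style: walk the records one at a time.
        return iter(self._records)

    def where(self, **equals):
        # Declarative style: say which field values you want, not how to scan.
        return [
            r for r in self._records
            if all(r.get(k) == v for k, v in equals.items())
        ]

    def column(self, field):
        # Vector style: pull one field out as a flat list to operate on in bulk.
        return [r[field] for r in self._records]


store = Store([
    {"name": "alice", "lang": "lisp", "age": 30},
    {"name": "bob", "lang": "java", "age": 41},
    {"name": "carol", "lang": "lisp", "age": 25},
])

lispers = [r["name"] for r in store.where(lang="lisp")]  # declarative
oldest = max(store.column("age"))                        # vector
names = [r["name"] for r in store]                       # iterator
```

Each style is a thin layer over the same records, so picking one per problem costs nothing here; the real tradeoffs show up once the data stops fitting on one machine.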

This is totally buildable. You can write the prototype in Lisp, where you would already have a unified type system (code is data and all that jazz), easy ways to define new query mechanisms (hint: just use macros), fast I/O (most modern Common Lisp implementations compile to native code), and competitors too scared to follow you into battle (Blub paradox and whatnot).

I guess you can tell what I spent my one-day vacation working on. I wish I had more to show than just a buggy B+-tree written in Lisp.

# — 21 August, 2005