Tuesday, January 24, 2012

Have you rolled your own NoSQL DBMS, yet?


Remember those days when you used to buy some silicon chips, make your own computer, and sell the roll-your-own-computer kit to other geeks? I don't remember them either, but that's apparently how Apple was born.

Move over, commoditized computers: the era of rolling your own DBMS (database management system) has arrived.



What got us here? 


Let's say that you are building a web-scale T-shirt-selling application, but the relational data model is too constraining with its normalization (sorry, Codd) and SQL is overkill. (By the way, isn't it ironic that olde SQL was overkill yet couldn't do the transitive closure needed to list your network on Facebook?)


You can choose from CouchDB, MongoDB, Cassandra, Dynamo, HBase, etc. Chances are that even they don't fit your unique needs for storing BSON objects (oh wait, MongoDB does that), wiki-like docs (CouchDB), simple key-value pairs (DHTs, Hadoop, Cassandra), low latency (Membase), Big Data analytics (Pig, Hive, InfoBright)... OK, I got it: your app gathers tons of transient geo-spatial-music-video data with 66% writes and needs exactly 99.4% availability? Well, now you can build your own DBMS on the cloud using soft state and eventual consistency models (who cares, it's just web data anyway ;-) ), make it scale on Hadoop, and there you are: a rebel is born! Of course, this will now lead to dozens of questions on Quora.com about how YourDB compares with other DBs, which only adds to the aura.


I love this freedom to create! I too once hacked together my own DBMS (myDB) just to store the list of books I had read. Why not use MySQL?
  1. Books can have more than one author (!) and it felt convoluted to normalize.
  2. This may sound paranoid, but I was worried that if the DBMS messed up, I would have no way of retrieving my hand-typed data. I wanted to know how the bytes were created.
  3. I wanted simple queries. So I wrote a mini SQL query processor. (Aside: this personal DBMS became the basis for a cool data analytics project at Bell Labs called AQUA, the approximate query answering system. Check out its Web 1.0 page.)
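
For the curious, here is roughly what those three wishes look like in a few lines of Python. This is a minimal sketch, not the original myDB code: the record layout, the file format (one JSON document per line), and the names save, load, and select are all assumptions made up for illustration.

```python
import json

# Hypothetical records: each book is one document, and the authors are kept
# as a plain list instead of being normalized into a separate table.
BOOKS = [
    {"title": "The C Programming Language",
     "authors": ["Brian Kernighan", "Dennis Ritchie"], "year": 1978},
    {"title": "The Mythical Man-Month",
     "authors": ["Fred Brooks"], "year": 1975},
]


def save(books, path):
    """Write one JSON document per line so the bytes on disk stay readable."""
    with open(path, "w", encoding="utf-8") as f:
        for book in books:
            f.write(json.dumps(book) + "\n")


def load(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def select(books, where=lambda book: True, fields=None):
    """A tiny stand-in for SELECT fields FROM books WHERE ..."""
    rows = [book for book in books if where(book)]
    if fields is None:
        return rows
    return [{name: book[name] for name in fields} for book in rows]
```

Asking for every title co-written by Dennis Ritchie is then select(BOOKS, where=lambda b: "Dennis Ritchie" in b["authors"], fields=["title"]), with no join table in sight. If the program ever breaks, the data is still sitting in a text file you can open in any editor.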

My point is that there's a liberating sense of control (oxymoron alert!) in having your own DBMS and consciously flouting a few RDBMS laws along the way. ACID is too much for you? Let's try BASE. Locking is slowing you down? Let's drop locks and create new versions of data. Consistency is a drag? Let's relax it to some form of lazy eventual consistency. Starting with Eric Brewer's CAP theorem (consistency, availability, partition tolerance: you can't have all three), we have realized that you have to give up something to get two other things, in distributed systems and in life.
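
To make "drop locks and create new versions" concrete, here is a toy sketch in Python. It is an illustration under loud assumptions, not how any particular database does it: the class VersionedStore and its methods are hypothetical, it runs in a single thread, and it ignores everything a real multi-version engine has to worry about (concurrent writers, garbage-collecting old versions, durability).

```python
import itertools


class VersionedStore:
    """Toy multi-version key-value store: writes never overwrite, they append."""

    def __init__(self):
        self._versions = {}               # key -> list of (version, value), oldest first
        self._clock = itertools.count(1)  # hypothetical global version counter
        self._latest = 0                  # highest version handed out so far

    def put(self, key, value):
        """Append a new version of key and return its version number."""
        version = next(self._clock)
        self._versions.setdefault(key, []).append((version, value))
        self._latest = version
        return version

    def snapshot(self):
        """Return a version number that a reader can keep reading against."""
        return self._latest

    def get(self, key, as_of=None):
        """Return the newest value of key whose version is <= as_of."""
        if as_of is None:
            as_of = self._latest
        for version, value in reversed(self._versions.get(key, [])):
            if version <= as_of:
                return value
        return None
```

A reader that took a snapshot before a write keeps seeing the old value, while new readers see the new one; nobody waits on a lock:

```python
store = VersionedStore()
store.put("stock:tshirt", 10)
snap = store.snapshot()
store.put("stock:tshirt", 9)
store.get("stock:tshirt", as_of=snap)  # the old reader's view
store.get("stock:tshirt")              # the latest view
```

The price is exactly the kind of trade the paragraph above describes: old versions pile up, and two readers can legitimately disagree about the "current" stock.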

And once you have hand-crafted the optimal DBMS, why keep it hidden as a hack?! Gone are the days when the RDBMS purists would scoff at you for violating Boyce-Codd normal form and not honoring all constructs of SQL. Your baby now belongs to the rebellious NoSQL family!

Welcome to the brave new world of hand-crafted laissez-faire databases!

But here's the kicker: the more I read about NoSQL databases, the more instances I come across of big web companies going back to good old MySQL for some uses (Google, Facebook, Twitter). Clearly RDBMSs haven't gone the way of the dinosaurs. So, how do I pick the right horse for my course? (ed note: pathetic mix of metaphors)


Next in this series: Navigating the NoSQL Tower of Babel: Choosing the right database for your application

thanks, vishy
innovate & impact

1 comment:

  1. Hi, Nice tips about DBMS and MySQL.Thanks, its really helped me......

    -Aparna
    Theosoft
