Waldemar Kornewald on April 20, 2010

NoSQL panel on DjangoDose

DjangoDose has published a "callcast" about NoSQL support in Django. It's pretty interesting to hear the different views on how far support for non-relational (NoSQL) DBs should go. Alex Gaynor wants to work on NoSQL backend support during Google Summer of Code, and if he gets accepted (which is very likely), we might get official non-relational DB support in Django 1.3!

Let me quickly summarize the most important topics that came up:

Some non-relational DBs only support primary key queries (i.e., you can't query by an attribute). Developers work around this by manually maintaining extra "index" tables in which the primary key is the attribute value that you want to filter against (e.g., username) and another attribute stores the corresponding primary key (e.g., user id). For example, an entry for the username index would look like this:

{'pk': 'wkornewald', 'user_id': 1}

One of the big questions was: Is it enough for Django to support pk-only queries, or should Django (or some separate module) take care of maintaining indexes for you?
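To make the hand-maintained index more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: PkOnlyStore is an in-memory stand-in for a DB that can only get, put, and delete by primary key, and the table names "user" and "user_by_username" are invented for illustration. It is not how any particular backend does it.

```python
class PkOnlyStore:
    """Hypothetical stand-in for a DB that only supports primary key access."""

    def __init__(self):
        self._tables = {}

    def put(self, table, pk, entity):
        # Store a copy of the entity under its primary key.
        self._tables.setdefault(table, {})[pk] = dict(entity, pk=pk)

    def get(self, table, pk):
        return self._tables.get(table, {}).get(pk)

    def delete(self, table, pk):
        self._tables.get(table, {}).pop(pk, None)


def save_user(store, user_id, username):
    """Write the user and keep the username index in sync by hand."""
    old = store.get("user", user_id)
    if old is not None and old["username"] != username:
        # The username changed, so the old index entry must go.
        store.delete("user_by_username", old["username"])
    store.put("user", user_id, {"username": username})
    # Index entry: pk is the attribute value, user_id points back to the user.
    store.put("user_by_username", username, {"user_id": user_id})


def get_user_by_username(store, username):
    """Emulate a filter on username with two pk lookups."""
    entry = store.get("user_by_username", username)
    if entry is None:
        return None
    return store.get("user", entry["user_id"])
```

Calling save_user(store, 1, 'wkornewald') creates exactly the index entry shown above, and get_user_by_username(store, 'wkornewald') then needs two pk lookups instead of an attribute query. Note that the two writes in save_user are not atomic in this sketch; a real DB would need some way to deal with a failure between them.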

Is the admin useful if you can only query by pk? Is it sufficient if you can only browse through your database (by page) and access an entity directly by pk, but not search by any other attribute? A related question was: how will you even log in with the username (or email) if you can only query by user id?

Should we also automate more complex indexes for features like JOINs? They could be emulated via in-memory JOINs and denormalization, for example. Wouldn't this make it difficult to predict which queries are efficient and which are not? Also, are the automatically maintained indexes too difficult to understand? E.g., will the user understand what all the extra tables and extra attributes in the models are for? (The following didn't come up, but I'd like to add it now: Will those indexes really look all that different from hand-written code? Wouldn't someone who knows how to work at the low level automatically know what kind of index the backend is creating and also which queries can be efficient?)
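As a rough illustration of the two JOIN emulation techniques mentioned above, here is a small Python sketch. The data, the field names, and both helper functions are made up for this example; plain dicts keyed by pk stand in for the tables.

```python
# Hypothetical sample data: two "tables" keyed by primary key.
users = {
    1: {"pk": 1, "username": "wkornewald"},
    2: {"pk": 2, "username": "alice"},
}
posts = {
    10: {"pk": 10, "title": "NoSQL panel", "author_id": 1},
    11: {"pk": 11, "title": "Index tables", "author_id": 2},
}


def join_posts_with_authors(posts, users):
    """In-memory JOIN: one extra pk lookup per post at read time."""
    joined = []
    for post in posts.values():
        author = users[post["author_id"]]
        joined.append({"title": post["title"], "author": author["username"]})
    return joined


def save_post_denormalized(posts, users, pk, title, author_id):
    """Denormalization: copy the author's username into the post at write time."""
    posts[pk] = {
        "pk": pk,
        "title": title,
        "author_id": author_id,
        # Extra attribute that a plain relational model would not have.
        "author_username": users[author_id]["username"],
    }
    return posts[pk]
```

The in-memory JOIN costs one lookup per row on every read, while the denormalized copy makes reads cheap but has to be rewritten whenever the username changes. Both costs are easy to see in hand-written code like this, which is really the question: would they be just as visible if a backend generated the extra attribute for you?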

So, listen to the callcast to get the whole picture and add your opinion to the NoSQL discussion on the Django discussion group.