Blog

Waldemar Kornewald on September 02, 2010

Get SQL features on NoSQL with django-dbindexer

With the just released django-dbindexer you can use all Django lookup types ("iexact", "month", "day", etc.) on non-relational DBs like App Engine and MongoDB, even if they're not natively supported by that DB! Also, django-dbindexer can help you with optimizing your code. For instance, case-insensitive filters on MongoDB can't use indexes, so they're not very efficient. With django-dbindexer they can be handled as efficient case-sensitive filters which utilize an index. Sounds too good to be true? Keep reading.

Non-relational databases have rather limited query APIs. For example, on App Engine you can't do a case-insensitive "iexact" match. Developers on these platforms have to use ugly workarounds to implement unsupported filters. Poor guys. ;) This is how you'd emulate "iexact" and "month" on App Engine (using djangoappengine):

# models.py:

class MyModel(models.Model):
    name = models.CharField(max_length=64)
    lowercase_name = models.CharField(max_length=64, editable=False)
    last_modified = models.DateTimeField(auto_now=True)
    month_last_modified = models.IntegerField(editable=False)

    def save(self, *args, **kwargs):
        self.lowercase_name = self.name.lower()
        self.month_last_modified = self.last_modified.month
        super(MyModel, self).save(*args, **kwargs)

def run_query(name, month):
    MyModel.objects.filter(lowercase_name=name.lower(),
                           month_last_modified=month)

This has several problems:

The model above had merely two fields: "name" and "last_modified". It's easy to imagine that in larger projects the models can become a real mess because of all the workarounds. This is just the wrong way to write DB code. Can't we automate this? Yes, we can! Enter django-dbindexer. It allows you to specify index definitions separately from the model, similar to the "index.yaml" file on App Engine and the djangoappengine index definitions feature. Let's see what the example from above looks like with the indexer:

# models.py:

class MyModel(models.Model):
    name = models.CharField(max_length=64)
    last_modified = models.DateTimeField(auto_now=True)

def run_query(name, month):
    MyModel.objects.filter(name__iexact=name, last_modified__month=month)

Looks exactly like with SQL. Nice, isn't it? All you need to make this work is this index definition:

# dbindexes.py:

from models import MyModel
from dbindexer.api import register_index
register_index(MyModel, {'name': 'iexact', 'last_modified': 'month'})

With this little index definition we solve all of the problems mentioned above. Many Django apps like django-registration can be brought to life on non-relational DBs without any modifications to their source. Also, the authors of reusable Django apps can add a simple "index.py" file to their project to make sure that NoSQL developers can use it out-of-the-box. At the same time the code will continue to work on SQL DBs. Even projects that target only non-relational DBs can benefit from the indexer because their code becomes simpler and portable across many NoSQL DBs. The indexer allows the different NoSQL communities to share the same code, sometimes even with the SQL community!

Moreover, the index definition makes the "iexact" filter work efficiently on MongoDB because it converts the string to lowercase, so the "iexact" filter becomes an "exact" filter. This way it can be executed efficiently by using a MongoDB index.

You can also define multiple filters on a single field by passing a tuple of filters:

register_index(MyModel, {'name': ('iexact', 'istartswith', ...)})

Finally, if you want to see a complete example you can download the sample project for App Engine.

With great power comes responsibility ;)

If you're new to NoSQL all this might sound like the Holy Grail that can magically solve all problems. Keep in mind that this is primarily a utility to make advanced developers more productive. You have to understand what's happening internally, so you can make wise design decisions. The internal implementation details of every filter are documented in the reference.

For example, the "contains" filter stores a ListField with all substrings of the indexed field. When querying, it uses "startswith" on that ListField. On App Engine startswith gets converted to ">=" and "<" filters.

Installation

This might sound strange, but the indexer is implemented as a database backend which proxies your actual database backend. For example, if you're on App Engine you have to specify your App Engine backend (here: "gae") and the indexer (here "default") and then you tell the indexer via 'TARGET' which backend should be indexed:

# settings.py:

DATABASES = {
    'default': {
        'ENGINE': 'dbindexer',
        'TARGET': 'gae',
    },
    'gae': {
        'ENGINE': 'djangoappengine.db',
    },
}

MIDDLWARE_CLASSES = (
    # This has to come first
    'autoload.middleware.AutoloadMiddleware',
    ...
)

INSTALLED_APPS = (
    ...
    'autoload',
    'dbindexer',
)

Note that the settings.py in django-testapp already auto-detects and configures the dbindexer somewhere at the bottom of settings.py, so you might not need to change your settings. The middleware has to come first because it must load all index definitions before any other code tries to interact with the database. For more information on auto-loading of specified modules see django-autoload.

Next we have to define a site configuration module which loads the required index definitions. Let's put it in a dbindexes.py in our project root folder. First, we add it to settings.py:

# settings.py:
AUTOLOAD_SITECONF = 'dbindexes'

The actual site configuration module looks like this:

# dbindexes.py:
from dbindexer import autodiscover
autodiscover()

The autodiscover() function will load all modules named dbindexes.py from your settings.INSTALLED_APPS, so they can register their indexes. Alternatively, you can just import the desired index definition modules directly without calling autodiscover().

The future

This is just the beginning. At the moment the indexer "only" adds support for all Django filters, but the plan is to also add support for JOINs by automatically denormalizing your models. We could also support aggregates and distinct() and values() and nested queries and much much more. It won't be 100% all of SQL, but we want to get really close as long as it can be done in a scalable way. This will allow you to write complex queries in a few minutes instead of hours (including unit tests and debugging ;). No more hand-written denormalization and map/reduce and aggregate code. Just tell the indexer what you want to do and it'll handle it for you. Most importantly, it'll do this in a scalable way and internally it'll use the DB exactly like your hand-written code would (or even better ;). The first step is done. If you want to help with the next phase you can drop us a mail: http://groups.google.com/group/django-non-relational.