Waldemar Kornewald on April 06, 2010

Writing a non-relational Django backend

In our April 1st post we claimed to have a simplified backend API. That wasn't true, of course, but as of yesterday it is. The Django ORM is pretty complicated, and it takes contributors too much time to understand all the necessary details. To make the process as easy as possible, we've implemented a backend template which provides a simple starting point for a new backend based on our simplified API. It also contains sample code, so you can better understand what each function does. All places where you have to make changes are marked with "# TODO:" comments. Note that you'll need djangotoolbox, which provides the base classes for nonrel backends.

Let's start with base.py. You can use the DatabaseCreation class to define a custom data_types mapping from Django's fields to your database types. The types will later be passed to functions which you'll have to implement to convert values from and to the DB (convert_value_from_db() and convert_value_for_db()). If the default values work for you, just leave the class untouched.
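
To make the mapping a bit more concrete, here is a minimal sketch of what a customized data_types could look like. The BaseCreation class is a hypothetical stand-in so the snippet runs without djangotoolbox (in a real backend you'd keep the base class from the backend template), and the type strings ('text', 'integer', ...) are made up for illustration:

```python
# Hypothetical stand-in for the base class that the backend template's
# DatabaseCreation derives from; it only exists to keep the sketch runnable.
class BaseCreation(object):
    data_types = {}


class DatabaseCreation(BaseCreation):
    # Keys are Django field class names. Values are type strings of your
    # own choosing; they are later passed as db_type to your value
    # conversion functions. The names used here are assumptions.
    data_types = dict(BaseCreation.data_types, **{
        'CharField': 'text',
        'IntegerField': 'integer',
        'BooleanField': 'bool',
        'DateTimeField': 'datetime',
    })
```

Copying the base mapping and overriding single entries keeps the defaults for every field you don't care about.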

Also, if you want to maintain a DB connection we'd recommend storing it in DatabaseWrapper:

class DatabaseWrapper(NonrelDatabaseWrapper):
    def __init__(self, *args, **kwds):
        super(DatabaseWrapper, self).__init__(*args, **kwds)
        ...
        self.db_connection = connect(
            self.settings_dict['HOST'], self.settings_dict['PORT'],
            self.settings_dict['USER'], self.settings_dict['PASSWORD'])

The real meat is in compiler.py. Here, you have to define a BackendQuery class which handles query creation and execution. In the constructor you should create a low-level query instance for your connection. Depending on your DB API this might be nothing more than a dict, but let's say your DB provides a LowLevelQuery class:

class BackendQuery(NonrelQuery):
    def __init__(self, compiler, fields):
        super(BackendQuery, self).__init__(compiler, fields)
        self.db_query = LowLevelQuery(self.connection.db_connection)

Note that self.connection is the DatabaseWrapper instance, which is the high-level DB connection object in Django.

Then, you need to define a function that converts Django's filters from Django's internal query object (SQLQuery, accessible via self.query) to their counterparts for your DB. This should be done in the add_filters() function. Since quite a few nonrel DBs seem to only support AND queries we provide a default implementation which makes sure that there is no OR filter (well, it has some logic for converting certain OR filters to AND filters). It expects an add_filter() function (without the trailing "s"):

@safe_call
def add_filter(self, column, lookup_type, negated, db_type, value):
    # Emulated/converted lookups
    if column == self.query.get_meta().pk.column:
        column = '_id'

    if negated:
        try:
            op = NEGATION_MAP[lookup_type]
        except KeyError:
            raise DatabaseError("Lookup type %r can't be negated" % lookup_type)
    else:
        try:
            op = OPERATORS_MAP[lookup_type]
        except KeyError:
            raise DatabaseError("Lookup type %r isn't supported" % lookup_type)

    # Handle special-case lookup types
    if callable(op):
        op, value = op(lookup_type, value)

    db_value = self.convert_value_for_db(db_type, value)
    self.db_query.filter(column, op, db_value)

This is just an example implementation. You don't have to use the same code. First, we convert the primary key column to the DB's internal reserved column for the primary key ('_id' in this example). Then, we check whether the filter should be negated and retrieve the respective DB comparison operator from a mapping like this:

OPERATORS_MAP = {
    'exact': '=',
    'gt': '>',
    'gte': '>=',
    # ...
    'isnull': lambda lookup_type, value: ('=' if value else '!=', None),
}

NEGATION_MAP = {
    'exact': '!=',
    'gt': '<=',
    # ...
    'isnull': lambda lookup_type, value: ('!=' if value else '=', None),
}

In our example implementation the operator can be a string or a callable that returns the comparison operator and a modified value. Finally, in the last two lines of add_filter() we convert the value to its low-level DB type and then add a filter to the low-level query object.
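
If you want to play with the operator lookup in isolation, the following sketch pulls that part of add_filter() out into a plain function. resolve_operator() is a hypothetical helper that isn't part of the backend template, DatabaseError is a local stand-in for Django's exception, and the maps are shortened copies of the ones above:

```python
class DatabaseError(Exception):
    """Stand-in for Django's DatabaseError, so the sketch runs on its own."""


OPERATORS_MAP = {
    'exact': '=',
    'gt': '>',
    'isnull': lambda lookup_type, value: ('=' if value else '!=', None),
}

NEGATION_MAP = {
    'exact': '!=',
    'gt': '<=',
    'isnull': lambda lookup_type, value: ('!=' if value else '=', None),
}


def resolve_operator(lookup_type, negated, value):
    """Return (operator, value) for a Django lookup type."""
    mapping = NEGATION_MAP if negated else OPERATORS_MAP
    try:
        op = mapping[lookup_type]
    except KeyError:
        raise DatabaseError("Lookup type %r isn't supported" % lookup_type)
    # Callable entries compute both the operator and a modified value.
    if callable(op):
        op, value = op(lookup_type, value)
    return op, value


print(resolve_operator('gt', True, 5))         # ('<=', 5)
print(resolve_operator('isnull', False, True))  # ('=', None)
```

So a negated "gt" becomes "<=", and isnull=True turns into a comparison against None.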

You might have noticed the @safe_call decorator. This is important. It catches your DB's exceptions and converts them to Django's DatabaseError. That decorator should be used for all your public API methods. Just modify the sample implementation in compiler.py to match your DB's needs.
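
In case you wonder what such a decorator boils down to, here's a rough sketch. It is not the code from the backend template: LowLevelError is a hypothetical exception type standing in for whatever your DB driver raises, and DatabaseError is a local stand-in for Django's class:

```python
import functools


class DatabaseError(Exception):
    """Stand-in for Django's DatabaseError."""


class LowLevelError(Exception):
    """Hypothetical exception type raised by your DB driver."""


def safe_call(func):
    """Re-raise the driver's exceptions as DatabaseError."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except LowLevelError as e:
            raise DatabaseError(str(e))
    return wrapper


@safe_call
def failing_query():
    # Simulates a driver call that fails.
    raise LowLevelError("connection lost")
```

Calling failing_query() now raises DatabaseError instead of LowLevelError, so code written against Django never sees driver-specific exceptions.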

Next, you have to define a fetch() function for retrieving the results from the configured query:

@safe_call
def fetch(self, low_mark, high_mark):
    if high_mark is None:
        # Infinite fetching
        results = self.db_query.fetch_infinite(offset=low_mark)
    elif high_mark > low_mark:
        # Range fetching
        results = self.db_query.fetch_range(high_mark - low_mark, low_mark)
    else:
        results = ()

    for entity in results:
        entity[self.query.get_meta().pk.column] = entity['_id']
        del entity['_id']
        yield entity

Here, low_mark and high_mark define the query range. If high_mark is None, you should allow for iterating through the whole result set. At the end, we convert the DB's internal primary key column back to the model's primary key column and yield a dict representing the entity. If your DB also supports fetching only specific columns, you should get the requested fields from self.fields (field.column contains the column name).
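
The range handling can be tried out without any DB. The sketch below mimics fetch() on top of a plain list of dicts; the rows list, the standalone function signature, and the '_id' column name are assumptions made for this example only:

```python
def fetch(rows, pk_column, low_mark, high_mark):
    """Yield entities from rows within [low_mark, high_mark)."""
    if high_mark is None:
        # No upper bound: everything from low_mark onwards
        selected = rows[low_mark:]
    elif high_mark > low_mark:
        selected = rows[low_mark:high_mark]
    else:
        selected = ()

    for row in selected:
        entity = dict(row)  # copy, so the stored rows stay untouched
        # Rename the DB's internal key column to the model's pk column
        entity[pk_column] = entity.pop('_id')
        yield entity


rows = [{'_id': i, 'name': 'item%d' % i} for i in range(5)]
print(list(fetch(rows, 'id', 1, 3)))
```

With these five rows, a range of 1 to 3 yields the entities with ids 1 and 2, and an empty range (high_mark equal to low_mark) yields nothing.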

All values in the resulting dict are automatically converted via SQLCompiler.convert_value_from_db(). You have to implement that function (the backend template contains a sample implementation). That function gets a db_type parameter which is the type string as defined in your field type mapping in DatabaseCreation.data_types.
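
As an illustration of the conversion step, here's a sketch written as a plain function (in the template it's a method on SQLCompiler). The type strings and the idea that the DB hands back datetimes as ISO-like strings are assumptions for this example, not something every DB does:

```python
import datetime


def convert_value_from_db(db_type, value):
    """Convert a raw DB value to the Python value Django expects."""
    if value is None:
        return None
    if db_type == 'integer':
        return int(value)
    if db_type == 'bool':
        return bool(value)
    if db_type == 'datetime':
        # Assumes the DB stores datetimes as 'YYYY-MM-DDTHH:MM:SS' strings
        return datetime.datetime.strptime(value, '%Y-%m-%dT%H:%M:%S')
    # Unknown types are passed through unchanged
    return value


print(convert_value_from_db('integer', '42'))  # 42
```

The counterpart for writing values would do the same thing in the opposite direction.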

We won't look at the whole API in this post. There are additional functions for ordering, counting, and deleting the query results. It's pretty simple. The API might later get extended with support for aggregates, but currently you'll have to handle them at a lower level in your SQLCompiler implementation if your DB supports those features.

Another important function is called on Model.save():

class SQLInsertCompiler(NonrelInsertCompiler, SQLCompiler):
    @safe_call
    def insert(self, data, return_id=False):
        pk_column = self.query.get_meta().pk.column
        if pk_column in data:
            data['_id'] = data[pk_column]
            del data[pk_column]

        pk = save_entity(self.connection.db_connection,
            self.query.get_meta().db_table, data)
        return pk

Again, data is a dict because that maps naturally to nonrel DBs. Note, before insert() is called, all values are automatically converted via SQLCompiler.convert_value_for_db() (which you have to implement, too), so you don't have to deal with value conversions in that function.
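
To see the primary key handling of insert() in action, here's a small in-memory sketch. InMemoryConnection and this version of save_entity() are hypothetical and only imitate what a real DB API might offer; the standalone insert() takes the table and pk column as arguments instead of reading them from self.query:

```python
import itertools


class InMemoryConnection(object):
    """Hypothetical toy connection that stores entities in dicts."""
    def __init__(self):
        self.tables = {}
        self._ids = itertools.count(1)

    def next_id(self):
        return next(self._ids)


def save_entity(connection, table, data):
    """Store the entity and return its key, generating one if missing."""
    entity = dict(data)
    if '_id' not in entity:
        entity['_id'] = connection.next_id()
    connection.tables.setdefault(table, {})[entity['_id']] = entity
    return entity['_id']


def insert(connection, table, pk_column, data):
    data = dict(data)  # don't modify the caller's dict
    # Rename the model's pk column to the DB's internal key column
    if pk_column in data:
        data['_id'] = data.pop(pk_column)
    return save_entity(connection, table, data)


conn = InMemoryConnection()
print(insert(conn, 'blog_post', 'id', {'title': 'Hello'}))          # 1
print(insert(conn, 'blog_post', 'id', {'id': 42, 'title': 'Bye'}))  # 42
```

The first call lets the toy DB generate the key, the second one keeps the explicitly given primary key.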

I hope this gives you enough information to get started with a new backend. Please spread the word, so we can find backend contributors for all non-relational DBs. Django 1.3 development is getting closer, and in order to get officially integrated into Django we have to prove that it's possible to use Django-nonrel with a wide variety of NoSQL DBs.

Please comment on the API. Should we improve anything? Is it flexible and easy enough?