Multiple Search Keys in CouchDB

I work quite a bit with CouchDB (Cloudant, a hosted CouchDB solution, is part of Bluemix, IBM's cloud platform - and I work for IBM so I get to use this as much as I like) and today I found a feature I hadn't seen before. I struggled to find the docs, so I thought I'd post my working example here in case anyone else is solving a similar problem: wanting to use more than one set of key ranges when filtering a CouchDB view.

The Data

I'm using an example database of movie data, which includes information such as the year the film was released, which genres it belongs to and the ratings on IMDb.

One of the questions I wanted to answer was: how many films released since 2012 have had a rating of 9 or above?

There are a bunch of different ways to get the data out of CouchDB. Since I'm using Cloudant, I could use Cloudant Query to have it search the database (which would be fine, as it's a small data set), but I prefer to work with views since they (generally!) perform better.
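For comparison, the Cloudant Query route would look roughly like the sketch below, which POSTs a selector to the database's `_find` endpoint. The field names `year` and `rating`, the placeholder database URL and the limit of 1000 are all assumptions. `_find` returns documents rather than a count, so the count here is just the length of the `docs` array. Without a matching index, the server may have to scan the whole database, which is acceptable for a small data set.

```javascript
// Placeholder database URL; not taken from the original post.
const DB_URL = "http://localhost:5984/movies";

// Cloudant Query (Mango) body: films from fromYear onward rated minRating or more.
// The field names year and rating are assumptions about the documents.
function highRatedSelector(fromYear, minRating, limit) {
  return {
    selector: {
      year: { $gte: fromYear },
      rating: { $gte: minRating },
    },
    fields: ["_id", "year", "rating"],
    // _find returns 25 documents by default, so set the limit explicitly.
    limit,
  };
}

// POSTs the query to _find and counts the matching documents.
async function countWithQuery(fromYear, minRating) {
  const res = await fetch(`${DB_URL}/_find`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // 1000 is an arbitrary cap chosen for this small data set.
    body: JSON.stringify(highRatedSelector(fromYear, minRating, 1000)),
  });
  if (!res.ok) throw new Error(`_find returned ${res.status}`);
  const { docs } = await res.json();
  return docs.length;
}

// Usage (needs a running server and fetch, e.g. Node 18+):
// countWithQuery(2012, 9).then(console.log);
```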

In CouchDB, there isn't an equivalent of the WHERE clause that you see in a traditional RDBMS. Views are created with keys, which define the sort order and also allow us to start and stop our results at particular points.
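As a rough illustration of how keys define both the order and the start/stop points, here is a simplified sketch in JavaScript. It only handles keys that are arrays of numbers (CouchDB's real collation rules also cover strings, objects, null and booleans). The helper names `compareKeys` and `keyRange` are my own, not part of any CouchDB API, and `[Infinity]` simply stands in for "no endkey".

```javascript
// Simplified sketch: compares array keys element by element, the way
// [year, rating] keys are ordered. Only numeric arrays are handled here.
function compareKeys(a, b) {
  const len = Math.min(a.length, b.length);
  for (let i = 0; i < len; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  // Same prefix: the shorter array sorts first.
  return a.length - b.length;
}

// Returns the keys between startkey and endkey (both inclusive), in sorted order.
function keyRange(keys, startkey, endkey) {
  return [...keys]
    .sort(compareKeys)
    .filter((k) => compareKeys(k, startkey) >= 0 && compareKeys(k, endkey) <= 0);
}

const keys = [[2013, 5.1], [2012, 9.1], [2011, 9.3], [2012, 7.4], [2013, 9.2]];

// A closed range picks out 9+ ratings within 2012 only.
console.log(keyRange(keys, [2012, 9], [2012, 10])); // [ [ 2012, 9.1 ] ]

// With only a start point, every later key is included, whatever its rating.
console.log(keyRange(keys, [2012, 9], [Infinity])); // [ [ 2012, 9.1 ], [ 2013, 5.1 ], [ 2013, 9.2 ] ]
```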

Views and Multiple Keys

My view simply indexes the records by year and rating (this gets updated when any record changes, making it quick to access as the data is already available), and the "reduce" function counts how many films have this year/rating combination.

Here is the code for the view:

function (doc) {
  // key: [year, rating]; doc.rating is an assumed field name
  emit([doc.year, doc.rating], null);
}

This view outputs something like this (just a little bit of the output!):

  { "key": [ 1971, 8.5 ], "value": 3 },
  { "key": [ 1971, 8.6 ], "value": 1 },
  { "key": [ 1971, 8.7 ], "value": 2 },
  { "key": [ 1972, 7.6 ], "value": 13 },
  { "key": [ 1972, 7.7 ], "value": 6 },
  { "key": [ 1972, 7.8 ], "value": 8 }

Hopefully this shows what I said about the keys dictating the sort order: we get all the records sorted by year, and then by rating within each year. To filter the results we get from this view, we change the request we send.
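To make the map/reduce step a bit more concrete, here is a small local simulation in JavaScript of what a view like this returns when queried with `group_level`. It is a sketch rather than how CouchDB works internally. The sample documents and the `rating` field name are made up, and it assumes the reduce is a plain count, as CouchDB's built-in `_count` reduce would give.

```javascript
// Hypothetical sample documents; the field names year/rating are assumptions.
const docs = [
  { year: 1972, rating: 7.6 },
  { year: 1971, rating: 8.5 },
  { year: 1971, rating: 8.6 },
  { year: 1971, rating: 8.5 },
];

// Runs a map function over every doc and collects what it emits.
// Unlike in CouchDB, where emit() is a global, emit is passed in here.
function runMap(mapFn, documents) {
  const emitted = [];
  for (const doc of documents) {
    mapFn(doc, (key, value) => emitted.push({ key, value }));
  }
  return emitted;
}

// Counts rows per distinct key prefix of length groupLevel,
// mimicking a counting reduce queried with group_level.
function groupCount(rows, groupLevel) {
  const counts = new Map();
  for (const { key } of rows) {
    const prefix = JSON.stringify(key.slice(0, groupLevel));
    counts.set(prefix, (counts.get(prefix) || 0) + 1);
  }
  return [...counts].map(([k, value]) => ({ key: JSON.parse(k), value }));
}

// Sorts two-element numeric keys by year, then by rating.
const byKey = (a, b) => a.key[0] - b.key[0] || a.key[1] - b.key[1];

const mapFn = (doc, emit) => emit([doc.year, doc.rating], null);
const rows = runMap(mapFn, docs).sort(byKey);

console.log(JSON.stringify(groupCount(rows, 2)));
// [{"key":[1971,8.5],"value":2},{"key":[1971,8.6],"value":1},{"key":[1972,7.6],"value":1}]
console.log(JSON.stringify(groupCount(rows, 1)));
// [{"key":[1971],"value":3},{"key":[1972],"value":1}]
```

Grouping at level 1 rather than 2 collapses the ratings, giving one count per year.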

Using simple GET requests we can do:

  • /_design/rating/_view/year-rating?group_level=2 makes the basic request to the view and returns output like that shown above
  • /_design/rating/_view/year-rating?group_level=2&startkey=[2012,9] returns the 2012 films with a rating of 9 or more ... and then carries on returning every film from later years too, whatever its rating
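The keys in these query strings are JSON, so when building the URLs in code it's safest to JSON-encode them and then URL-encode the result. Here is a minimal sketch in JavaScript. The `viewUrl` helper is hypothetical, and `http://localhost:5984/movies` is a placeholder database URL, not the one used in the post.

```javascript
// Placeholder database URL; substitute your own CouchDB/Cloudant database.
const DB_URL = "http://localhost:5984/movies";

// Builds a URL for the year-rating view. Array/object values (such as keys)
// are JSON-encoded; other values are sent as plain strings.
function viewUrl(params) {
  const search = new URLSearchParams();
  for (const [name, value] of Object.entries(params)) {
    search.set(name, typeof value === "object" ? JSON.stringify(value) : String(value));
  }
  return `${DB_URL}/_design/rating/_view/year-rating?${search}`;
}

// Only a startkey: 2012 rows from rating 9 upwards, then all later years.
console.log(viewUrl({ group_level: 2, startkey: [2012, 9] }));
// Adding an endkey limits the range to 9+ ratings within 2012.
console.log(viewUrl({ group_level: 2, startkey: [2012, 9], endkey: [2012, 10] }));
```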

A common pattern for solving this, if you use the same parameters all the time (looking only for records that aren't "deleted" is one I use a lot!), is to create a view that only contains those records, so that you don't need to filter them out when requesting the view. Here, we could create a view that only included films with a rating of 9 or more, and use the year as the key; that's one way to solve it.
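As a sketch of that first option, a map function that only indexes the highly rated films might look like this. It assumes the rating is stored in `doc.rating` (a hypothetical field name). With the year as the only key, a single request with `startkey=2012` would then cover every film since 2012. The stub `emit` is only there so the function can be tried outside CouchDB, where the view server provides the real one.

```javascript
// Map function for a hypothetical second view: only films rated 9 or more,
// keyed by year alone.
function highRatedByYear(doc) {
  if (typeof doc.year === "number" && doc.rating >= 9) {
    emit(doc.year, null);
  }
}

// Local stub of emit() that records each emitted [key, value] pair.
const emitted = [];
function emit(key, value) {
  emitted.push([key, value]);
}

[
  { year: 2011, rating: 9.2 },
  { year: 2013, rating: 9.1 },
  { year: 2014, rating: 6.8 },
].forEach(highRatedByYear);

console.log(emitted); // [ [ 2011, null ], [ 2013, null ] ]
```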

Another alternative is to pass multiple key ranges into our CouchDB view. This is a relatively new feature, but for a situation like this one you may find it handy. To achieve this, make a POST request rather than a GET request, and pass a JSON body including a "queries" parameter, like this:

{
  "queries": [
    { "startkey": [ 2012, 9 ], "endkey": [ 2012, 10 ] },
    { "startkey": [ 2013, 9 ], "endkey": [ 2013, 10 ] },
    { "startkey": [ 2014, 9 ], "endkey": [ 2014, 10 ] },
    { "startkey": [ 2015, 9 ], "endkey": [ 2015, 10 ] }
  ]
}

This returns one set of results per range: the films with a 9+ rating for each of the years. It took me some digging to find how to make this request and pass in the multiple ranges, so I've put it here so that I can find it again. If it helps you too, then that is awesome!
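If the list of years changes, the "queries" body can be generated rather than typed out. Below is a sketch in JavaScript. `buildQueries`, `countHighRated` and `fetchHighRated` are my own hypothetical helpers, and the database URL is a placeholder. One hedge on the endpoint: the POST to the view itself is what worked on Cloudant, while CouchDB 2.2 and later document a dedicated `.../_view/{view}/queries` endpoint for multiple queries, so check which one your server supports. The response carries a `results` array with one entry per query, in the same order as the queries.

```javascript
// Placeholder database URL; not taken from the original post.
const DB_URL = "http://localhost:5984/movies";

// One key range per year: [year, minRating] .. [year, 10].
function buildQueries(fromYear, toYear, minRating) {
  const queries = [];
  for (let year = fromYear; year <= toYear; year++) {
    queries.push({ startkey: [year, minRating], endkey: [year, 10] });
  }
  return { queries };
}

// Adds up the row values across every result set in the response.
// Assumes the view's reduce is a count, so each row value is a number.
function countHighRated(response) {
  let total = 0;
  for (const result of response.results) {
    for (const row of result.rows) total += row.value;
  }
  return total;
}

// Sends the multi-range request to the CouchDB 2.2+ /queries endpoint.
// On Cloudant or older servers, POST to the view URL itself instead.
async function fetchHighRated(fromYear, toYear) {
  const res = await fetch(`${DB_URL}/_design/rating/_view/year-rating/queries`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildQueries(fromYear, toYear, 9)),
  });
  if (!res.ok) throw new Error(`CouchDB returned ${res.status}`);
  return countHighRated(await res.json());
}

// Usage (needs a running server and fetch, e.g. Node 18+):
// fetchHighRated(2012, 2015).then(console.log);
```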
