Working With Sorted Sets in Redis
Use Cases for Redis Sorted Sets
I’m a web developer, and sorted sets help enormously with the “most viewed”, “most popular” and “most commented” lists that are so ubiquitous in the sidebars of “social” sites. The problem is that the query to get all the items, work out how many views or comments each one has, and rank them can be rather expensive, especially if it runs on every page. With sorted sets you can easily increment the count of views or comments, and the counts are stored in an already-sorted structure, so it’s very cheap and quick to get that data when you need it. Redis can be configured to be more or less persistent, and is often used in the blazing-fast-but-may-lose-data style. Does it matter if a few views don’t get counted? In most cases it doesn’t, which is one reason why this is such an elegant solution.
Store A Set of Data
Most Redis commands have a prefix to indicate which data type is in use, and for sorted sets this is z. When we want to add a count to an item, we use zincrby:
zincrby [set_name] [amount] [key]
So for an example where on a shopping site the most-viewed products should be listed, I would record a view by using this command on the page that displays the product:
zincrby product_views 1 hat
If the product_views set doesn’t exist, Redis will silently create it. Similarly, if the key doesn’t already exist in the set, it is created with a score of zero and then incremented appropriately. Add this operation to your application at the point where the thing you want to count happens, bearing in mind that it may make more sense to use the item’s primary key as the key, rather than a name like hat. You’ll need to look up the product details anyway to display this list (to go faster, cache the product details in Redis too).
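As a rough sketch of what that might look like in application code, here is a small Python helper. It assumes the redis-py client library, whose zincrby method takes the set name, the amount and the member in the same order as the command above. The function name record_view and the integer product_id are made up for illustration.

```python
from typing import Protocol


class SortedSetClient(Protocol):
    """The one method this sketch needs; redis-py's redis.Redis provides it."""

    def zincrby(self, name: str, amount: float, value: str) -> float: ...


def record_view(client: SortedSetClient, product_id: int) -> float:
    """Add one view to a product's score and return the new score.

    record_view and product_id are hypothetical names for this example.
    """
    # ZINCRBY creates the set and the member on first use, so no setup is needed.
    return client.zincrby("product_views", 1, str(product_id))
```

With redis-py installed, you would call this from the product page handler as something like record_view(redis.Redis(), 42).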
Fetching the Most-Viewed Items
Now that we’re incrementing a score each time an item is viewed or commented on, we can swiftly retrieve the ones with the highest scores. The key here is that Redis keeps the data sorted as it is written, so there is no overhead of getting things in the right order when you request the data, and this makes for speedy performance.
The command we want is zrevrange: the z is because it’s a sorted set, the rev is because we want the results backwards, with the highest-scoring item first, and the range is to get some or all of the elements from the set.
To get the three highest-scoring elements:
zrevrange product_views 0 2
The two extra arguments are start and stop, both inclusive, so this command returns the items at index 0, 1 and 2: three in total. This command also takes an optional final argument WITHSCORES, which will return two elements for each item: the key and also the score.
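Here is how the same lookup might be wrapped in Python, again assuming redis-py, where zrevrange accepts withscores=True and returns (member, score) pairs. The helper name top_products is hypothetical, and the members only come back as strings if the client was created with decode_responses=True (otherwise they are bytes).

```python
from typing import List, Protocol, Tuple


class SortedSetClient(Protocol):
    """The one method this sketch needs; redis-py's redis.Redis provides it."""

    def zrevrange(
        self, name: str, start: int, end: int, withscores: bool = False
    ) -> list: ...


def top_products(client: SortedSetClient, count: int = 3) -> List[Tuple[str, int]]:
    """Return up to `count` (product, views) pairs, most viewed first."""
    if count <= 0:
        # A stop index of -1 means "to the end of the set", so guard against it.
        return []
    # start and stop are both inclusive: the top `count` items are 0..count-1.
    rows = client.zrevrange("product_views", 0, count - 1, withscores=True)
    # Scores are floats in Redis; view counts are whole numbers, so cast them.
    return [(member, int(score)) for member, score in rows]
```

The guard for count <= 0 matters more than it looks: without it, asking for zero items would send a stop index of -1 and return the entire set.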
When The Data Isn’t There
When the system has just started up, or if you periodically empty your most-viewed lists, then sometimes there won’t be enough items in this set (or there could be none at all!). Essentially this is a “cache miss”: we went to grab something and it wasn’t there. When implementing a solution like this one, it’s good to have a fallback option for when the data isn’t currently available. For example, you might just run a query to pick three random products from the database, or use three featured products instead. Bear in mind this possible outcome, and remember always to test your systems with an empty Redis once in a while!
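One way to handle the fallback is sketched below, with some assumptions: it uses redis-py's zrevrange on a client created with decode_responses=True (so members are strings), and it tops up a short list instead of replacing it outright, which is my choice for the example. The names most_viewed and fallback are hypothetical; fallback stands in for whatever query returns your random or featured products.

```python
from typing import Callable, List, Protocol


class SortedSetClient(Protocol):
    """The one method this sketch needs; redis-py's redis.Redis provides it."""

    def zrevrange(
        self, name: str, start: int, end: int, withscores: bool = False
    ) -> list: ...


def most_viewed(
    client: SortedSetClient,
    fallback: Callable[[int], List[str]],
    count: int = 3,
) -> List[str]:
    """Return `count` products, topping up from `fallback` on a cache miss."""
    if count <= 0:
        return []
    ranked = list(client.zrevrange("product_views", 0, count - 1))
    if len(ranked) < count:
        # Not enough data in Redis yet: fill the gaps from the fallback source,
        # skipping anything that is already in the ranked list.
        for item in fallback(count):
            if len(ranked) >= count:
                break
            if item not in ranked:
                ranked.append(item)
    return ranked
```

The fallback is only called when Redis comes up short, so the extra database query is paid for on a miss and not on every page.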
Are you solving this pattern a different way, or have a tip to share that I didn’t mention? I’d love to hear from you in the comments!