Graham King

Solvitas perambulum

Choosing a message queue for Python on Ubuntu on a VPS

Summary
As web applications increasingly require background processing for tasks like sending emails and fetching thumbnails, a reliable message queue becomes essential. I evaluated options such as RabbitMQ, Gearman, Beanstalkd, and Redis, prioritizing compatibility with Python, minimal memory usage, and simplicity in setup since I'm running on a Virtual Private Server. Each of these systems integrates well with Python and boasts good performance. In my experience, while RabbitMQ is powerful but complex and resource-intensive, Gearman excels at distributed tasks with low memory consumption. Beanstalkd offers great performance with persistent queues, but Redis emerged as the clear winner due to its widespread adoption, versatility as a data store, low memory footprint, and straightforward implementation for messaging.

Updated Sep 13, 2011 to include Redis, remove stompserver, and update Beanstalkd

More and more, my web apps need to run things in the background: Sending email, re-calculating values, fetching website thumbnails, etc. In short, I need a message queue in my toolbox.

Luckily for me, message queues are plentiful, so there are some excellent options. I looked at RabbitMQ, Gearman, Beanstalkd, and Redis.

I’d like the message queue to play nice with Python, with Ubuntu, and take almost no memory, as I’m on a Virtual Private Server, and I’d like it to stay up forever. I want small and solid.

All of the Python packages listed in the table are in PyPI. They can be installed with pip install <package>.
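
For example, to install the servers from the Ubuntu repositories and all four Python client libraries in one go (package names as listed in the table below):

sudo apt-get install rabbitmq-server gearman-job-server beanstalkd redis-server
pip install amqplib gearman beanstalkc redis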

Summary

                  RabbitMQ           Gearman              Beanstalkd    Redis
Language          Erlang             C                    C             C
Ubuntu package    rabbitmq-server    gearman-job-server   beanstalkd    redis-server
Python lib        amqplib            gearman              beanstalkc    redis
Memory            9 MB               1.4 MB               0.7 MB        1.3 MB
License           MPL                BSD                  GPL           BSD

Memory size is the resident set size, obtained like so: ps -Ao pid,rsz,args | grep <name>. If there is a better way of estimating memory please let me know in the comments.

RabbitMQ

An all-singing all-dancing “complete and highly reliable Enterprise Messaging system”. With language like that you’d expect horrible bloat and per-CPU licensing, but happily that’s not the case. It’s straightforward to set up and relatively lean.

The protocol, AMQP, comes from the financial world, and is intended to replace Tibco’s RendezVous, the backbone of most investment banks. There’s lots of documentation, lots of users, a healthy ecosystem, and it looks good on your CV. I tried RabbitMQ first, and liked it so much I almost stopped my evaluation right there and deployed it.

The best tutorial for using it from Python is here: Rabbits and Warrens

Publisher

import time

from amqplib import client_0_8 as amqp

conn = amqp.Connection(
    host="localhost:5672",
    userid="guest",
    password="guest",
    virtual_host="/",
    insist=False)
chan = conn.channel()

# Start the consumer first, so the exchange and queue it declares exist
i = 0
while True:
    msg = amqp.Message('Message %d' % i)
    # delivery_mode 2 marks the message as persistent
    msg.properties["delivery_mode"] = 2

    chan.basic_publish(msg,
        exchange="sorting_room",
        routing_key="testkey")
    i += 1
    time.sleep(1)

chan.close()
conn.close()

Consumer

from amqplib import client_0_8 as amqp

conn = amqp.Connection(
    host="localhost:5672",
    userid="guest",
    password="guest",
    virtual_host="/",
    insist=False)
chan = conn.channel()

# Declare a durable queue and a direct exchange, then bind them together
chan.queue_declare(
    queue="po_box",
    durable=True,
    exclusive=False,
    auto_delete=False)
chan.exchange_declare(
    exchange="sorting_room",
    type="direct",
    durable=True,
    auto_delete=False,)

chan.queue_bind(
    queue="po_box",
    exchange="sorting_room",
    routing_key="testkey")

def recv_callback(msg):
    print(msg.body)

chan.basic_consume(
    queue='po_box',
    no_ack=True,    # don't send acknowledgements back to the broker
    callback=recv_callback,
    consumer_tag="testtag")

while True:
    chan.wait()

#chan.basic_cancel("testtag")
#chan.close()
#conn.close()

Gearman

Gearman is a system to farm out work to other machines, dispatching function calls to machines that are better suited to do work, to do work in parallel, to load balance lots of function calls, or to call functions between languages.

Developed by Danga Interactive (essentially Brad Fitzpatrick, who brought us Memcached and Perlbal). Used by LiveJournal, Digg and Yahoo.

Ubuntu users: Make sure you install package gearman-job-server, which is the newer leaner C version of Gearman. Don’t install gearman-server, that is the old Perl version. Also install package gearman-tools to get the command line tool.

Client

import time

from gearman import GearmanClient

client = GearmanClient(["127.0.0.1"])

i = 0
while True:
    # Send the job off and return immediately, without waiting for a result
    client.dispatch_background_task('speak', i)
    print('Dispatched %d' % i)
    i += 1
    time.sleep(1)

Worker

from gearman import GearmanWorker

def speak(job):
    r = 'Hello %s' % job.arg
    print(r)
    return r

worker = GearmanWorker(["127.0.0.1"])
worker.register_function('speak', speak, timeout=3)
worker.work()

Beanstalkd

Beanstalkd is a fast, distributed, in-memory workqueue service. Its interface is generic, but was designed for use in reducing the latency of page views in high-volume web applications by running most time-consuming tasks asynchronously.

Developed for a very popular Facebook application. The smallest memory footprint: after startup, connecting, and sending a few messages, its resident memory size (rsz) was only 0.7 MB!

The Python library depends on PyYAML, so you need:

pip install pyyaml beanstalkc

Andreas Bolka has a beanstalk tutorial here

Producer

import time
import beanstalkc

beanstalk = beanstalkc.Connection(host='localhost', port=11300)
i = 0
while True:
    beanstalk.put('Message %d' % i)
    i += 1
    time.sleep(1)

Consumer

import beanstalkc

beanstalk = beanstalkc.Connection(host='localhost', port=11300)
while True:
    job = beanstalk.reserve()    # blocks until a job is available
    print(job.body)
    job.delete()                 # tell beanstalkd the job is done

Redis

Redis is an open source, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets. It also works effectively as either a message bus or a message queue.

Simon Willison has an excellent Redis tutorial, which covers all the other things it can do.

Publisher

import redis
import time

r = redis.Redis()

i = 0
while True:
    # rpush appends to the tail of the 'queue' list
    r.rpush('queue', 'Message %d' % i)
    i += 1
    time.sleep(1)

Consumer

import redis

r = redis.Redis()
while True:
    # blpop blocks until a message arrives and returns a (list_name, value) tuple
    queue, msg = r.blpop('queue')
    print(msg)
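
The “message bus” use mentioned above is Redis publish/subscribe, which is separate from the list commands used here. Below is a minimal sketch using redis-py’s pubsub support; the channel name 'greetings' is just an example. Note that pub/sub messages only reach clients subscribed at that moment, so for background jobs the rpush/blpop pattern above is the one you want.

# Publisher side: broadcast to everyone currently subscribed
import redis

redis.Redis().publish('greetings', 'Hello')

# Subscriber side (run in a separate process)
import redis

pubsub = redis.Redis().pubsub()
pubsub.subscribe('greetings')
for message in pubsub.listen():
    if message['type'] == 'message':
        print(message['data'])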

Results and Conclusions

I’d be happy working with any of these. All of them were easy to set up, fast, decent in memory consumption, and had good Python libraries.

RabbitMQ is popular, but it took the most memory and is the most complex to use. It looks like a great product, but it’s Message Oriented Middleware, not an in-memory job queue, so it’s not what I’m looking for.

Beanstalkd is great. It can do persistent queues, and it’s the only true work server here, in that you tell it when a job completes. For the others you’d need to implement your own protocol. And beanstalkd takes almost no memory.

Gearman was designed for exactly the problem I have, takes little memory (1.4Mb), has a great pedigree (Danga), is widely deployed (LiveJournal, Digg, Yahoo), and someone helped me out on the #gearman IRC channel straight away. It even has queue persistence and clustering.

Until Sep 2011, I was happily using Gearman. Unfortunately it doesn’t seem to have much mindshare lately, and Redis is emerging as the new winner. Luckily, Redis is fantastic too.

Redis has by far the most mindshare (see Google Trends). It does all sorts of other things as well as being a message queue, so most likely it will be in your stack anyway. Queues persist automatically. It takes a very small amount of memory. It takes exactly two lines of code to send or receive a message. Redis is the new winner.