Choosing a message queue for Python on Ubuntu on a VPS
Updated Sep 13, 2011 to include redis, remove stompserver, and update beanstalkd
More and more, my web apps need to run things in the background: Sending email, re-calculating values, fetching website thumbnails, etc. In short, I need a message queue in my toolbox.
Luckily for me, message queues are plentiful, so there are some excellent options. I looked at RabbitMQ, Gearman, Beanstalkd, and Redis.
I’d like the message queue to play nicely with Python and Ubuntu, and to take almost no memory, as I’m on a Virtual Private Server. I’d also like it to stay up forever. I want small and solid.
All of the Python packages listed in the table are in PyPI. They can be installed with pip install <package>.
Memory size is the resident set size, obtained like so: ps -Ao pid,rsz,args | grep <name>. If there is a better way of estimating memory, please let me know in the comments.
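For repeated measurements, the same ps command can be scripted. This is only a rough sketch: the helper names parse_rss and rss_for are made up for illustration, and it assumes a procps-style ps that reports rsz in kilobytes.

```python
import subprocess


def parse_rss(ps_output, name):
    """Return {pid: rss_kb} for `ps -Ao pid,rsz,args` lines whose command contains `name`."""
    result = {}
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # pid, rsz, then the full command line
        if len(parts) < 3:
            continue
        pid, rsz, args = parts
        # The header row is skipped because its first two fields are not digits.
        if pid.isdigit() and rsz.isdigit() and name in args:
            result[int(pid)] = int(rsz)
    return result


def rss_for(name):
    """Run ps and report resident set sizes (in KB) for matching processes."""
    out = subprocess.check_output(["ps", "-Ao", "pid,rsz,args"])
    return parse_rss(out.decode("utf-8", "replace"), name)
```

Calling rss_for("beanstalkd") would then give a dictionary of process IDs and their resident sizes. Like the grep version, it matches on the whole command line, so it can also pick up unrelated processes that mention the name.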
RabbitMQ is an all-singing, all-dancing “complete and highly reliable Enterprise Messaging system”. With language like that you’d expect horrible bloat and per-CPU licensing, but happily that’s not the case. It’s straightforward to set up and relatively lean.
The protocol, AMQP, comes from the financial world, and is intended to replace TIBCO Rendezvous, the backbone of most investment banks. There’s lots of documentation, lots of users, a healthy ecosystem, and it looks good on your CV. I tried RabbitMQ first, and liked it so much I almost stopped my evaluation right there and deployed it.
The best tutorial for using it from Python is here: Rabbits and Warrens
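    # Producer: publish one persistent message per second.
    # Run the consumer first so that the exchange and queue exist.
    import time
    from amqplib import client_0_8 as amqp

    conn = amqp.Connection(
        host="localhost:5672", userid="guest", password="guest",
        virtual_host="/", insist=False)
    chan = conn.channel()

    i = 0
    try:
        while True:
            msg = amqp.Message('Message %d' % i)
            msg.properties["delivery_mode"] = 2  # 2 = persistent
            chan.basic_publish(
                msg, exchange="sorting_room", routing_key="testkey")
            i += 1
            time.sleep(1)
    finally:
        # Close cleanly when the loop is interrupted (e.g. Ctrl-C).
        chan.close()
        conn.close()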
    # Consumer: declare the queue and exchange, bind them, then print each message.
    from amqplib import client_0_8 as amqp

    conn = amqp.Connection(
        host="localhost:5672", userid="guest", password="guest",
        virtual_host="/", insist=False)
    chan = conn.channel()

    chan.queue_declare(
        queue="po_box", durable=True, exclusive=False, auto_delete=False)
    chan.exchange_declare(
        exchange="sorting_room", type="direct", durable=True,
        auto_delete=False)
    chan.queue_bind(
        queue="po_box", exchange="sorting_room", routing_key="testkey")

    def recv_callback(msg):
        print(msg.body)

    chan.basic_consume(
        queue='po_box', no_ack=True, callback=recv_callback,
        consumer_tag="testtag")

    while True:
        chan.wait()

    #chan.basic_cancel("testtag")
    #chan.close()
    #conn.close()
Gearman is a system to farm out work to other machines, dispatching function calls to machines that are better suited to do work, to do work in parallel, to load balance lots of function calls, or to call functions between languages.
Developed by Danga Interactive (essentially Brad Fitzpatrick, who brought us Memcached and Perlbal). Used by LiveJournal, Digg and Yahoo.
Ubuntu users: Make sure you install the package gearman-job-server, which is the newer, leaner C version of Gearman. Don’t install gearman-server; that is the old Perl version. Also install the package gearman-tools to get the command line tool.
    # Client: dispatch one background 'speak' job per second.
    import time
    from gearman import GearmanClient

    client = GearmanClient(["127.0.0.1"])

    i = 0
    while True:
        client.dispatch_background_task('speak', i)
        print('Dispatched %d' % i)
        i += 1
        time.sleep(1)
    # Worker: register the 'speak' function and process jobs forever.
    from gearman import GearmanWorker

    def speak(job):
        r = 'Hello %s' % job.arg
        print(r)
        return r

    worker = GearmanWorker(["127.0.0.1"])  # a list of servers, as in the client
    worker.register_function('speak', speak, timeout=3)
    worker.work()
Beanstalkd is a fast, distributed, in-memory workqueue service. Its interface is generic, but was designed for use in reducing the latency of page views in high-volume web applications by running most time-consuming tasks asynchronously.
Developed for a very popular Facebook application. It has the smallest memory footprint: after startup, connecting, and sending a few messages, its resident memory size (rsz) was only 0.7 MB!
The Python library depends on PyYAML, so you need:
pip install pyyaml beanstalkc
Andreas Bolka has a beanstalk tutorial here
    # Producer: put one message per second on the default tube.
    import time
    import beanstalkc

    beanstalk = beanstalkc.Connection(host='localhost', port=11300)

    i = 0
    while True:
        beanstalk.put('Message %d' % i)
        i += 1
        time.sleep(1)
    # Consumer: reserve a job, print it, then delete it to mark it done.
    import beanstalkc

    beanstalk = beanstalkc.Connection(host='localhost', port=11300)

    while True:
        job = beanstalk.reserve()
        print(job.body)
        job.delete()
Redis is an open source, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets. It also works effectively as either a message bus or a message queue.
Simon Willison has an excellent Redis tutorial, which covers all the other things it can do.
    # Producer: push one message per second onto the tail of the list.
    import redis
    import time

    r = redis.Redis()

    i = 0
    while True:
        r.rpush('queue', 'Message %d' % i)
        i += 1
        time.sleep(1)
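    # Consumer: block until a message arrives at the head of the list.
    import redis

    r = redis.Redis()

    while True:
        val = r.blpop('queue')  # returns a (queue name, value) pair
        print(val)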
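One detail of the receiver: blpop returns a (queue name, value) pair, or None if a timeout expires, so the loop above prints the whole pair. A small wrapper can unpack it. This is just a sketch, and pop_message is a name invented here, not part of the redis library.

```python
def pop_message(client, queue, timeout=5):
    """Wait up to `timeout` seconds for one message; return its payload or None."""
    item = client.blpop(queue, timeout=timeout)
    if item is None:  # timed out with the queue still empty
        return None
    _key, value = item  # blpop returns a (queue name, value) pair
    return value


# Usage, assuming a Redis server on localhost:
#   import redis
#   print(pop_message(redis.Redis(), 'queue'))
```

Passing a timeout also gives the worker a chance to do housekeeping or shut down cleanly between messages, instead of blocking forever.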
Results and Conclusions
I’d be happy working with any of these. All of them were easy to set up, fast, decent in memory consumption, and had good Python libraries.
RabbitMQ is popular, but it took the most memory and is the most complex to use. It looks like a great product, but it’s Message Oriented Middleware, not an in-memory job queue, so it’s not what I’m looking for.
Beanstalkd is great. It can do persistent queues, and it’s the only true work server here, in that you tell it when a job completes. For the others you’d need to implement your own protocol. And beanstalkd takes almost no memory.
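To make the "tell it when a job completes" point concrete, here is a minimal sketch of a worker step that reports the outcome back to beanstalkd. The function run_job and the handler argument are hypothetical names; delete() and bury() are the beanstalkc job methods, and burying failed jobs is just one possible policy (releasing them back to the queue is another).

```python
def run_job(job, handler):
    """Process one reserved job and report the outcome back to beanstalkd."""
    try:
        handler(job.body)
    except Exception:
        job.bury()   # park the failed job on the server for later inspection
        return False
    job.delete()     # tell the server the job completed
    return True


# Usage, assuming a beanstalkd server on localhost and a handler of your own:
#   import beanstalkc
#   beanstalk = beanstalkc.Connection(host='localhost', port=11300)
#   while True:
#       run_job(beanstalk.reserve(), handler)
```

If the worker crashes before calling delete(), the reservation eventually times out and beanstalkd makes the job available again, which is what the plain Redis list above does not give you for free.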
Gearman was designed for exactly the problem I have, takes little memory (1.4 MB), has a great pedigree (Danga), is widely deployed (LiveJournal, Digg, Yahoo), and someone helped me out on the #gearman IRC channel straight away. It even has queue persistence and clustering.
Until Sep 2011, I was happily using Gearman. Unfortunately it seems to not have much mindshare lately, and Redis is emerging as the new winner. Luckily, Redis is fantastic too.
Redis has by far the most mindshare, according to Google Trends. It does all sorts of other things as well as being a message queue, so most likely it will be in your stack anyway. Queues persist automatically. It takes a very small amount of memory. It takes exactly two lines of code to send or receive a message. Redis is the new winner.