Graham King

Solvitas perambulum

My experience with django-mptt

Summary
In my recent projects using django-mptt, I found it to be a double-edged sword: it simplifies adding tree structures to models by automating much of the process but introduces significant complexity due to its 'magic'. It's recommended to use `MPTTModel` subclassing and avoid `mptt.register` due to maintenance concerns. For threaded comments, prefer a simple self-referential foreign key. Use raw SQL for bulk imports to avoid performance issues caused by auto tree-rebalancing during saves, and then use `MyModel.tree.rebuild()` to manually trigger rebalancing. Despite its potential, it's best to start without django-mptt and adopt it only if necessary, while hoping for further improvements to reduce its complexity and better document its features.

In the past few months, I’ve inherited two projects which used django-mptt, a toolkit for adding trees to Django models. Here’s my experience so far:

mptt is full of magic

That’s both good and bad. Good because it does a lot for you. Bad because it’s difficult to find out what that is. By becoming an MPTTModel you magically get four new database fields, tens of methods, and a whole new manager, grafted on to your model.
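To make those four fields a little less magical, here is a minimal sketch in plain Python, with no Django involved. The field names lft, rght, tree_id and level follow django-mptt's defaults; the Node tuple, the hand-numbered rows and the helper functions are assumptions made up for illustration, not part of the library.

```python
from collections import namedtuple

# Field names mirror django-mptt's defaults; everything else is hypothetical.
Node = namedtuple("Node", "name lft rght tree_id level")

# One tree, numbered by hand:
#   root
#   +-- a
#   |   +-- a1
#   +-- b
# Walk the tree depth-first with a counter: lft is stamped on the way
# down into a node, rght on the way back up.
NODES = [
    Node("root", 1, 8, 1, 0),
    Node("a", 2, 5, 1, 1),
    Node("a1", 3, 4, 1, 2),
    Node("b", 6, 7, 1, 1),
]


def descendants(node, nodes):
    """Nodes numbered strictly inside this node's (lft, rght) interval."""
    return [
        n for n in nodes
        if n.tree_id == node.tree_id and node.lft < n.lft and n.rght < node.rght
    ]


def ancestors(node, nodes):
    """Nodes whose (lft, rght) interval strictly encloses this node's."""
    return [
        n for n in nodes
        if n.tree_id == node.tree_id and n.lft < node.lft and node.rght < n.rght
    ]


def descendant_count(node):
    """Each descendant uses up two counter values inside the interval."""
    return (node.rght - node.lft - 1) // 2


by_name = {n.name: n for n in NODES}
print([n.name for n in descendants(by_name["root"], NODES)])
print([n.name for n in ancestors(by_name["a1"], NODES)])
print(descendant_count(by_name["root"]))
```

The point of the numbering is that "give me the whole subtree" becomes a range comparison instead of a recursive walk. The price is that the numbers have to be kept consistent on every write.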

The project is making an effort to reduce the magic, for example by switching from signals to method overrides, which simplifies things significantly.

You should always subclass MPTTModel, and never, never use the mptt.register(MyModel) approach. The docs recommend against it, and the core developers tried to remove it. To answer the example given for its necessity: if you need Django’s built-in Group to be hierarchical, create your own group model which extends MPTTModel and has a foreign key to the built-in Group.

Maybe you just wanted a foreign key to self

If you just want something like threaded comments, add this to your model:

parent = models.ForeignKey('self', null=True, blank=True, on_delete=models.CASCADE)

The intent will be immediately clear to those that come after you.

You should always start with the parent foreign key. If performance is a big problem later on, and you’ve already de-normalized the important fields, then by all means add in mptt. It’s an optimization, and one that will cost you in maintainability, so be sure that cost is worth paying.
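For a sense of how far the plain parent key gets you, here is a rough sketch of rendering a comment thread from nothing but parent links. It uses the standard-library sqlite3 module so it runs without a Django project; the comment table, its columns and the threaded helper are all hypothetical names chosen for the example.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comment ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES comment(id),"  # NULL for top-level comments
    " body TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO comment (id, parent_id, body) VALUES (?, ?, ?)",
    [
        (1, None, "First!"),
        (2, 1, "Reply to first"),
        (3, 2, "Reply to the reply"),
        (4, None, "Second top-level comment"),
    ],
)


def threaded(conn):
    """Return (depth, body) pairs in display order, using a single query."""
    children = defaultdict(list)
    for comment_id, parent_id, body in conn.execute(
        "SELECT id, parent_id, body FROM comment ORDER BY id"
    ):
        children[parent_id].append((comment_id, body))

    ordered = []

    def walk(parent_id, depth):
        # Depth-first: emit a comment, then all of its replies.
        for comment_id, body in children[parent_id]:
            ordered.append((depth, body))
            walk(comment_id, depth + 1)

    walk(None, 0)
    return ordered


thread = threaded(conn)
for depth, body in thread:
    print("    " * depth + body)
```

One query and a small in-memory walk is often enough when each page shows a single, modestly sized thread. It stops being enough when you need subtree queries across very large trees, which is the case mptt is built for.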

Do bulk imports in raw SQL

Every time you save an MPTTModel, it will re-balance its tree. That makes saves slow and database intensive. If you’re doing bulk imports, you’ll need to switch to raw SQL, and then re-balance the tree afterwards.

If you try to create MPTTModel objects from many processes concurrently, you’ll get deadlocks in the tree re-balancing code. As above, do your heavy lifting in raw SQL.
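As a rough illustration of the raw SQL route, the sketch below inserts every row in one transaction and writes only the parent links. It uses sqlite3 so it runs standalone; the category table and the bulk_import helper are hypothetical, and the zeros in the tree columns are placeholders, on the assumption that those columns do not accept NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: a parent link plus the four tree columns.
conn.execute(
    "CREATE TABLE category ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " parent_id INTEGER REFERENCES category(id),"
    " lft INTEGER NOT NULL,"
    " rght INTEGER NOT NULL,"
    " tree_id INTEGER NOT NULL,"
    " level INTEGER NOT NULL)"
)

# (id, name, parent_id) as they might come out of an import file.
ROWS = [
    (1, "Books", None),
    (2, "Fiction", 1),
    (3, "Non-fiction", 1),
    (4, "Music", None),
]


def bulk_import(conn, rows):
    """Insert all rows in one transaction, with no per-row tree updates."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO category"
            " (id, name, parent_id, lft, rght, tree_id, level)"
            " VALUES (?, ?, ?, 0, 0, 0, 0)",  # zeros are placeholders
            rows,
        )


bulk_import(conn, ROWS)
count = conn.execute("SELECT COUNT(*) FROM category").fetchone()[0]
print(count)
```

In a Django project the same statement would go through a cursor from django.db.connection rather than sqlite3 directly. Either way, the tree columns are meaningless until the tree has been rebuilt, so nothing should query the tree in between.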

Use MyModel.tree.rebuild() to rebuild your tree

There’s an undocumented method, TreeManager.rebuild, which will balance your tree. Use it after raw SQL inserts, for example.

Note that there’s probably a very good reason it’s undocumented. You’re on your own.
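To show what a rebuild means in practice, here is a sketch that recomputes the tree columns from the parent links alone. It is not django-mptt's implementation, only the general idea under some assumptions: the category table and the rebuild function below are hypothetical, each root is given its own tree_id, and numbering restarts at 1 for every tree.

```python
import sqlite3
from collections import defaultdict


def rebuild(conn, table="category"):
    """Recompute lft, rght, tree_id and level from parent_id alone.

    `table` must be a trusted identifier; it is interpolated into the SQL.
    """
    children = defaultdict(list)
    for node_id, parent_id in conn.execute(
        f"SELECT id, parent_id FROM {table} ORDER BY id"
    ):
        children[parent_id].append(node_id)

    updates = []  # (lft, rght, tree_id, level, id)

    def number(node_id, left, tree_id, level):
        """Number one subtree and return the next unused counter value."""
        right = left + 1
        for child_id in children[node_id]:
            right = number(child_id, right, tree_id, level + 1)
        updates.append((left, right, tree_id, level, node_id))
        return right + 1

    # Every root starts a new tree, numbered from 1.
    for tree_id, root_id in enumerate(children[None], start=1):
        number(root_id, 1, tree_id, 0)

    with conn:  # write all the new numbers in one transaction
        conn.executemany(
            f"UPDATE {table} SET lft = ?, rght = ?, tree_id = ?, level = ?"
            " WHERE id = ?",
            updates,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT,"
    " parent_id INTEGER, lft INTEGER DEFAULT 0, rght INTEGER DEFAULT 0,"
    " tree_id INTEGER DEFAULT 0, level INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO category (id, name, parent_id) VALUES (?, ?, ?)",
    [(1, "Books", None), (2, "Fiction", 1), (3, "Non-fiction", 1), (4, "Music", None)],
)
rebuild(conn)
rows = conn.execute(
    "SELECT name, lft, rght, tree_id, level FROM category ORDER BY tree_id, lft"
).fetchall()
for row in rows:
    print(row)
```

The recursion is fine for shallow trees but would hit Python's recursion limit on very deep ones. It also rewrites every row, which is why a rebuild belongs at the end of an import and not inside it.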

Conclusion so far

My experience with it so far tells me there’s a small class of problems where django-mptt would be fantastic. There are also projects where it adds more complexity than it removes. Please start your project without it, and only add it when it’s really needed.

The three big things I would love to see django-mptt do are:

  1. Keep removing the magic, aggressively.

  2. Provide a way to temporarily switch off tree balancing to speed up inserts.

  3. Document TreeManager.rebuild, or document why it’s dangerous.

I realize I am being a bad open-source citizen here by providing suggestions as text instead of pull requests, and I apologize. Like everyone else, I am short on time.