Graham King

Solvitas perambulum

Setting up Monit on Ubuntu

Summary
Monit monitors your server, alerts you when crucial components such as processes or network services fail or misbehave (for example, through high resource usage), and can run configurable scripts to try to fix the problem automatically. The setup on Ubuntu involves installing Monit, configuring it to check services every two minutes, setting up email alerts, and enabling an HTTP interface for live monitoring. You can also monitor system health metrics and specific services such as Apache, Nginx, Gearman, Memcached, and MySQL by defining checks in the Monit configuration file. For MySQL replication monitoring, you create a Python script that touches a file while replication is healthy, schedule it with cron, and configure Monit to alert if the file's timestamp is older than a set threshold. Finally, you start Monit and query its status, which relies on the HTTP interface.

Monit tells you if something goes wrong on your server, and tries to fix it. It can, for example, alert you:

  • When a process dies.
  • When a machine stops responding to network requests.
  • When your machine's load average, memory consumption, or CPU usage is too high.
  • When a file changes, hasn’t changed for a period of time, or grows beyond a certain size.

It can run a script of your choosing to attempt to fix the problem. It has an HTTP interface that shows you essential stats about the services you are monitoring. For detailed graphs, I recommend Munin.

Here’s how to get it working on Ubuntu:

Editing the config file

sudo apt-get install monit
sudo vim /etc/default/monit

Change the single startup line to startup=1 so that Monit is allowed to start.

The config file that comes with Monit is well commented, but just in case, here’s the breakdown.

sudo vim /etc/monit/monitrc

Set Monit to check services every two minutes (120 seconds) and to log through syslog's daemon facility, which ends up in /var/log/daemon.log:

set daemon 120
set logfile syslog facility log_daemon

Set up email alerts:

set mailserver localhost
set mail-format { from: monit@myserver.domain.com }
set alert sysadmin@domain.com

Switch on the HTTP interface, allow access from anywhere, and require a username and password. Make it a decent password, because the HTTP interface allows you to stop and start services.

set httpd port 2812
    use address myserver.domain.com
    allow 0.0.0.0/0.0.0.0
    allow myusername:mypassword
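
Because the interface uses HTTP basic authentication, a script can read it as well as a browser. Here is a minimal Python sketch that builds such a request. The `_status?format=xml` path is my assumption about what your Monit version serves, and the helper names are made up for illustration, so check both against your own install.

```python
import base64
import urllib.request


def build_status_request(host, port, username, password):
    """Build a GET request for Monit's status page with basic auth.

    The '_status?format=xml' path is an assumption; adjust it if your
    Monit version serves status somewhere else.
    """
    url = "http://%s:%d/_status?format=xml" % (host, port)
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token)
    return request


def fetch_status(request, timeout=10):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Placeholder host and credentials, matching the config above.
    req = build_status_request("myserver.domain.com", 2812,
                               "myusername", "mypassword")
    print(fetch_status(req))
```

Basic auth over plain HTTP sends the password in a form that is trivially decoded, which is one more reason to pick a decent password and to think about who can reach port 2812.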

Monitor the machine itself:

check system myserver.domain.com
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 3 then alert
    if memory usage > 75% then alert
    if cpu usage (user) > 70% then alert
    if cpu usage (system) > 30% then alert
    if cpu usage (wait) > 20% then alert
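
To get a feel for the load average thresholds before committing to them, you can compare them against the current values by hand. This is only a rough Python sketch of the comparison, not how Monit does it internally; `load_alerts` and `LOAD_LIMITS` are names I made up.

```python
import os

# Thresholds copied from the "check system" block above.
LOAD_LIMITS = {"1min": 4.0, "5min": 3.0}


def load_alerts(load1, load5, limits=LOAD_LIMITS):
    """Return the names of the load average rules that would alert."""
    alerts = []
    if load1 > limits["1min"]:
        alerts.append("loadavg (1min)")
    if load5 > limits["5min"]:
        alerts.append("loadavg (5min)")
    return alerts


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute averages (Unix only).
    one, five, _fifteen = os.getloadavg()
    print(load_alerts(one, five) or "load is within limits")
```

Run it a few times during a busy period; if it alerts constantly, the thresholds are probably too low for that machine.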

Then monitor all the services running on that box.

If you monitor the totalcpu resource, note that it is a percentage of all CPUs. On a 4 CPU machine, 25% represents a process consuming 100% of one CPU.
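
The arithmetic is simple, but it is easy to get backwards when choosing a threshold. A small Python sketch of the conversion, with function names of my own invention:

```python
def single_core_share(cpu_count):
    """totalcpu percentage that equals one fully busy CPU."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return 100.0 / cpu_count


def totalcpu_to_cpus(totalcpu_percent, cpu_count):
    """Convert a totalcpu percentage into a number of busy CPUs."""
    return totalcpu_percent / 100.0 * cpu_count


if __name__ == "__main__":
    print(single_core_share(4))        # share of a 4 CPU box that is one CPU
    print(totalcpu_to_cpus(20.0, 4))   # how many CPUs a 20% threshold means
```

So a 20% totalcpu threshold on a 4 CPU machine fires a little before the process saturates a single CPU.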

For the Apache monitor, the PID file is defined in /etc/apache2/envvars, and is usually /var/run/apache2.pid.

Here are the service monitoring lines from my config:

check process apache with pidfile /var/run/apache2.pid
    start program = "/etc/init.d/apache2 start" with timeout 20 seconds
    stop program  = "/etc/init.d/apache2 stop"
    if totalcpu > 20% for 2 cycles then alert
    if totalcpu > 20% for 5 cycles then restart

check process nginx with pidfile /var/run/nginx.pid
    start program = "/etc/init.d/nginx start"
    stop program = "/etc/init.d/nginx stop"

check process gearmand with pidfile /var/run/gearman/gearmand.pid
    start program = "/etc/init.d/gearman-job-server start"
    stop program = "/etc/init.d/gearman-job-server stop"

check process memcached with pidfile /var/run/memcached.pid
    start program = "/etc/init.d/memcached start"
    stop program = "/etc/init.d/memcached stop"

check process mysqld with pidfile /var/run/mysqld/mysqld.pid
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"
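
The "for N cycles" part of the Apache rules means the condition has to hold on N checks in a row. With a 120 second cycle, that is roughly 4 minutes before the alert and 10 minutes before the restart. Here is a simplified Python model of that logic, assuming one totalcpu sample per cycle. The function names are hypothetical, and unlike Monit, which evaluates each rule on its own, this sketch reports only the strongest action.

```python
def exceeded_for(samples, threshold, cycles):
    """True when the most recent `cycles` samples are all above `threshold`."""
    if cycles < 1:
        raise ValueError("cycles must be at least 1")
    recent = samples[-cycles:]
    return len(recent) == cycles and all(value > threshold for value in recent)


def apache_action(totalcpu_samples):
    """Pick an action for the Apache rules above; restart wins over alert."""
    if exceeded_for(totalcpu_samples, 20.0, 5):
        return "restart"
    if exceeded_for(totalcpu_samples, 20.0, 2):
        return "alert"
    return "none"


if __name__ == "__main__":
    # One sample per cycle, oldest first.
    print(apache_action([10.0, 25.0, 30.0]))
    print(apache_action([25.0, 26.0, 30.0, 40.0, 22.0]))
```

A single quiet cycle resets the count, which is what keeps a short spike from restarting Apache.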

Start monit, and query it:

sudo /etc/init.d/monit start
sudo monit status

The ‘status’ command gets its data from the running Monit daemon through the HTTP interface, so it only works if you enabled the httpd section above.

Monitoring MySQL replication

To monitor MySQL replication, create a script that touches a file if replication is still running. Put that script in cron. Get Monit to check that file.

The idea comes from replication monitoring with monit, where they use Ruby.

I ported the script to Python, as a Django management command.

The crontab:

# m h  dom mon dow   command
* * * * * /usr/local/myproject/mysql-watchdog-cron.sh

The shell script:

#!/bin/bash
cd /usr/local/myproject || exit 1
/usr/bin/python manage.py mysql_replication_monit >> /dev/null 2>&1

The Django command:

import logging
import os

from django.core.management.base import BaseCommand
from django.db import connection

# File that this command touches and Monit watches.
WATCH = '/usr/local/myproject/mysql_monit_watchdog'


def mysql_fetch_one_dict(cursor):
    "Like DB-API's fetchone, but returns a dict instead of a tuple"
    data = cursor.fetchone()
    if not data:
        return None
    # cursor.description has one tuple per column; item 0 is the column name.
    names = [column[0] for column in cursor.description]
    return dict(zip(names, data))


class Command(BaseCommand):
    'Touch a file if MySQL replication is running'

    help = 'Touch a file if MySQL replication is running. Call from cron. Monit checks that file'

    def handle(self, *args, **options):
        'Called by BaseCommand (NoArgsCommand was removed in Django 1.10)'

        cursor = connection.cursor()
        cursor.execute('SHOW SLAVE STATUS')
        row = mysql_fetch_one_dict(cursor)
        # No row at all means this server is not set up as a replica.
        if row and row['Slave_IO_Running'] == 'Yes' and row['Slave_SQL_Running'] == 'Yes':
            # Create the file if it is missing, then set its mtime to now.
            with open(WATCH, 'a'):
                os.utime(WATCH, None)
        else:
            logging.error('*ERROR*: MySQL replication is not running')

Add these lines to /etc/monit/monitrc:

check file mysql_replication with path /usr/local/myproject/mysql_monit_watchdog
    if timestamp > 3 minutes then alert
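
The timestamp test boils down to comparing the file's modification time with the current time. If you want to check the watchdog file by hand, for example to confirm that cron is really touching it, this Python sketch does the same comparison. `is_stale` is a name I made up; the 180 seconds mirror the 3 minutes in the rule.

```python
import os
import time

WATCH = '/usr/local/myproject/mysql_monit_watchdog'


def is_stale(path, max_age_seconds, now=None):
    """True if the file at `path` was last modified too long ago."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path) > max_age_seconds


if __name__ == "__main__":
    if is_stale(WATCH, 180):
        print("watchdog file is older than 3 minutes")
    else:
        print("watchdog file is fresh")
```

With cron running every minute, a 3 minute threshold tolerates a couple of missed runs before Monit raises the alarm.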

Happy monitoring!