Setting up Monit on Ubuntu
Summary
Monit tells you if something goes wrong on your server, and tries to fix it. It can, for example, alert you:
- When a process dies.
- When a machine stops responding to network requests.
- When your machine's load average, memory consumption, or CPU usage is too high.
- When a file changes, hasn’t changed for a period of time, or grows beyond a certain size.
It can run a script of your choosing to attempt to fix the problem. It has an HTTP interface that shows you essential stats about the services you are monitoring. For detailed graphs, I recommend Munin.
Here’s how to get it working on Ubuntu:
Editing the config file
sudo apt-get install monit
sudo vim /etc/default/monit
Change the single line to startup=1.
The config file that comes with Monit is well commented, but just in case, here’s the breakdown.
sudo vim /etc/monit/monitrc
Set Monit to check services every two minutes (120 seconds), and log to /var/log/daemon.log:
set daemon 120
set logfile syslog facility log_daemon
Set up email alerts:
set mailserver localhost
set mail-format { from: monit@myserver.domain.com }
set alert sysadmin@domain.com
Switch on the HTTP interface, allow access from anywhere, and require a username and password. Make it a decent password, because the HTTP interface allows you to stop and start services.
set httpd port 2812
    use address myserver.domain.com
    allow 0.0.0.0/0.0.0.0
    allow myusername:mypassword
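If you want a script to read the same stats the HTTP interface shows, a rough sketch with Python's standard library looks like this. The host, port, and credentials are the placeholders from the config above, and the function name is made up for illustration. The `/_status?format=xml` path is an assumption based on the status endpoint Monit commonly exposes, so check it against your installed version.

```python
import base64
import urllib.request


def build_status_request(host, port, username, password):
    """Build (but do not send) a basic-auth request for Monit's status page."""
    # Assumed endpoint: verify against your Monit version.
    url = "http://%s:%d/_status?format=xml" % (host, port)
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token)
    return request


if __name__ == "__main__":
    req = build_status_request(
        "myserver.domain.com", 2812, "myusername", "mypassword"
    )
    print(req.full_url)
    # Sending the request needs a running Monit; uncomment to try it.
    # print(urllib.request.urlopen(req, timeout=10).read().decode("utf-8"))
```

The request is only built here, not sent, so you can inspect it before pointing it at a live server.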
Monitor the machine itself:
check system myserver.domain.com
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 3 then alert
    if memory usage > 75% then alert
    if cpu usage (user) > 70% then alert
    if cpu usage (system) > 30% then alert
    if cpu usage (wait) > 20% then alert
Then monitor all the services running on that box.
If you monitor the totalcpu resource, note that it is a percentage of all CPUs. On a 4 CPU machine, 25% represents a process consuming 100% of one CPU.
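To make that arithmetic concrete, here is a small illustrative helper (the function name is mine, not part of Monit). It converts usage measured against a single core into the whole-machine percentage that a totalcpu threshold is compared against, assuming, as described above, that the figure is spread across all CPUs.

```python
def total_cpu_percent(per_core_percent, cpu_count):
    """Convert usage measured against one core into a share of all cores."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return per_core_percent / float(cpu_count)


if __name__ == "__main__":
    # One fully busy core on a 4 CPU machine.
    print(total_cpu_percent(100, 4))  # 25.0
```

Read the other way round, a 20% totalcpu threshold on a 4 CPU box corresponds to a process using 80% of one core.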
For the Apache monitor, the PID file is defined in /etc/apache2/envvars, and is usually /var/run/apache2.pid.
Here are the service monitoring lines from my config:
check process apache with pidfile /var/run/apache2.pid
    start program = "/etc/init.d/apache2 start" with timeout 20 seconds
    stop program = "/etc/init.d/apache2 stop"
    if totalcpu > 20% for 2 cycles then alert
    if totalcpu > 20% for 5 cycles then restart

check process nginx with pidfile /var/run/nginx.pid
    start program = "/etc/init.d/nginx start"
    stop program = "/etc/init.d/nginx stop"

check process gearmand with pidfile /var/run/gearman/gearmand.pid
    start program = "/etc/init.d/gearman-job-server start"
    stop program = "/etc/init.d/gearman-job-server stop"

check process memcached with pidfile /var/run/memcached.pid
    start program = "/etc/init.d/memcached start"
    stop program = "/etc/init.d/memcached stop"

check process mysqld with pidfile /var/run/mysqld/mysqld.pid
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"
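The `for N cycles` conditions count polling cycles rather than seconds, so it can help to translate them into wall-clock time. A minimal sketch, assuming the 120 second interval from `set daemon 120` and that the condition has to hold on consecutive checks (the helper name is made up for illustration):

```python
def cycles_to_seconds(cycles, poll_interval_seconds=120):
    """Approximate time a condition must persist before Monit reacts."""
    return cycles * poll_interval_seconds


if __name__ == "__main__":
    for cycles, action in [(2, "alert"), (5, "restart")]:
        minutes = cycles_to_seconds(cycles) / 60.0
        print("%d cycles is roughly %.0f minutes before %s" % (cycles, minutes, action))
```

With these settings Apache has to stay above the CPU threshold for roughly four minutes before an alert and roughly ten minutes before a restart.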
Start monit, and query it:
sudo /etc/init.d/monit start
sudo monit status
You need the HTTP interface to use the ‘status’ command.
Monitoring MySQL replication
To monitor MySQL replication, create a script that touches a file if replication is still running. Put that script in cron. Get Monit to check that file.
The idea comes from replication monitoring with monit, where they use Ruby.
I ported the script to Python, as a Django management command.
The crontab:
# m h dom mon dow command
* * * * * /usr/local/myproject/mysql-watchdog-cron.sh
The shell script:
#!/bin/bash
cd /usr/local/myproject
/usr/bin/python manage.py mysql_replication_monit >> /dev/null 2>&1
The Django command:
import os
import logging

# NoArgsCommand was removed in Django 1.10; on newer versions use
# BaseCommand with a handle() method instead.
from django.core.management.base import NoArgsCommand
from django.db import connection

WATCH = '/usr/local/myproject/mysql_monit_watchdog'


def mysql_fetch_one_dict(cursor):
    "Like DB-API's fetchone() but returns a dict instead of a tuple"
    data = cursor.fetchone()
    if not data:
        return None
    # cursor.description holds one tuple per column; item 0 is the column name.
    names = [column[0] for column in cursor.description]
    return dict(zip(names, data))


class Command(NoArgsCommand):
    'Touch a file if MySQL replication is running'

    help = 'Touch a file if MySQL replication is running. Call from cron. Monit checks that file'

    def handle_noargs(self, **options):
        'Called by NoArgsCommand'
        cursor = connection.cursor()
        cursor.execute('SHOW SLAVE STATUS')
        row = mysql_fetch_one_dict(cursor)
        # row is None when the server returns no slave status at all.
        if row and row['Slave_IO_Running'] == 'Yes' and row['Slave_SQL_Running'] == 'Yes':
            # Create the file if needed, then update its modification time.
            with open(WATCH, 'a'):
                os.utime(WATCH, None)
        else:
            logging.error('*ERROR*: MySQL replication not running')
Add these lines to /etc/monit/monitrc:
check file mysql_replication with path /usr/local/myproject/mysql_monit_watchdog
    if timestamp > 3 minutes then alert
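The timestamp test amounts to comparing the file's modification time with the current time. Here is a rough Python equivalent that can be handy for checking the watchdog by hand. The function name and the 180 second default are my own choices mirroring the 3 minute rule, and treating a missing file as stale is an assumption of this sketch. It is not how Monit itself is implemented.

```python
import os
import time


def watchdog_is_stale(path, max_age_seconds=180, now=None):
    """Return True if the file is missing or was last touched too long ago."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # No file at all: the cron job has not touched it, or it was removed.
        return True
    return (now - mtime) > max_age_seconds


if __name__ == "__main__":
    print(watchdog_is_stale("/usr/local/myproject/mysql_monit_watchdog"))
```

Since cron touches the file every minute while replication is healthy, a 3 minute window leaves room for a couple of missed runs before the alert fires.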
Happy monitoring!