KYTCR Part III: Automating Problem Resolution
You didn’t think I’d leave you hanging forever, right? I knew everyone was waiting with bated breath, so here comes part 3. You have backups. You know how to fix those pesky simple problems that plague us all. Now it’s on to bigger and better things. Next stop…the monit train. The whole point of this post is to give you some pointers on how to automate recovering from bad things. It’s nice to know how to fix things manually, but that only helps when you are awake and at a keyboard. The best pets are those that mostly take care of themselves. Below are some tips for turning your servers into good pets.
If you are on a Slicehost server, you can use apt-get to install monit (sudo apt-get install monit). If you are on a Railsmachine slice, you can hook up the Dag repository and then use yum to install it (sudo yum install monit).
Whichever route you take, you will then have to edit the config file (look for /etc/monit/monitrc or /etc/monit.conf). The settings in the config file are pretty well documented. I pretty much just use monit to monitor my mongrel instances. The annoying thing is that you have to duplicate the entire config block for each instance of mongrel you want to monitor. I put a simple example that I use on pastie for those interested.
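One way around the copy and paste is to let a small shell script print the blocks for you. This is only a rough sketch, not the config from my pastie: the app path, the /usr/bin/mongrel_rails location, the port list and the memory threshold are all assumptions you will need to change, and mongrel_stanza is just a name I made up.

```shell
#!/bin/sh
# Assumed app location; override by exporting APP_ROOT before running.
APP_ROOT="${APP_ROOT:-/var/www/myapp/current}"

# Print one monit "check process" block for the mongrel on the given port.
# mongrel_stanza is a made-up helper name, and the thresholds are examples.
mongrel_stanza() {
  port="$1"
  pidfile="$APP_ROOT/log/mongrel.$port.pid"
  cat <<EOF
check process mongrel_$port with pidfile $pidfile
  start program = "/usr/bin/mongrel_rails start -d -e production -p $port -P $pidfile -c $APP_ROOT"
  stop program = "/usr/bin/mongrel_rails stop -P $pidfile -c $APP_ROOT"
  if totalmem > 110.0 MB for 5 cycles then restart
  if failed host 127.0.0.1 port $port protocol http then restart
  group mongrel

EOF
}

# One block per port (example ports); paste the output into your monit config.
for port in 8000 8001 8002; do
  mongrel_stanza "$port"
done
```

Run it, eyeball the output, and append it to your config file by hand. Monit starts programs with a very bare environment, which is why the sketch uses full paths everywhere.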
Once you have everything configured, you can start it up using the monit command (monit -c /etc/monit.conf or whatever the path is to your monit conf file).
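Before starting it for real, it can save some head scratching to have monit check the config first. The -t flag parses the control file and complains about syntax errors without starting anything. Here is a minimal sketch that wraps the two steps; start_monit is a hypothetical helper name and the default path is an assumption, so pass in whatever your config file is actually called.

```shell
#!/bin/sh
# start_monit is a made-up helper, not something that ships with monit.
# Usage: start_monit /etc/monit/monitrc   (run it as root)
start_monit() {
  conf="${1:-/etc/monit.conf}"   # assumed default path

  if [ ! -f "$conf" ]; then
    echo "no monit config at $conf" >&2
    return 1
  fi

  # -t only checks the control file syntax; nothing is started yet.
  if ! monit -t -c "$conf"; then
    echo "fix the errors in $conf before starting monit" >&2
    return 1
  fi

  # Start the daemon with the config that just passed the check.
  monit -c "$conf"
}
```

If the check fails, nothing gets started and you can fix the config without a half-working monitor running in the background.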
Now comes the fun part. I’m assuming you have a few instances of mongrel running and you’ve installed and configured monit. If you have those basics covered, you can play with monit to confirm that it is actually working.
    # get a list of your mongrels and the command that started them
    ps aux | grep mongrel
    # kill one of them by pid
    sudo kill 3453 # or whatever the pid number is
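Now just wait and, assuming you configured alerts, you should get an email and see the process you just killed fire back up on its own (ps aux | grep mongrel). Monit only checks on its polling interval, so give it a cycle or two. Pretty cool, eh?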
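If you would rather not sit there re-running ps, a small polling loop can do the watching for you. This is a sketch under a few assumptions: your mongrel writes a pid file, you know where it is, and you run the loop as root or as the user that owns the mongrel (kill -0 needs permission to signal the process). wait_for_restart is a hypothetical name and the 180 second default timeout is a guess.

```shell
#!/bin/sh
# wait_for_restart is a made-up helper: it polls a pid file until it names
# a live process other than the one you killed.
# Usage: wait_for_restart PIDFILE OLD_PID [TIMEOUT_SECONDS]
wait_for_restart() {
  pidfile="$1"
  old_pid="$2"
  timeout="${3:-180}"   # assumed default; pick something longer than monit's cycle
  waited=0

  while [ "$waited" -lt "$timeout" ]; do
    if [ -f "$pidfile" ]; then
      new_pid="$(cat "$pidfile")"
      # kill -0 sends no signal; it only tests that the process exists.
      if [ -n "$new_pid" ] && [ "$new_pid" != "$old_pid" ] \
         && kill -0 "$new_pid" 2>/dev/null; then
        echo "restarted with pid $new_pid after ${waited}s"
        return 0
      fi
    fi
    sleep 1
    waited=$((waited + 1))
  done

  echo "no restart seen within ${timeout}s" >&2
  return 1
}
```

For the kill above, that would be something like wait_for_restart /path/to/log/mongrel.8000.pid 3453, with the path swapped for wherever your pid file really lives.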
Honestly, there isn’t a whole lot more to monit. It has an optional web interface. I threw a few screenshots in below that you can check out.
Monit Start Page
Monit Individual Process Page
The new kid on the monitoring block is God. It was created by Tom Preston-Werner and is a pure Ruby solution. The sweetness of this is that you can use Ruby in your config file, which makes for less repetition (think looping through an array of mongrel ports). The only downside I can think of is that it probably uses more memory (though likely so little that who cares?). Anyway, I can’t vouch for it as I haven’t used it, but I talked to Tom at RubyConf and he said he uses it in production on several servers with great success. I’ll be trying it soon.
Whether you go with old faithful, Monit, or the new kid, God, you need to use something to monitor your tag cloud. If you are like me, the only way you want to be up at 3AM is if you are hacking on something fun, not bringing mongrel instances back online.
- Monit Makes Mongrel Play Nice
- Daemonizing Ruby Scripts For Monit
- Mongrel Cluster and Monit
- Good Starter Monit Config File
The ‘Keep Your Tag Cloud Running’ Series
- Part 1: Backups
- Part 2: Short Term Problem Analysis and Resolution
- Part 3: Automating Problem Resolution
- Part 4: Resource Reporting
- Part 5: Log rotation and analysis
- Part 6: Slow Queries and Indexing
- Part 7: General Closing Thoughts