Monitoring Websites with Pingdom
Posted by Steve Lounsbury on December 31st, 2008
I thought I would continue with the “Monitoring stuff” theme and talk about how we make sure our websites are running smoothly. Over the years, we’ve collected a large number of sites, and it’s impractical for us to check every one of them every day just to make sure they aren’t having problems. Naturally, being developers, we looked for ways to automate the process.
Our first stab at it was a homegrown script. It was written in a few hours one afternoon and set up as a cron job. Every five minutes it would request each site in a list, check the HTTP response and make sure we were getting a “200 OK”. This worked for the most part, but it didn’t save any history, and once a site went down it would keep pestering us on every run until the site was back up.
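For illustration, here is a minimal sketch of what a check like that might look like in Python. It is not our original script; the function names and the idea of passing in the fetch function are assumptions made for this example, and the list of URLs is left to the caller.

```python
import http.client
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # DNS failure, refused connection, timeout, malformed reply, etc.
        return None


def find_down_sites(urls, fetch=fetch_status):
    """Return (url, status) pairs for every site that did not answer 200."""
    down = []
    for url in urls:
        status = fetch(url)
        if status != 200:
            down.append((url, status))
    return down
```

A cron job could call find_down_sites with the list of sites every five minutes and email whatever comes back. Like our original, this sketch keeps no history and would report a broken site on every run until it recovers.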
After our needs outgrew the homegrown script, we looked for something a little more sophisticated. There are several open source options, Nagios being one of them. It lets you monitor any server for standard services like HTTP, FTP, SSH, etc. There are also several plugins available that let you monitor load, database activity, etc. It produces pretty “management friendly” graphs and has a very flexible system for notifying you when something goes wrong. This is all great, but you have to set it up and maintain it yourself. I don’t have the time to do that.
Finally, I came across pingdom.com. Pingdom is a distributed monitoring service that does one job and does it well. You can monitor all the standard services, set up notifications over email and SMS, see those “management friendly” graphs and even check for certain strings on the site you are monitoring. This makes it easy to write an endpoint that checks crucial services (db, filesystem space, etc.) and produces an “OK” or “BAD” message. This way, Pingdom can let you know not only whether your site is up and responding, but whether your database is up and running too. Overall, Pingdom is exactly what we need and, best of all, I don’t need to do anything to maintain it.
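To make the “OK” or “BAD” endpoint idea concrete, here is a rough sketch using only Python’s standard library. Everything in it is an assumption for the sake of the example: the check functions, the one gigabyte free-space threshold, the port, and the use of sqlite3 as a stand-in for whatever database a site really uses. It is not the endpoint we run.

```python
import shutil
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer


def disk_has_space(path="/", min_free_bytes=1_000_000_000):
    """True if the filesystem holding path has at least min_free_bytes free."""
    return shutil.disk_usage(path).free >= min_free_bytes


def database_responds(db_path=":memory:"):
    """True if a trivial query succeeds (sqlite3 is only a stand-in here)."""
    connection = sqlite3.connect(db_path)
    try:
        return connection.execute("SELECT 1").fetchone() == (1,)
    finally:
        connection.close()


def health_status(checks):
    """Return "OK" if every check passes, "BAD" if any fails or raises."""
    for check in checks:
        try:
            if not check():
                return "BAD"
        except Exception:
            return "BAD"
    return "OK"


CHECKS = [disk_has_space, database_responds]  # hypothetical set of checks


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status = health_status(CHECKS)
        body = status.encode("ascii")
        # The body is what a string check would look for; the 503 is an
        # extra signal for monitors that only look at the status code.
        self.send_response(200 if status == "OK" else 503)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve the health endpoint until interrupted (port is arbitrary)."""
    HTTPServer(("", port), HealthHandler).serve_forever()
```

With something like this running, the monitoring service would be pointed at the endpoint’s URL and told to look for the string “OK” in the response. Treating an exception inside a check as “BAD” is a deliberate choice in this sketch, so a crashed check fails loudly instead of silently passing.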
I’ll leave you with an example of one of those pretty graphs:
[Image: Pingdom uptime graph]

January 4th, 2009 » 1:01 pm
I see 92% uptime in the example you’ve provided. That strikes me as very low. What do you consider to be the acceptable uptime threshold?
January 5th, 2009 » 9:22 am
Hi Joe, that particular website was undergoing maintenance during that period. I thought I would show something that looked a little more interesting than a graph that showed 100% across the board.