Tuesday, March 24, 2009

Google App Engine monitoring service

My plate is too full to start this project, but we need a simple monitoring service for our own service, something along the lines of Nagios. I'm thinking about throwing it together on the new Google App Engine: hosting a Nagios server in our own data center is pointless, and App Engine seems like a good fit.

Update: I ended up using Nagios after all, with Google App Engine as a simple web proxy for round-trip HTTP testing.

2 comments:

andy said...

I just had exactly the same idea. Did you get anything running or any experiences that you would like to share?

ytjohn said...

I'm in a similar boat. Too much on my plate.

However, I'd like to make a few points: Nagios, Zabbix, Zenoss, etc. are nowhere near simple. They have years of development behind them, tons of modules, and are very mature. It would make sense to host one in your datacenter.

However, I think you're concerned with making sure your servers are reachable from the Internet. For that, you need some basic ICMP/Port monitoring, for which Google App Engine should be a perfect match.
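To make the port-monitoring idea concrete, here is a minimal sketch of a TCP connect check in plain Python. It uses only the standard library, not any App Engine API, and whether App Engine's sandbox allows outbound sockets at all is an assumption to verify before relying on it. The function name port_is_open is made up for illustration. ICMP is left out because raw sockets normally require elevated privileges.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; not part of any monitoring package.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling port_is_open("www.example.com", 80) would report whether a web server is reachable from wherever the check runs, which is the "reachable from the Internet" question as long as the check runs outside your own network.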

The only monitoring system I know of written in Python is Zenoss, but it uses MySQL and I haven't seen anyone working on making it work with App Engine's data store. It might still be worth looking at though.

However, it does look like such a project would need to be done from scratch.

I see five parts:

1) An interface to manage the hosts you want to monitor and the type of monitor (ICMP, Port connect, or URL to GET). This information could be stored in the Google data store.

2) A series of scripts that can iterate through the lists of monitored hosts, execute the monitor, and record the results in the data store.

3) Something to display the status of devices. This could be simple up/down or graphs showing up/down/response times.

4) Contacts and alert preferences management. This records who should be alerted for each host and how to alert them. In most cases, this would be email addresses (whether regular mailboxes or cell phone email-to-SMS addresses). You could also build in instant messenger/Twitter support.

5) Scripts to handle sending the alerts. You would have to consider things like "do I want to be notified immediately, or only if it has been down for $x minutes?"
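As a rough sketch of how parts 2, 4 and 5 could fit together, here is some plain Python. Everything in it is an assumption made for illustration: the Host record, the run_checks function, and the alert_after_seconds field are invented names, the records live in memory rather than in the App Engine data store, and the actual check and alert functions are passed in by the caller.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Host:
    """One monitored host plus its alert preferences (hypothetical schema)."""
    name: str
    check_type: str                    # e.g. "port" or "url"
    target: str                        # what the check function receives
    contacts: List[str] = field(default_factory=list)
    alert_after_seconds: float = 300.0  # the "$x minutes" threshold
    down_since: Optional[float] = None  # when the current outage began
    alerted: bool = False               # True once this outage was reported


def run_checks(hosts, checks, send_alert, now=None):
    """Run one monitoring pass and return {host name: is_up}.

    checks maps a check_type to a function target -> bool.
    send_alert is called as send_alert(contact, message).
    """
    now = time.time() if now is None else now
    results = {}
    for host in hosts:
        is_up = checks[host.check_type](host.target)
        results[host.name] = is_up
        if is_up:
            # Recovery: clear the outage state so a later outage alerts again.
            host.down_since = None
            host.alerted = False
            continue
        if host.down_since is None:
            host.down_since = now
        down_for = now - host.down_since
        # Alert once per outage, and only after the configured delay.
        if not host.alerted and down_for >= host.alert_after_seconds:
            for contact in host.contacts:
                send_alert(contact, f"{host.name} has been down for {int(down_for)}s")
            host.alerted = True
    return results
```

Setting alert_after_seconds to 0 gives the "notify me immediately" behaviour, while a larger value suppresses alerts for short blips. A scheduled job would call run_checks every minute or so and save the results for the status display in part 3.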