distributed monitoring, passive check question
Paul L. Allen
pla at softflare.com
Wed Jul 27 03:58:06 CEST 2005
Ran Li writes:
> I'm trying to configure distributed monitoring using nsca/send_nsca for
> passive checking. Now I can see the PASV icon beside the http service that
> I want to monitor on host02,
If this is a service that is available only inside a particular subnet
that your Nagios host is not part of, then some sort of remote check (be it
NSCA, NRPE, check_by_ssh or other) is required. If it's a public service,
or one that is available only inside a particular subnet BUT your Nagios
host is part of the same subnet, then a passive check is feasible but a
direct check is better. The closer your test is to what you really want
the monitored host to do, the better.
You can ping the monitored host, but I've seen servers that are pingable
but can't do anything else. You can check the monitored host to see if
the httpd (or equivalent) process is running, but resource problems could
mean that httpd is running but nobody can get a response from the web
server. You can check that you can get a response from the web server
with check_tcp (it accepts the TCP connection) but perhaps it can go no
further. You can check that you get a 200 response from the web server,
but maybe the site it is serving relies on a database being accessible
to generate content (such a site is very cache-unfriendly and most PHP
web apps I have seen have this failing).
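The ladder of checks described above (TCP accept, then a 200 response, then real content) can be sketched in Python. This is only an illustration under stated assumptions, not a replacement for the stock check_tcp and check_http plugins: the function names, the timeouts and the idea of matching an expected string in the page body are all hypothetical choices made for this example.

```python
import http.client
import socket
import urllib.error
import urllib.request

# Talk to the server directly: ignore any http_proxy environment settings,
# since a proxy answering would hide the state of the real server.
_DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def tcp_accepts(host, port, timeout=5.0):
    """Shallow check: does anything accept a TCP connection on host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_fetch(url, timeout=5.0):
    """Return (status, body_text), or (None, "") if no HTTP response arrived."""
    try:
        with _DIRECT.open(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return err.code, ""
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts, DNS failures, malformed responses.
        return None, ""


def check_levels(host, port, path="/", expected_text=""):
    """Run the checks from shallowest to deepest and report each result."""
    status, body = http_fetch(f"http://{host}:{port}{path}")
    return {
        "tcp": tcp_accepts(host, port),
        "http_200": status == 200,
        # Deepest check: the page contains text that only appears when the
        # backend (for example a database) is actually working.
        "content": status == 200 and expected_text in body,
    }
```

A result with "tcp" true but "content" false is exactly the case where a shallower check would have reported that everything was fine.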
How thoroughly you check that the web server is operational is a trade-off
between how mission-critical it is and the load on your Nagios host.
But if it's a public web server then passive checks are a really bad
idea even if your Nagios host is on the same subnet. Hint: if your
web server binds only to the loopback address and your passive checks
are submitted by that web server, then everything will look wonderful even
though your web server is unusable by anybody not on the same server.
Really, really, really, if it's a network service accessible from the
Nagios host then use a direct check. Checking the number of processes
on a server is something you can do with a passive check (but also by using
NRPE, check_by_ssh, etc). Checking if crond is running is something you
can do with a passive check (or alternatives). Checking CPU load is
something you can do with a passive check (or alternatives). But checking
if a web server is running is not something you should do with a passive
check (or alternatives) UNLESS your Nagios host cannot access that web
server. And even then you should aim to run your check from a different
host on the same subnet.
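For the kind of host-local measurement that does suit a passive check, such as CPU load, a minimal Python sketch of producing a result for send_nsca might look like the following. It assumes the default tab-separated send_nsca input format (host name, service description, return code, plugin output) and the standard plugin return codes. The thresholds and the "CPU Load" service description are hypothetical and would have to match the service definition on the Nagios host; os.getloadavg() is available on Unix only.

```python
import os
import sys

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def load_state(load1, warn=4.0, crit=8.0):
    """Map a 1-minute load average to a return code (thresholds are made up)."""
    if load1 >= crit:
        return CRITICAL
    if load1 >= warn:
        return WARNING
    return OK


def nsca_line(host, service, code, output):
    """Format one service result as a tab-separated send_nsca input line."""
    # Tabs and newlines would break the field layout, so collapse them.
    clean = " ".join(output.split())
    return f"{host}\t{service}\t{code}\t{clean}\n"


if __name__ == "__main__":
    load1 = os.getloadavg()[0]  # Unix only
    sys.stdout.write(
        nsca_line("host02", "CPU Load", load_state(load1),
                  f"load average (1 min): {load1:.2f}")
    )
```

The printed line would then be piped into send_nsca on the monitored host, with the address of the Nagios host and the send_nsca configuration file supplied on its command line.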
OK, minor qualification time. I have several customer sites which have
their own Nagios host which submits passive check results for many things.
But when it comes to publicly-available services on those customer sites
our master Nagios host does the checks directly and submits the results
back to the remote Nagios hosts. OK, that's a simplification. Actually
the remote sites do check things like web servers directly but there is
also ANOTHER passive check on the same services which our master Nagios
host submits to the remote Nagios hosts. Actually, it's a little more
complex than that, but the basic principles are the same.
Your Nagios checks should be as close as is practicable to testing the
real thing, bearing in mind the resources that may be needed to do a full
check. Sometimes you're stuck with having to monitor a Windows process
that does not make a network service available and all you can do is check
that Windows thinks it is running. But if it's a network service visible
from the Nagios host then do a direct check. If you do anything less
you'll end up telling customers "Our monitoring system told us that
Windows[turdmark] thought it was running a web server but actually it was
as screwed up as only Windows[turdmark] can be. Everything looked fine
to our monitoring system but actually Windows[turdmark] was lying."
--
Paul Allen
Softflare Support
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users