More on notifications and reboot monitoring
Greg Vickers
g.vickers at qut.edu.au
Tue Jan 11 00:19:49 CET 2005
Carson Gaspar wrote:
>
> But there will _be_ no checks (other than ping) after the downtime if
> anything went wrong, because the host will still be down.
If a host is still down after the downtime is over, you should set up a
host notification (provided you are using active checks) for the
DOWN/UNREACHABLE states. Then, when you receive that host-down
notification, you will know that none of the services on that host are
available.
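
As a rough illustration of the host notification described above, a host
definition along these lines should do it. This is only a sketch: the
host name, address and contact group are made-up placeholders, and
check-host-alive and 24x7 are assumed to be defined elsewhere, as they
are in the sample configs. In Nagios 1.x the contact groups for host
notifications are set on the hostgroup rather than on the host.

```cfg
# Hypothetical host: name, address and contact group are placeholders.
define host{
        host_name               web01
        alias                   Example web server
        address                 192.0.2.10
        check_command           check-host-alive   ; active host check
        max_check_attempts      3
        notifications_enabled   1
        notification_interval   60                 ; re-notify while the problem persists
        notification_period     24x7
        notification_options    d,u,r              ; DOWN, UNREACHABLE, recovery
        contact_groups          admins             ; 2.x only; 1.x uses the hostgroup
        }
```

The d and u options cover the two states mentioned above, and r adds a
recovery notice so you also hear when the host comes back.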
>> I would use active checks as Andreas suggested for checking host
>> availability. Passive-only checks might be troublesome to implement
>> reliably.
>
> I can't. They just don't scale to the number of hosts I need to monitor,
> in Nagios's current incarnations (including 2.0 beta).
We are monitoring 7k services and 1.5k hosts using Nagios 1.1. Sure, it's
slow as a dog (>60 sec to bring up the CGI web pages on a dual Xeon 2 GHz
with 1 GB RAM and SCSI RAID), but we're about to upgrade to 2.0 and I'm
expecting to see a fairly massive decrease in response time for the CGI
pages. Nagios has scaled successfully for us: the monitoring process has
low latency, most checks get performed within 10 sec of when they were
scheduled, and notifications go out lickety-split. The only slow part is
bringing up the web page, and who wants to slog through 7k services in a
web page? (Yes, we use active checks on all our hosts.)
> Ah well, I have a solution that works. I'm not thrilled with it, but it
> handles every corner case I can think of. And will scale to a very large
> number of hosts. Is anyone else here using Nagios to monitor >1000
> hosts? My target (right now) is 2k hosts per monitoring server, and a
> total of about 12k hosts monitored.
I take it you are in a data centre or some such business area. Demarcate
areas (by client or whatever) and set up distributed Nagios boxes to
monitor sub-areas.
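
One way such a split could look, assuming you also want a central box
that collects results from the sub-area boxes (that part is my
assumption, it is not required for the demarcation itself). The
submit_check_result command name is a placeholder for whatever command
you define to forward results, for example via NSCA.

```cfg
# nagios.cfg on a distributed box (sketch, one box per sub-area)
execute_service_checks=1
obsess_over_services=1
# Placeholder command name; it must be defined in your object config
# and forward each service check result to the central box.
ocsp_command=submit_check_result
# Assumes the central box handles notifications.
enable_notifications=0

# nagios.cfg on the central box (sketch)
accept_passive_service_checks=1
enable_notifications=1
```

Each distributed box then only carries the hosts and services for its
own sub-area.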
--
Greg Vickers
Security Engineer
Network Services
Information Technology Services
Queensland University of Technology
email: g.vickers at qut.edu.au
phone: (07) 3864 9536
CIROS code: 00213J