How often does Nagios need restarting? (Quis custodiet ipsos custodes?)
Marc Powell
marc at ena.com
Mon Jun 29 23:44:35 CEST 2009
On Jun 29, 2009, at 4:20 PM, Kustner, Tom wrote:
> 2. Thanks for pointing out that host checks are not always performed
> unless a service has been detected as failing. I value the service
> checking, but I assumed it was also pinging the host on a regular
> basis and that is apparently not the case. I come from the background of
> using products such as Insight Manager and OpenManage which are
> vendor-specific solutions that have their limitations but which
> automatically perform pinging on a regular basis. I'll look at the
> documentation for information on getting that set for us. It explains
> my frustration as to why a server can reboot and Nagios not detect it.
Word of warning - you *do not* want to enable regularly scheduled host
checks under nagios-2.x. The current logic of only checking a host
when a service is not OK is more than sufficient under normal
circumstances. Enabling regularly scheduled checks under 2.x will only
hurt your performance. While service checks can be done in parallel,
host checks are done serially in that version. While a host is being
checked, Nagios stops *all other activity* until the host check
completes: other checks, logging, notifications, everything.
To illustrate, if you have 200 hosts and each host check sends 5 pings
(~5 seconds to complete), it will take 200 (hosts) x 5 (seconds) = 1000
seconds just to check your host status. That's over 16 minutes during
which Nagios is only checking those hosts, and not checking the services
on those hosts, sending notifications, or doing anything else.
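
The arithmetic above can be sketched in a few lines of Python. This is only
a back-of-the-envelope model, not how Nagios schedules anything internally:
the function names and the max_concurrent parameter are made up for
illustration, and the parallel figure assumes checks run in idealized
batches.

```python
import math


def serial_sweep_seconds(hosts, seconds_per_check):
    """Time for one full pass when host checks run one at a time."""
    return hosts * seconds_per_check


def parallel_sweep_seconds(hosts, seconds_per_check, max_concurrent):
    """Idealized estimate when up to max_concurrent checks run at once.

    max_concurrent is a hypothetical knob for this sketch, not a Nagios
    configuration option.
    """
    batches = math.ceil(hosts / max_concurrent)
    return batches * seconds_per_check


serial = serial_sweep_seconds(200, 5)
print(f"serial: {serial} s ({serial / 60:.1f} min)")

parallel = parallel_sweep_seconds(200, 5, max_concurrent=50)
print(f"parallel (50 at a time): {parallel} s")
```

With the numbers from the example, the serial pass comes out at 1000
seconds, while the idealized parallel pass with 50 checks at a time would
take 20 seconds.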
Nagios-3.x implements parallel host checks, just like service checks,
but even then regularly scheduled host checks aren't really needed or
encouraged and are just a waste of resources that could be used for
service checks, IMHO.
Even then, unless you're checking _very_ frequently, a modern server
can easily reboot in the time between checks. I'd recommend using
check_snmp as a service check to look at the SNMP-reported uptime and
alert if it's less than a reasonable threshold based on your normal
check interval (say 5-10 minutes typically).
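
The uptime comparison itself is simple, and a rough Python sketch of the
logic might look like the following. It assumes the uptime has already been
fetched as an SNMP TimeTicks value (hundredths of a second, as sysUpTime is
reported); the function names and the 600-second default are illustrative
choices, not part of check_snmp. The return codes follow the usual plugin
convention of 0 for OK and 1 for WARNING.

```python
OK = 0
WARNING = 1


def uptime_seconds(timeticks):
    """Convert SNMP TimeTicks (hundredths of a second) to seconds."""
    return timeticks / 100.0


def check_uptime(timeticks, threshold_seconds=600):
    """Return (status, message); WARNING if uptime is below the threshold.

    threshold_seconds is an assumed default of 10 minutes. It should be
    longer than the check interval, so that a reboot between two checks
    is still visible on the next one.
    """
    uptime = uptime_seconds(timeticks)
    if uptime < threshold_seconds:
        return WARNING, f"WARNING - uptime is only {uptime:.0f} s, recent reboot?"
    return OK, f"OK - uptime is {uptime:.0f} s"


# Example: the agent reports 12000 ticks, i.e. 120 seconds of uptime.
status, message = check_uptime(12000)
print(status, message)
```

Note that sysUpTime strictly measures time since the SNMP agent was
initialized, so an agent restart will look like a reboot to a check like
this.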
--
Marc