(Service Check Timed Out) returns critical
Scott
lists.scott at themagicbox.net
Mon Nov 18 14:38:03 CET 2002
On your hosts, always set a parent. That way, when a host becomes
unreachable, Nagios walks up the parent tree to see where the network has
actually failed. This is basically a dependency between hosts, and it makes
for far fewer pages/emails when something closer to the Nagios server fails.
Example:
define host{
        host_name               some.host
        alias                   some.host.alias
        address                 some.hosts.ip.address
        check_command           check-host-alive
        max_check_attempts      10
        notification_interval   40
        notification_period     24x7
        notification_options    d,r
        parents                 some.switch.on.my.network
        }
This means that when check-host-alive fails for some.hosts.ip.address,
Nagios checks some.switch.on.my.network to confirm that it is really the
host that has failed. If the switch has failed instead, it only pages for
the switch and flags it as a blocking outage on the web page. Pretty nifty :)
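
To make the parent logic concrete, here is a rough sketch in Python of the
decision described above. It is only a simplified model, not Nagios's own
code: it assumes each host has a single parent, and the names PARENTS,
classify, hosts_to_page and the is_up callback are made up for illustration.
With notification_options d,r as in the example, only DOWN (and recovery)
states would page, so an UNREACHABLE host stays quiet.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical topology: host -> its single parent (None = attached directly
# to the monitoring server). Host names are taken from the example above.
PARENTS: Dict[str, Optional[str]] = {
    "some.switch.on.my.network": None,
    "some.host": "some.switch.on.my.network",
}


def classify(host: str, parents: Dict[str, Optional[str]],
             is_up: Callable[[str], bool]) -> str:
    """Return "UP", "DOWN", or "UNREACHABLE" for one host."""
    if is_up(host):
        return "UP"
    parent = parents.get(host)
    # No parent, or the parent still answers: the fault is at this host.
    if parent is None or is_up(parent):
        return "DOWN"
    # The parent is not answering either, so the host is only unreachable.
    return "UNREACHABLE"


def hosts_to_page(parents: Dict[str, Optional[str]],
                  is_up: Callable[[str], bool]) -> List[str]:
    """Hosts that would page when only DOWN states send notifications."""
    return sorted(h for h in parents if classify(h, parents, is_up) == "DOWN")


if __name__ == "__main__":
    # Simulated outage: the switch fails, so the host behind it fails too.
    failed = {"some.switch.on.my.network", "some.host"}
    print(hosts_to_page(PARENTS, lambda h: h not in failed))
```

In the simulated outage both checks fail, but only the switch is classified
as DOWN, which is why a single page goes out instead of one per host.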
Scott
Michael Markstaller said:
> Hi,
>
> I'm using nagios to check approx 100 hosts and 350 services working fine
> so far.
> Is it possible to tell nagios to report "unknown" instead of critical
> if a service check times out? I tried setting "service_check_timeout"
> in nagios.cfg to 30 so that nagios kills non-responsive service checks
> quicker under high load caused by many unreachable hosts (see below),
> but this resulted in dozens of critical alerts due to (Service Check
> Timed Out) with check_snmp. I'd prefer to get "unknown" for any plugin
> timeout that does not result in a retrieved value. Or maybe this
> problem is located within check_snmp?
>
> The hosts are mostly routers and quite distributed, so I have set up
> dependencies for all hosts to get a notification only for the host
> that failed, but this doesn't work as well as I think it should. If,
> for instance, the first router, on which all others depend, fails,
> nagios gets bogged down with a few hundred processes for pending
> checks and gives me many false alerts instead of the one that caused
> the problem. Does anybody have some general guidelines for getting
> useful alerts when something "core" fails (like the switch the
> nagios server is attached to, or DNS, etc.)?
>
> Thanks,
>
> Michael Markstaller
>
> Elaborated Networks GmbH
> www.elabnet.de
> Lise-Meitner-Str. 1, D-85662 Hohenbrunn, Germany
> fon: +49-8102-8951-60, fax: +49-8102-8951-80
>
>
> -------------------------------------------------------
> This sf.net email is sponsored by: To learn the basics of securing
> your web site with SSL, click here to get a FREE TRIAL of a Thawte
> Server Certificate: http://www.gothawte.com/rd524.html
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
>