Lots of hosts, only a couple of services?
Andreas Ericsson
ae at op5.se
Tue Aug 24 20:08:36 CEST 2004
Demetri Mouratis wrote:
> On Tue, 24 Aug 2004, Jason Byrns wrote:
>
>
>>Here's part of the problem: If any device misses a single service check,
>>a host check is immediately triggered. But sometimes a device can miss
>>a ping even though there is no problem, just a burst of network traffic.
>>
>>So here's my question: how can I improve our Nagios setup?
>>
>>Here are my goals:
>>1) Prevent false positives with max_check_attempts (set to 5)
>>2) Get Nagios to respect max_check_attempts
>>3) Have the Status Map correctly show the situation if any devices are down.
>>
>>Could I...
>>1) Check telnet instead of just pinging these devices? (And change the
>>host checks back to the regular host_check_alive?)
>>2) Not check services at all, unless necessary, and only do host checks?
>> (Nagios throws lots of warnings if you do this, and I suppose I'd
>>rather avoid that)
Nagios 2.0 supports regularly scheduled host checks, which Nagios 1.x
doesn't. If you don't have any services for a host, then Nagios 1.x will
NEVER perform the host check.
>>3) ...? (Profit?)
>>
>
>
> Jason,
>
> Sounds like the source of your problem is that the service check and host
> check are both ultimately using PING. I think you could remedy your
> problem with your suggestion 1 above, check telnet instead of, or better
> yet in addition to, PING. A successful check of the telnet service
> should cause Nagios to bypass the forced host check.
>
... but a failed check of the PING service will still trigger the host
check, which will fail since it's also PING based. Solution: cut the
PING service entirely, or make ICMP a privileged protocol on the network
(NOT recommended, although sometimes the right thing to do if you're sure
no one will kill the network with a sudden ping-storm).
> One other suggestion is to modify the parameters of the check_ping and
> check_host_alive invocations to send and expect, say, 20 packets. You might
> also try increasing the warning and critical packet loss parameters.
> These changes would allow you to weather the storm of the network traffic
> burst without sending out premature false alarms.
>
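To put numbers on the suggestion above, here is a rough Python sketch of
how a packet-loss threshold reacts to the same short burst at different
packet counts. It is a simplified model, not check_ping's actual code;
the function name classify_loss and the 20%/60% thresholds are made up
for illustration.

```python
def classify_loss(sent, received, warn_pct, crit_pct):
    """Return a Nagios-style state name for the observed packet loss.

    warn_pct and crit_pct are hypothetical loss thresholds in percent.
    """
    if sent <= 0 or not 0 <= received <= sent:
        raise ValueError("need sent > 0 and 0 <= received <= sent")
    loss_pct = 100.0 * (sent - received) / sent
    if loss_pct >= crit_pct:
        return "CRITICAL"
    if loss_pct >= warn_pct:
        return "WARNING"
    return "OK"


# A burst that drops two packets:
print(classify_loss(5, 3, warn_pct=20, crit_pct=60))    # 40% loss -> WARNING
print(classify_loss(20, 18, warn_pct=20, crit_pct=60))  # 10% loss -> OK
```

With only 5 packets, two drops already cross the warning line; with 20
packets the same two drops stay well under it, while a host that is
really down still loses 100% and goes CRITICAL.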
Or roll a plugin of your own to do the host-check. I'm thinking the as yet
non-existent check_conn_refused should do the trick if it measures the
time it takes to get a "connection refused" from any arbitrary port.
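
A minimal sketch of what such a plugin might look like, in Python.
check_conn_refused does not exist, so everything here is an assumption:
the names probe and main, the 5 second timeout, and the 1 second
"slow answer" warning threshold are all invented for illustration. The
only convention relied on is the usual plugin exit codes (0 OK,
1 WARNING, 2 CRITICAL, 3 UNKNOWN). A refused connection and an accepted
one both prove the host's TCP stack is answering, so both count as "up".

```python
import socket
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")


def probe(host, port, timeout=5.0, warn_secs=1.0):
    """Try one TCP connect and return (state, message).

    timeout and warn_secs are hypothetical defaults, not tuned values.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            answer = "connection accepted"
    except ConnectionRefusedError:
        answer = "connection refused"
    except socket.timeout:
        return CRITICAL, f"no answer from {host}:{port} within {timeout:.1f}s"
    except OSError as exc:
        # Unreachable network, failed name lookup, and similar errors.
        return CRITICAL, f"connect to {host}:{port} failed: {exc}"
    elapsed = time.monotonic() - start
    state = WARNING if elapsed >= warn_secs else OK
    return state, f"{answer} after {elapsed:.3f}s"


def main(argv):
    """Parse HOST and PORT, print one status line, return the exit code."""
    try:
        host, port = argv[1], int(argv[2])
    except (IndexError, ValueError):
        print("UNKNOWN - usage: check_conn_refused HOST PORT")
        return UNKNOWN
    state, message = probe(host, port)
    print(f"{STATE_NAMES[state]} - {message}")
    return state
```

To use it as a real plugin you would end the file with
sys.exit(main(sys.argv)) (after importing sys) and point the host check
command at it. Note that a firewall that silently drops packets to the
chosen port will look like a dead host, so pick a port that is known to
get a reset.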
> Ohh, and make sure to bitch at your network guys for dropping those
> packets in the first place ;-)
>
> Hope that helps.
> ---------------------------------------------------------------------
> Demetri Mouratis
> dmourati at linfactory.com
>
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Lead Developer