Service hard state generation and host hard or soft down status
Andreas Ericsson
ae at op5.se
Fri May 4 13:30:03 CEST 2012
On 05/04/2012 12:16 PM, Paul Ezvan wrote:
> Hi dear Nagios users,
>
> I have a question about hard state generation.
>
> According to the documentation, one of the conditions for putting a
> service into a hard non-OK state is a non-OK check result while the
> associated host is down. But it does not say whether the host must be
> in a HARD down state or not.
>
> The current behavior of Nagios clearly ignores whether the host is in a
> SOFT or HARD down state. For example:
>
> [1336039429] INITIAL HOST STATE: ces;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.42 ms
> [1336039429] INITIAL SERVICE STATE: ces;SV-SE-Linux-Memoire;OK;HARD;1;OK: Memory Usage (W> 95): 12%Swap Usage (W> 95, C> 99): 0%
> [1336039429] INITIAL SERVICE STATE: ces;SV-SE-Linux-SWAP;OK;HARD;1;SWAP OK - 100% free (3999 MB out of 3999 MB)
> [1336039747] HOST ALERT: ces;DOWN;SOFT;1;CRITICAL - Host Unreachable (10.235.72.159)
> [1336039812] HOST ALERT: ces;DOWN;SOFT;2;CRITICAL - Host Unreachable (10.235.72.159)
> [1336039822] SERVICE ALERT: ces;SV-SE-Linux-SWAP;CRITICAL;HARD;1;Connection refused or timed out
> [1336039822] SERVICE ALERT: ces;SV-SE-Linux-Memoire;CRITICAL;HARD;1;Connection refused or timed out
> [1336039877] HOST ALERT: ces;DOWN;HARD;3;CRITICAL - Host Unreachable (10.235.72.159)
> [1336040122] SERVICE ALERT: ces;SV-SE-Linux-Memoire;CRITICAL;HARD;1;Connection refused or timed out
> [1336040122] SERVICE ALERT: ces;SV-SE-Linux-SWAP;CRITICAL;HARD;1;Connection refused or timed out
>
> The associated services immediately get a HARD non-OK state even though
> the host is still in a SOFT down state.
>
> In the Nagios code, I found this in the non-OK state processing logic
> in base/checks.c:
>
> /* if the host is down or unreachable ... */
> /* 05/29/2007 NOTE: The host might be in a SOFT problem state due to
> host check retries/caching. Not sure if we should take that into
> account and do something different or not... */
> if(route_result != HOST_UP) {
>
> I think we should take into account the SOFT or HARD host state to
> ensure consistency between host and service hard/soft state.
>
> Is my analysis correct?
>
Yes.
> What is your view on the above proposal?
>
That, from a practical perspective, Nagios is already doing the Right
Thing(tm). Avoiding false positives is almost as important as catching
all real problems, and adding soft-state logic here would mean we send
one false positive for each failing host that happens to have a service
check run after the host enters the soft state but before it reaches
the hard state.
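
To make the two behaviors concrete, here is a minimal sketch in C. It is
a simplified model written for this discussion, not the code in
base/checks.c: the struct, the function name apply_non_ok_result() and
the honor_host_soft_state flag are all hypothetical, and only the
"route_result != HOST_UP" test and the resulting HARD;1 service state
are taken from the snippet and log quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified types; NOT the real Nagios structures. */
enum host_status { HOST_UP, HOST_DOWN, HOST_UNREACHABLE };
enum state_type  { SOFT_STATE, HARD_STATE };

struct service_model {
    int current_attempt;        /* 0 means no failed check counted yet */
    int max_attempts;           /* failed checks needed to reach HARD */
    enum state_type state_type;
};

/*
 * Apply one non-OK service check result to the model.
 *
 * honor_host_soft_state == false: any non-UP host forces the service
 *   straight to HARD, attempt 1 (matches the HARD;1 lines in the log).
 * honor_host_soft_state == true: the proposal from the question; a host
 *   that is only SOFT down is ignored and the service retries normally.
 */
void apply_non_ok_result(struct service_model *svc,
                         enum host_status route_result,
                         enum state_type host_state_type,
                         bool honor_host_soft_state)
{
    assert(svc != NULL);

    bool host_counts_as_down = (route_result != HOST_UP);
    if (honor_host_soft_state && host_state_type == SOFT_STATE)
        host_counts_as_down = false;

    if (host_counts_as_down) {
        /* Host problem: skip service retries entirely. */
        svc->current_attempt = 1;
        svc->state_type = HARD_STATE;
        return;
    }

    /* Ordinary retry logic: count failures until max_attempts. */
    if (svc->current_attempt < svc->max_attempts)
        svc->current_attempt++;
    svc->state_type = (svc->current_attempt >= svc->max_attempts)
                          ? HARD_STATE
                          : SOFT_STATE;
}
```

With a host that is DOWN but still SOFT, the first variant puts the
service into HARD on its first failed check, as in the log. The second
variant leaves it in SOFT and lets it run through its own retries,
independently of the host check.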
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.