Flap Detection
Scott
scott at netspace.net.au
Fri Apr 2 03:17:59 CEST 2004
Hi Guys and girls and any other genders available.
I wouldn't say I am new to Nagios, but I seem to have hit a bit of a
wall with a slight problem with flapping.
I have flap detection set to the default settings in the nagios.cfg file:
<snippet>
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
</snippet>
This works fine with the current setup, except when I come across a
scenario that looks similar to this:
[2004-04-02 10:21:54] SERVICE ALERT: mx4;RAID_STATUS;UNKNOWN;HARD;5;CHECK_NRPE: No output returned from NRPE daemon.
[2004-04-02 10:12:54] SERVICE ALERT: mx4;RAID_STATUS;CRITICAL;HARD;5;CHECK_NRPE: Socket timeout after 30 seconds.
[2004-04-02 10:06:54] SERVICE ALERT: mx4;RAID_STATUS;UNKNOWN;HARD;5;CHECK_NRPE: No output returned from NRPE daemon.
[2004-04-02 10:00:44] SERVICE ALERT: mx4;RAID_STATUS;CRITICAL;HARD;5;CHECK_NRPE: Socket timeout after 30 seconds.
[2004-04-02 08:41:35] SERVICE ALERT: mx4;RAID_STATUS;UNKNOWN;HARD;5;CHECK_NRPE: No output returned from NRPE daemon.
[2004-04-02 08:23:34] SERVICE ALERT: mx4;RAID_STATUS;CRITICAL;HARD;5;CHECK_NRPE: Socket timeout after 30 seconds.
[2004-04-02 06:32:14] SERVICE ALERT: mx4;RAID_STATUS;UNKNOWN;HARD;5;CHECK_NRPE: No output returned from NRPE daemon.
[2004-04-02 06:29:14] SERVICE ALERT: mx4;RAID_STATUS;CRITICAL;HARD;5;CHECK_NRPE: Socket timeout after 30 seconds.
[2004-04-02 06:26:08] SERVICE ALERT: mx4;RAID_STATUS;UNKNOWN;HARD;5;CHECK_NRPE: No output returned from NRPE daemon.
[2004-04-02 06:05:14] SERVICE ALERT: mx4;RAID_STATUS;CRITICAL;HARD;5;CHECK_NRPE: Socket timeout after 30 seconds.
[2004-04-02 05:13:24] SERVICE ALERT: mx4;RAID_STATUS;UNKNOWN;HARD;5;CHECK_NRPE: No output returned from NRPE daemon.
[2004-04-02 04:55:29] SERVICE ALERT: mx4;RAID_STATUS;CRITICAL;HARD;5;CHECK_NRPE: Socket timeout after 30 seconds.
[2004-04-02 04:25:15] SERVICE ALERT: mx4;RAID_STATUS;UNKNOWN;HARD;5;CHECK_NRPE: No output returned from NRPE daemon.
[2004-04-02 04:16:25] SERVICE ALERT: mx4;RAID_STATUS;CRITICAL;HARD;5;CHECK_NRPE: Socket timeout after 30 seconds.
[2004-04-02 03:31:14] SERVICE ALERT: mx4;RAID_STATUS;UNKNOWN;HARD;5;CHECK_NRPE: No output returned from NRPE daemon.
[2004-04-02 03:25:24] SERVICE ALERT: mx4;RAID_STATUS;CRITICAL;HARD;5;CHECK_NRPE: Socket timeout after 30 seconds.
[2004-04-02 03:22:24] SERVICE ALERT: mx4;RAID_STATUS;UNKNOWN;HARD;5;CHECK_NRPE: No output returned from NRPE daemon.
[2004-04-02 03:19:24] SERVICE ALERT: mx4;RAID_STATUS;CRITICAL;HARD;5;CHECK_NRPE: Socket timeout after 30 seconds.
As can be seen from this log, I am getting a state change on every
check. This continues for some time, so I get a notification for every
one of these checks until Nagios sees the flapping threshold crossed
(in this case it would be the very next check or two). The only
problem is that I would like to set up something to let me know that
the service/host has gone into flap detection territory, so that the
correct parties are notified that it has been put into a flap state
before notifications are suspended.
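For anyone following the arithmetic, here is a rough sketch of how the thresholds above interact with an alternating CRITICAL/UNKNOWN history. It is not Nagios code. It assumes the 21-check history described in the flap detection docs, counts every state change equally (real Nagios may weight them differently), and the function names and the exact boundary handling (>= versus >) are my own assumptions for illustration.

```python
from typing import List

# Values taken from the nagios.cfg snippet above.
HIGH_THRESHOLD = 20.0  # high_service_flap_threshold
LOW_THRESHOLD = 5.0    # low_service_flap_threshold

# Assumption: history of the last 21 checks, i.e. at most 20 state changes.
HISTORY_SIZE = 21


def percent_state_change(states: List[str]) -> float:
    """Unweighted percent state change over the most recent checks."""
    recent = states[-HISTORY_SIZE:]
    # Count adjacent pairs whose states differ.
    changes = sum(1 for prev, cur in zip(recent, recent[1:]) if prev != cur)
    return 100.0 * changes / (HISTORY_SIZE - 1)


def update_flapping(is_flapping: bool, pct: float) -> bool:
    """Hysteresis: start at the high threshold, stop below the low one."""
    if not is_flapping and pct >= HIGH_THRESHOLD:
        return True   # hypothetical hook: this is where a "started flapping" notice would go
    if is_flapping and pct < LOW_THRESHOLD:
        return False  # hypothetical hook: "stopped flapping" notice
    return is_flapping


if __name__ == "__main__":
    # A long run of OK results followed by alternating results as in the log.
    history = ["OK"] * HISTORY_SIZE
    flapping = False
    for state in ["CRITICAL", "UNKNOWN", "CRITICAL", "UNKNOWN"]:
        history.append(state)
        pct = percent_state_change(history)
        flapping = update_flapping(flapping, pct)
        print(f"{state:8s} change={pct:5.1f}% flapping={flapping}")
```

Under these assumptions each new alternating result adds 5% to the state change value, so a previously stable service would reach the 20% threshold on the fourth alternating result, and would only leave the flap state once the value drops below 5%.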
Not sure if this makes sense or not, but at present notifications go
silent and it's not until I look at the web GUI that I actually know
that it has gone into a flap state.
Was wondering if anybody else has run into this problem and if there is a
simple solution for it.
I have read the docs and they do say that in flap states NOBODY GETS
NOTIFIED, which I won't argue with, but I would still like to know that
it has occurred.
Looking forward to hearing some feedback.
---
Scott