Sequence of Service versus Flapping checks/notifications
Ethan Galstad
nagios at nagios.org
Fri Oct 5 17:40:20 CEST 2007
Matthew Richardson wrote:
> I have spotted today a couple of cases where a service notification has
> been received immediately followed by a FLAPPINGSTART notification when
> using 3.0b4. These struck me as being not quite what one might logically
> expect.
>
> For example:-
>
> |[03-10-2007 18:01:24] SERVICE NOTIFICATION: smstest1;walkers_smtp-ky;smtp;FLAPPINGSTART (OK);notify-service-by-email;SMTP OK - 6.036 sec. response time
> |[03-10-2007 18:01:24] SERVICE FLAPPING ALERT: walkers_smtp-ky;smtp;STARTED; Service appears to have started flapping (23.0% change >= 20.0% threshold)
> |[03-10-2007 18:01:23] SERVICE NOTIFICATION: smstest1;walkers_smtp-ky;smtp;OK;notify-service-by-email;SMTP OK - 6.036 sec. response time
> |[03-10-2007 18:01:23] SERVICE ALERT: walkers_smtp-ky;smtp;OK;HARD;3;SMTP OK - 6.036 sec. response time
>
> |[03-10-2007 19:03:34] SERVICE NOTIFICATION: smstest1;jtc_rich-jtc01;ospf_jsy-qr-jtc01;FLAPPINGSTART (OK);notify-service-by-email;OSPF OK - Full adjacency
> |[03-10-2007 19:03:34] SERVICE FLAPPING ALERT: jtc_rich-jtc01;ospf_jsy-qr-jtc01;STARTED; Service appears to have started flapping (23.9% change >= 20.0% threshold)
> |[03-10-2007 19:03:33] SERVICE NOTIFICATION: smstest1;jtc_rich-jtc01;ospf_jsy-qr-jtc01;OK;notify-service-by-email;OSPF OK - Full adjacency
> |[03-10-2007 19:03:33] SERVICE ALERT: jtc_rich-jtc01;ospf_jsy-qr-jtc01;OK;HARD;3;OSPF OK - Full adjacency
>
> From what I can see, this seems to occur only when a service moves from a
> HARD non-OK state into an OK state at the same time as the flapping
> threshold is reached. I have not noticed any cases when transitioning from
> an OK state to non-OK.
>
> It occurs to me that it might be preferable to turn the logic around such
> that the flapping checks and notifications are done prior to reporting any
> hard change of service state. If so, then only the FLAPPINGSTART
> notifications would be issued in each of the examples above.
>
> Best wishes,
> Matthew
>
This situation is certainly not optimal. It looks like this could also
occur when moving from OK to non-OK states. I'll post a patch to CVS
shortly.
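
For illustration only, here is a rough sketch of the reordering Matthew suggests: run the flap check first, and skip the ordinary state notification when the same check result starts flapping. This is not the patch itself and not Nagios source code. The function names, the history length of 21 and the unweighted percentage calculation are all assumptions made for the example; only the 20.0% threshold comes from the log lines quoted above.

```python
from collections import deque

HISTORY_LEN = 21        # assumed number of stored check results (hypothetical)
HIGH_THRESHOLD = 20.0   # percent; the threshold shown in the quoted log lines


def percent_state_change(history):
    """Unweighted share of adjacent result pairs whose state differs, in percent.

    Simplified on purpose: no weighting of recent changes is applied here.
    """
    states = list(history)
    if len(states) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(states, states[1:]) if a != b)
    return 100.0 * changes / (len(states) - 1)


def notifications_for_hard_change(history, new_state, was_flapping=False):
    """Return the notifications to send for a hard state change.

    The flap check runs before the state notification, so a check result
    that crosses the threshold yields only FLAPPINGSTART.
    """
    history.append(new_state)
    pct = percent_state_change(history)
    if not was_flapping and pct >= HIGH_THRESHOLD:
        return ["FLAPPINGSTART"]
    if was_flapping:
        # Assumption: state notifications stay suppressed while flapping.
        return []
    return [new_state]


# Usage: a service that has bounced a few times and now recovers.
recent = deque(["OK"] * 15 + ["CRITICAL", "OK", "CRITICAL", "CRITICAL", "CRITICAL"],
               maxlen=HISTORY_LEN)
print(notifications_for_hard_change(recent, "OK"))
```

With this ordering, each of the two examples above would produce a single FLAPPINGSTART notification instead of an OK notification followed one second later by FLAPPINGSTART.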
Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org