Strange service checks behavior
Andreas Ericsson
ae at op5.se
Wed Dec 1 19:56:24 CET 2004
Sandro Vaz - UOL wrote:
> Folks:
>
> I've read the f... manual ("State Types" section), but I can't understand
> why there is no hard recovery after a hard problem, which leads to wrong
> availability reports. Let me show you what's in my log files...
>
> Example 1) After a hard problem (6:38:00) we have a weird soft problem
> (6:42:32) and then a soft recovery (06:52:18). I can't find the
> following hard recovery in the logs. Is this correct?
>
Not by a long shot, but a little more information is needed before anyone
can answer your question correctly (at least without guessing).
What version of Nagios are you using? How did you compile it?
Are you sure this is what the logs actually look like, or did you cut and
paste from the GUI?
Did you replace the hostname and service description with other values
before posting? If so, how did you do that?
Do you regularly run ntpdate from cron? If so, how do you sync the
server you run ntpdate against?
Are you sure you don't have several instances of Nagios running (it's
supposed to fork, so don't get spooked if there are several processes)?
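A minimal sketch for checking the raw nagios.log (oldest entry first, unlike the newest-first GUI view) for two of the symptoms asked about above: timestamps that run backwards, which could point at clock jumps such as an ntpdate run, and SOFT alerts that show up while the service is still in a HARD problem state, which could point at a second Nagios instance or a bug. It assumes the usual `[epoch] SERVICE ALERT: host;service;state;type;attempt;output` line layout; the function names are hypothetical, and a flagged line is only a hint, not a diagnosis.

```python
import re
from dataclasses import dataclass

# Raw nagios.log entries start with a Unix timestamp in brackets:
# [1101793080] SERVICE ALERT: host;service;STATE;TYPE;attempt;output
ALERT_RE = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]*);([^;]*);([A-Z]+);(SOFT|HARD);(\d+);(.*)$"
)


@dataclass
class Alert:
    ts: int
    host: str
    service: str
    state: str
    state_type: str
    attempt: int


def parse(lines):
    """Return the SERVICE ALERT entries found in `lines`, in file order."""
    alerts = []
    for line in lines:
        m = ALERT_RE.match(line.strip())
        if m:
            ts, host, svc, state, stype, attempt, _output = m.groups()
            alerts.append(Alert(int(ts), host, svc, state, stype, int(attempt)))
    return alerts


def find_anomalies(alerts):
    """Flag timestamps that run backwards (possible clock jumps) and SOFT
    alerts that arrive while the service is still in a HARD problem state."""
    problems = []
    last_ts = None
    in_hard_problem = {}  # (host, service) -> True while HARD and not OK
    for a in alerts:
        if last_ts is not None and a.ts < last_ts:
            problems.append(f"{a.ts}: earlier than previous entry {last_ts}")
        last_ts = a.ts
        key = (a.host, a.service)
        if a.state_type == "SOFT" and in_hard_problem.get(key, False):
            problems.append(f"{a.ts}: SOFT {a.state} during HARD problem on {key}")
        if a.state_type == "HARD":
            in_hard_problem[key] = a.state != "OK"
    return problems
```

Usage, assuming a default-install log path (adjust to your setup): `with open("/usr/local/nagios/var/nagios.log") as f: print("\n".join(find_anomalies(parse(f))))`.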
> November 30, 2004 06:00
> [30-11-2004 06:52:18] SERVICE ALERT:
> Client-A-Host-2;Service-X;OK;SOFT;5;OK - 10 enviados, 10 recebidos, 0%
> pacotes perdidos
> [30-11-2004 06:51:32] SERVICE ALERT:
> Client-A-Host-2;Service-X;WARNING;SOFT;4;CRITIAL - 10 enviados, 7
> recebidos, 30% pacotes perdidos
> [30-11-2004 06:51:10] SERVICE ALERT:
> Client-A-Host-2;Service-X;CRITICAL;SOFT;3;CRITICAL - 10 enviados, 0
> recebidos, 100% pacotes perdidos
> [30-11-2004 06:50:04] SERVICE ALERT:
> Client-A-Host-2;Service-X;CRITICAL;SOFT;2;CRITICAL - 10 enviados, 0
> recebidos, 100% pacotes perdidos
> [30-11-2004 06:49:04] SERVICE ALERT:
> Client-A-Host-2;Service-X;CRITICAL;SOFT;1;CRITICAL - 10 enviados, 0
> recebidos, 100% pacotes perdidos
> [30-11-2004 06:43:04] SERVICE ALERT:
> Client-A-Host-2;Service-X;OK;SOFT;2;OK - 10 enviados, 10 recebidos, 0%
> pacotes perdidos
> [30-11-2004 06:42:32] SERVICE ALERT:
> Client-A-Host-2;Service-X;CRITICAL;SOFT;1;CRITICAL - 10 enviados, 4
> recebidos, 60% pacotes perdidos
> [30-11-2004 06:38:00] SERVICE ALERT:
> Client-A-Host-2;Service-X;CRITICAL;HARD;1;(Service Check Timed Out)
>
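For comparison, here is a simplified model of the soft/hard rules described in the "State Types" documentation, assuming only `max_check_attempts` matters: a problem starts SOFT and turns HARD once the attempt count reaches `max_check_attempts`, further non-OK results in a hard problem stay HARD, and a recovery is HARD or SOFT depending on the problem it ends. The class and function names are hypothetical, and host state, volatile services and the exact attempt numbers Nagios prints on recoveries are deliberately not modelled.

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    state: str = "OK"
    state_type: str = "HARD"
    attempt: int = 1


def process_result(s, result, max_check_attempts):
    """Return the new state after one check result (simplified model).

    Not modelled: host state, volatile services, retry intervals, and the
    exact attempt number Nagios prints on recoveries."""
    if result == "OK":
        if s.state == "OK":
            return ServiceState("OK", "HARD", 1)
        # A recovery inherits the type of the problem it ends:
        # HARD problem -> HARD recovery, SOFT problem -> SOFT recovery.
        return ServiceState("OK", s.state_type, 1)
    if s.state != "OK" and s.state_type == "HARD":
        # Already in a hard problem state: further non-OK results stay HARD.
        return ServiceState(result, "HARD", s.attempt)
    # Coming from OK, or retrying a soft problem: count the attempt.
    attempt = 1 if s.state == "OK" else s.attempt + 1
    state_type = "HARD" if attempt >= max_check_attempts else "SOFT"
    return ServiceState(result, state_type, attempt)
```

With `max_check_attempts=3`, three CRITICAL results followed by an OK give CRITICAL SOFT 1, CRITICAL SOFT 2, CRITICAL HARD 3 and then a HARD recovery. Under these rules a CRITICAL;SOFT entry can never directly follow a CRITICAL;HARD one, which is why the log above looks suspicious.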
> Example 2) From 8:19:42 through 8:24:02, we have a hard problem and a
> hard recovery, which is correct. After that we had a hard problem
> (8:41:54) and then a bizarre soft critical (8:53:26), which I can't
> explain. At 8:57:32 we have a soft recovery. Again, I can't find the
> hard recovery in the log files...
>
> November 30, 2004 08:00
> [30-11-2004 08:57:32] SERVICE ALERT:
> Client-A-Host-2;Service-X;OK;SOFT;6;OK - 10 enviados, 10 recebidos, 0%
> pacotes perdidos
> [30-11-2004 08:57:24] SERVICE ALERT:
> Client-A-Host-2;Service-X;CRITICAL;SOFT;5;CRITICAL - 10 enviados, 0
> recebidos, 100% pacotes perdidos
> [30-11-2004 08:56:22] SERVICE ALERT:
> Client-A-Host-2;Service-X;CRITICAL;SOFT;4;CRITICAL - 10 enviados, 0
> recebidos, 100% pacotes perdidos
> [30-11-2004 08:55:22] SERVICE ALERT:
> Client-A-Host-2;Service-X;CRITICAL;SOFT;3;CRITICAL - 10 enviados, 0
> recebidos, 100% pacotes perdidos
> [30-11-2004 08:54:22] SERVICE ALERT:
> Client-A-Host-2;Service-X;CRITICAL;SOFT;2;CRITICAL - 10 enviados, 0
> recebidos, 100% pacotes perdidos
> [30-11-2004 08:53:26] SERVICE ALERT:
> Client-A-Host-2;Service-X;CRITICAL;SOFT;1;(Service Check Timed Out)
> [30-11-2004 08:41:54] SERVICE ALERT:
> Client-A-Host-2;Service-X;CRITICAL;HARD;1;(Service Check Timed Out)
> [30-11-2004 08:24:02] SERVICE ALERT:
> Client-A-Host-2;Service-X;OK;HARD;1;OK - 10 enviados, 9 recebidos, 10%
> pacotes perdidos
> [30-11-2004 08:19:42] SERVICE ALERT:
> Client-A-Host-2;Service-X;CRITICAL;HARD;1;(Service Check Timed Out)
>
> Analyzing these two situations, we have a wrong critical period (8:41:54
> through 13:57:43, where we finally have a hard recovery). Could some good
> soul explain this behavior? Without correct logs, Nagios will generate
> unreliable availability reports, because it uses only hard states to
> produce them.
>
> TIA,
>
> SMV
>
>
>
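To make the effect on availability concrete, a minimal sketch that counts time spent in a HARD non-OK state and ignores SOFT entries, which is how the report is described above. It assumes the service is OK at the start of the window and takes hypothetical `(timestamp, state, state_type)` tuples in time order; it is not how Nagios's own availability CGI is implemented.

```python
def hard_downtime(alerts, start, end):
    """Seconds spent in a HARD non-OK state between start and end.

    `alerts` is a time-ordered list of (ts, state, state_type) tuples.
    SOFT entries are skipped, and the service is assumed OK at `start`."""
    down = 0
    in_problem = False
    since = start
    for ts, state, state_type in alerts:
        if state_type != "HARD" or ts < start or ts > end:
            continue
        if in_problem:
            down += ts - since  # close the current hard problem interval
        in_problem = state != "OK"
        since = ts
    if in_problem:
        down += end - since  # problem still open at the end of the window
    return down
```

Feeding it the Example 2 times (seconds after midnight, plus the 13:57:43 hard recovery mentioned above) gives 19,209 s of hard critical time, almost all of it from 08:41:54 to 13:57:43, because the SOFT recovery at 08:57:32 does not close the hard problem. Had 08:57:32 been logged as a hard recovery, the total would be 1,198 s.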
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Lead Developer
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null