Strange service checks behavior

Andreas Ericsson ae at op5.se
Wed Dec 1 19:56:24 CET 2004


Sandro Vaz - UOL wrote:
> Folks:
> 
> I've read the f... manual, "State Types" section, but I can't understand 
> why there is no hard recovery after a hard problem, generating wrong 
> availability reports. Let me show you what's in my log files...
> 
> Example 1) After a hard problem (6:38:00) we have a weird soft problem 
> (6:42:32) and then a soft recovery (06:52:18). I can't find the 
> following hard recovery in the logs. Is this correct?
> 

Not by a long shot, but a little more info is needed before anyone can 
correctly answer your question (at least without guessing).

What version of Nagios are you using? How did you compile it?

Are you sure this is how the entries appear in the log file itself, or 
did you cut and paste them from the GUI?
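
For comparison, entries in the raw log file start with a Unix timestamp 
in square brackets and are written oldest first. Below is a minimal 
Python sketch, assuming that format, which lists the service alerts and 
flags every SOFT alert logged while the last HARD state of the same 
service was still a problem. The sample lines are modelled on your 
Example 1, but the timestamps are made up and the function names are 
only illustrative.

```python
import re

# Assumed raw log format:
# [epoch] SERVICE ALERT: host;service;state;state_type;attempt;plugin output
ALERT_RE = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]*);([^;]*);([^;]*);([^;]*);(\d+);(.*)$"
)


def parse_alerts(lines):
    """Return one dict per SERVICE ALERT line, in input order."""
    alerts = []
    for line in lines:
        m = ALERT_RE.match(line.strip())
        if m is None:
            continue  # some other kind of log entry
        ts, host, service, state, state_type, attempt, output = m.groups()
        alerts.append({
            "time": int(ts),
            "host": host,
            "service": service,
            "state": state,
            "state_type": state_type,
            "attempt": int(attempt),
            "output": output,
        })
    return alerts


def soft_alerts_during_hard_problem(alerts):
    """Flag SOFT alerts seen while the last HARD state was not OK."""
    last_hard = {}  # (host, service) -> last HARD state seen
    flagged = []
    for a in alerts:
        key = (a["host"], a["service"])
        if a["state_type"] == "HARD":
            last_hard[key] = a["state"]
        elif last_hard.get(key, "OK") != "OK":
            flagged.append(a)
    return flagged


# Hypothetical sample; the epoch values are invented for illustration.
sample = [
    "[1101807480] SERVICE ALERT: Client-A-Host-2;Service-X;CRITICAL;HARD;1;(Service Check Timed Out)",
    "[1101807752] SERVICE ALERT: Client-A-Host-2;Service-X;CRITICAL;SOFT;1;CRITICAL - 10 enviados, 4 recebidos, 60% pacotes perdidos",
    "[1101807784] SERVICE ALERT: Client-A-Host-2;Service-X;OK;SOFT;2;OK - 10 enviados, 10 recebidos, 0% pacotes perdidos",
]

alerts = parse_alerts(sample)
flagged = soft_alerts_during_hard_problem(alerts)
for a in flagged:
    print(a["time"], a["state"], a["state_type"], a["attempt"])
```

If the real log file produces flagged entries like these, the problem 
is in what Nagios wrote and not in how the GUI displays it.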

Did you replace the hostname and service description with other values 
before posting? If so, how did you do that?

Do you regularly run ntpdate from cron? If so, how do you sync the 
server you run ntpdate against?
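
The reason for asking is that ntpdate steps the clock, and a backward 
step could show up in the log as a timestamp that is older than the one 
on the line before it. Here is a rough Python sketch to look for that, 
assuming raw log lines that start with a Unix timestamp in square 
brackets. The function name and the sample values are made up.

```python
import re

# Assumes every log line starts with "[epoch]".
TS_RE = re.compile(r"^\[(\d+)\]")


def backward_jumps(lines):
    """Return (line_number, seconds_back) for each timestamp that is
    older than the timestamp on the previous timestamped line."""
    jumps = []
    prev = None
    for number, line in enumerate(lines, start=1):
        m = TS_RE.match(line)
        if m is None:
            continue  # line without a leading timestamp
        ts = int(m.group(1))
        if prev is not None and ts < prev:
            jumps.append((number, prev - ts))
        prev = ts
    return jumps


# Hypothetical sample: the third entry is 30 seconds "before" the second.
sample_lines = [
    "[1101800000] first entry",
    "[1101800060] second entry",
    "[1101800030] third entry",
]

jumps = backward_jumps(sample_lines)
print(jumps)
```

An empty result does not prove the clock was never stepped, since a 
forward step looks just like a quiet period in the log.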

Are you sure you don't have several instances of Nagios running (it's 
supposed to fork, so don't get spooked if there are several processes)?
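
One way to tell forked children from a second instance is to look at 
the parent PIDs: a child's parent is another nagios process, while a 
separately started daemon has some other parent. A minimal Python 
sketch of that idea, assuming the output of `ps -eo pid,ppid,comm` and 
a process name of exactly `nagios`. The sample listing is made up.

```python
import subprocess


def list_processes():
    """Return the text output of `ps -eo pid,ppid,comm`."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_ps(text):
    """Parse ps output into (pid, ppid, command) tuples, skipping the header."""
    procs = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            procs.append((int(parts[0]), int(parts[1]), parts[2].strip()))
    return procs


def independent_instances(procs, name="nagios"):
    """Return PIDs of `name` processes whose parent is not also `name`."""
    pids = {pid for pid, _, comm in procs if comm == name}
    return sorted(
        pid for pid, ppid, comm in procs
        if comm == name and ppid not in pids
    )


# Hypothetical listing: 990 is a forked child of 812, 1407 is a second daemon.
SAMPLE_PS = """\
  PID  PPID COMMAND
    1     0 init
  812     1 nagios
  990   812 nagios
 1407     1 nagios
"""

instances = independent_instances(parse_ps(SAMPLE_PS))
print(instances)
```

On the live system you would pass `list_processes()` to `parse_ps()` 
instead of the sample text. More than one PID in the result suggests 
that more than one daemon was started.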

>    November 30, 2004 06:00  
> [30-11-2004 06:52:18] SERVICE ALERT: 
> Client-A-Host-2;Service-X;OK;SOFT;5;OK - 10 enviados, 10 recebidos, 0% 
> pacotes perdidos
> [30-11-2004 06:51:32] SERVICE ALERT: 
> Client-A-Host-2;Service-X;WARNING;SOFT;4;CRITIAL - 10 enviados, 7 
> recebidos, 30% pacotes perdidos
> [30-11-2004 06:51:10] SERVICE ALERT: 
> Client-A-Host-2;Service-X;CRITICAL;SOFT;3;CRITICAL - 10 enviados, 0 
> recebidos, 100% pacotes perdidos
> [30-11-2004 06:50:04] SERVICE ALERT: 
> Client-A-Host-2;Service-X;CRITICAL;SOFT;2;CRITICAL - 10 enviados, 0 
> recebidos, 100% pacotes perdidos
> [30-11-2004 06:49:04] SERVICE ALERT: 
> Client-A-Host-2;Service-X;CRITICAL;SOFT;1;CRITICAL - 10 enviados, 0 
> recebidos, 100% pacotes perdidos
> [30-11-2004 06:43:04] SERVICE ALERT: 
> Client-A-Host-2;Service-X;OK;SOFT;2;OK - 10 enviados, 10 recebidos, 0% 
> pacotes perdidos
> [30-11-2004 06:42:32] SERVICE ALERT: 
> Client-A-Host-2;Service-X;CRITICAL;SOFT;1;CRITICAL - 10 enviados, 4 
> recebidos, 60% pacotes perdidos
> [30-11-2004 06:38:00] SERVICE ALERT: 
> Client-A-Host-2;Service-X;CRITICAL;HARD;1;(Service Check Timed Out)
> 
> Example 2) From 8:19:24 through 8:24:02, we have a hard problem and a 
> hard recovery, which is correct. After that we had a hard problem 
> (8:41:54) and then a bizarre critical soft (8:53:26), which I can't 
> explain. At 8:57:32 we have a soft recovery. Again I can't find the hard 
> recovery in the log files...
> 
>    November 30, 2004 08:00  
> [30-11-2004 08:57:32] SERVICE ALERT: 
> Client-A-Host-2;Service-X;OK;SOFT;6;OK - 10 enviados, 10 recebidos, 0% 
> pacotes perdidos
> [30-11-2004 08:57:24] SERVICE ALERT: 
> Client-A-Host-2;Service-X;CRITICAL;SOFT;5;CRITICAL - 10 enviados, 0 
> recebidos, 100% pacotes perdidos
> [30-11-2004 08:56:22] SERVICE ALERT: 
> Client-A-Host-2;Service-X;CRITICAL;SOFT;4;CRITICAL - 10 enviados, 0 
> recebidos, 100% pacotes perdidos
> [30-11-2004 08:55:22] SERVICE ALERT: 
> Client-A-Host-2;Service-X;CRITICAL;SOFT;3;CRITICAL - 10 enviados, 0 
> recebidos, 100% pacotes perdidos
> [30-11-2004 08:54:22] SERVICE ALERT: 
> Client-A-Host-2;Service-X;CRITICAL;SOFT;2;CRITICAL - 10 enviados, 0 
> recebidos, 100% pacotes perdidos
> [30-11-2004 08:53:26] SERVICE ALERT: 
> Client-A-Host-2;Service-X;CRITICAL;SOFT;1;(Service Check Timed Out)
> [30-11-2004 08:41:54] SERVICE ALERT: 
> Client-A-Host-2;Service-X;CRITICAL;HARD;1;(Service Check Timed Out)
> [30-11-2004 08:24:02] SERVICE ALERT: 
> Client-A-Host-2;Service-X;OK;HARD;1;OK - 10 enviados, 9 recebidos, 10% 
> pacotes perdidos
> [30-11-2004 08:19:42] SERVICE ALERT: 
> Client-A-Host-2;Service-X;CRITICAL;HARD;1;(Service Check Timed Out)
> 
> Analyzing these 2 situations, we have a wrong critical period (8:41:54 
> through 13:57:43, where we finally have a hard recovery). Could some 
> good soul explain this behavior? Without correct logs, Nagios will 
> generate unreliable availability reports, because it uses only hard 
> states to produce them.
> 
> TIA,
> 
> SMV
> 
> 
> 

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Lead Developer

