freshness check on passive service fails

Bill Corcoran wcorcor at siue.edu
Wed Jun 2 00:10:45 CEST 2004


I am having a similar problem.  When active checks are disabled but 
check_freshness is enabled and freshness_threshold is defined, nagios 
never executes the check_command (a warning to notify me that passive 
checks are not being received), even when it is supposed to (that is, 
when the passive results are "stale").  So I tried re-enabling active 
checking, which resulted in the situation you are describing: both 
passive and active checks are run, and the service flaps between the 
warning and the real passive check result.  This of course was no 
surprise to me, but I was willing to give it a try to solve my problem.
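
A dummy check_command of the kind discussed in this thread could look 
roughly like the sketch below.  It is only an illustration, not anyone's 
actual script: the function name stale_result and the message wording 
are made up here.  The only thing it relies on is the standard plugin 
convention that the exit code sets the service state (0 OK, 1 WARNING, 
2 CRITICAL, 3 UNKNOWN).

```python
#!/usr/bin/env python3
"""Dummy check_command for a passive-only service (illustrative sketch)."""
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Human-readable labels for the status line.
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def stale_result(state=WARNING):
    """Return (exit_code, status_line) saying the passive results are stale.

    The name and the message text are hypothetical; only the exit-code
    convention comes from Nagios itself.
    """
    line = f"{LABELS[state]}: no passive check result received within freshness_threshold"
    return state, line


if __name__ == "__main__":
    code, line = stale_result()
    print(line)     # the first line of output becomes the status text
    sys.exit(code)  # the exit code determines the service state
```

Passing CRITICAL instead of WARNING to stale_result() would give the 
louder alert mentioned further down in the quoted message.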

I have made sure to disable retention features before restarting nagios 
to apply new directives in service definitions, but this effort was to 
no avail.

Any ideas?

-Bill Corcoran

Antoine Reid wrote:

> --On Friday, May 28, 2004 9:46 AM +0200 jan gregor 
> <pamela at rak.bb.euroweb.sk> wrote:
> 
>>> For what it's worth, I'm having similar issues myself too. My setup is a
>>> bit different so I'll post it below.  What happens here is that I have
>>> two Nagios processes running on two different hosts, in different
>>> subnets. The one doing the actual checks is obsessing over services and
>>> sends the results through nsca to the main nagios host.  The main host
>>> seems to decide my service results aren't fresh enough, then runs the
>>> check_command, which is a dummy script returning WARNING (originally
>>> CRITICAL, but it generated too many notifications). Then, a couple of
>>> seconds or minutes later, a new passive check comes in, which brings the
>>> service(s) back to OK; a couple of minutes later, it switches back to
>>> WARNING, and so on.
>>
>>
>> Why are you doing freshness checking on the master host? Is that of any
>> use? Please correct me if I'm wrong, but freshness checking is mainly for
>> active checking. The only case where I can see it being useful with
>> passive checks is a passive+active setup, where a service is configured
>> to accept passive checks and also runs active checks from time to time
>> (to check that we have not missed something). Again, maybe I overlooked
>> something important; please correct me if I'm terribly wrong.
> 
> 
> Actually, the idea is that when active_checks are disabled, the 
> check_command is never run as long as the passive checks come in 
> frequently enough.  According to the docs (the part about distributed 
> monitoring and/or freshness checking), IF the results are not fresh 
> enough, then the check_command will be executed.  In a 
> failover/redundancy situation, that would be ideal as your main machine 
> does not usually perform the tests but will if the results are getting 
> stale.
> 
> In my situation though, the main machine *cannot* access the services 
> that the second host is monitoring.  What is configured instead is a 
> check_command that will always return an error (right now, I return 
> WARNING but I would like it to be "CRITICAL") stating that the results 
> are stale. This would indicate that the nagios process on the 2nd 
> machine is no longer sending passive checks OR that the checks somehow 
> don't make it through to the main machine.  In any case, I would get a 
> notification and would start investigating.
> 
> This is exactly what I am trying to achieve.  Now, my problem is the 
> following:  the second nagios process is doing active checks, and the 
> service(s) checked never or rarely go down (eg: fping on an otherwise 
> working machine).  I can see on the MAIN host that the passive checks 
> are being received AND processed by nagios, yet it decides for some 
> reason that the results are not fresh and runs the check_command defined 
> (which returns WARNING).
> 
> Net result is, according to the second machine, my services are up 100% 
> of the time.  According to the MAIN machine, those services go OK - 
> WARNING - OK - WARNING - OK - WARNING every couple of minutes..
> 
> Would anyone know which timeout or setting to tweak so that it HAS to 
> wait for much much longer without having received the passive checks 
> before it actually decides to take matters into its own hands and run the 
> check_command defined?  (Please see my previous post to see my 
> configuration details, services definitions, etc).
> 
>> Best regards
>>
>> Jan Gregor
> 
> 
> 
> thank you!
> Antoine
> 
> -- 
> Antoine Reid
> Administrateur Système - System Administrator
> 
>          __________________________________________________
> 
> Logient Inc.
> Solutions de logiciels Internet - Internet Software Solutions
> 417 St-Pierre, Suite #700
> Montréal (Qc) Canada H2Y 2M4
> T. 514-282-4118 ext.32
> F. 514-288-0033
> www.logient.com
> 
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when 
> reporting any issue. ::: Messages without supporting info will risk 
> being sent to /dev/null
> 


