freshness check on passive service fails
Bill Corcoran
wcorcor at siue.edu
Wed Jun 2 00:10:45 CEST 2004
I am having a similar problem. When active checks are disabled but
check_freshness and freshness_threshold are enabled/defined, nagios
never executes the check_command (a warning to notify me that
passive checks are not being received), even when it is supposed to
(i.e. when the passive checks are "stale"). So I tried re-enabling active
checking, which resulted in the situation you are describing: both passive
and active checks are performed, so the service flaps between the
warning and the real passive check result. This was of course no surprise
to me, but I was willing to give it a try to solve my problem.
I have made sure to disable retention features before restarting nagios
to apply new directives in service definitions, but this effort was to
no avail.
Any ideas?
-Bill Corcoran
Antoine Reid wrote:
> --On Friday, May 28, 2004 9:46 AM +0200 jan gregor
> <pamela at rak.bb.euroweb.sk> wrote:
>
>>> For what it's worth, I'm having similar issues myself too. My setup is a
>>> bit different so I'll post it below. What happens here is that I have
>>> two Nagios processes running on two different hosts, in different
>>> subnets. The one doing the actual checks is obsessing over services and
>>> sends the results through nsca to the main nagios host. The main host
>>> seems to decide my service results aren't fresh enough, then runs the
>>> check_command, which is a dummy script returning WARNING (originally
>>> CRITICAL but it generated too many notifications..), then, a couple
>>> seconds or minutes later, a new passive check comes in, which brings the
>>> service(s) back to OK, then a couple minutes later, it switches back to
>>> WARNING and so on..
>>
>>
>> Why are you doing freshness checking on the master host? Is that of any use?
>> Please correct me if I'm wrong, but freshness checking is mainly for
>> active checking. The only case where I can see it being usable with
>> passive checks is a passive+active setup, where a service is configured
>> to accept passive checks and also runs active checks from time to time
>> (to check that we have not missed something). Again, maybe I overlooked
>> something important; please correct me if I'm terribly wrong.
>
>
> Actually, the idea is that when active_checks are disabled, the
> check_command is never run as long as the passive checks come in
> frequently enough. According to the docs (the part about distributed
> monitoring and/or freshness checking), IF the results are not fresh
> enough, then the check_command will be executed. In a
> failover/redundancy situation, that would be ideal, as your main machine
> does not usually perform the tests but will if the results are getting
> stale.
>
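The rule described above can be written down as a small decision function.
This is only a toy model of the documented behaviour, not Nagios's actual
code; the function names are made up for illustration, and the real
scheduler takes more factors into account than a single time comparison.

```python
def is_stale(last_result_time, now, freshness_threshold):
    """Return True when the newest result is older than the threshold (seconds)."""
    return (now - last_result_time) > freshness_threshold


def should_run_check_command(check_freshness, last_result_time, now,
                             freshness_threshold):
    """Toy model: decide whether the check_command should be forced.

    Assumption taken from the paragraph above: with freshness checking on,
    a stale result triggers check_command even when active checks are off,
    so the active-check setting does not appear here at all.
    """
    if not check_freshness:
        return False
    return is_stale(last_result_time, now, freshness_threshold)
```

With a threshold of 600 seconds, a result that is 500 seconds old would be
left alone, while one that is 700 seconds old would trigger the
check_command.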
> In my situation though, the main machine *cannot* access the services
> that the second host is monitoring. What is configured instead, is a
> check_command that will always return an error (right now, I return
> WARNING but I would like it to be "CRITICAL") stating that the results
> are stale. This would indicate that the nagios process on the 2nd
> machine is no longer sending passive checks OR that the checks somehow
> don't make it through to the main machine. In any case, I would get a
> notification and would start investigating.
>
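A "stale results" check_command of the kind described above might look
roughly like the following. It is a sketch, not the script actually used in
this thread: the function names and the message text are invented, and the
only thing taken as given is the standard plugin exit-code convention
(0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
# Nagios plugin exit codes (standard plugin convention).
EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def stale_result(state="WARNING"):
    """Return (exit_code, status_line) for a 'results are stale' check.

    Hypothetical helper; unrecognised state names fall back to UNKNOWN.
    """
    state = state.upper()
    if state not in EXIT_CODES:
        state = "UNKNOWN"
    message = f"{state}: no passive check result received within freshness_threshold"
    return EXIT_CODES[state], message


def main(argv):
    """Print the status line and return the exit code for the plugin."""
    code, message = stale_result(argv[0] if argv else "WARNING")
    print(message)
    return code
```

To run it as a plugin, the file would end with
sys.exit(main(sys.argv[1:])) after an import sys. Passing CRITICAL as the
first argument would switch the severity without editing the script.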
> This is exactly what I am trying to achieve. Now, my problem is the
> following: the second nagios process is doing active checks, the
> service(s) checked never or rarely go down (eg: fping on an otherwise
> working machine). I can see on the MAIN host that the passive checks
> are being received AND processed by nagios, yet it decides for some
> reason that the results are not fresh and runs the check_command defined
> (which returns WARNING).
>
> Net result is, according to the second machine, my services are up 100%
> of the time. According to the MAIN machine, those services go OK -
> WARNING - OK - WARNING - OK - WARNING every couple of minutes..
>
> Would anyone know which timeout or setting to tweak so that it HAS to
> wait much, much longer without having received any passive checks
> before it actually decides to take matters into its own hands and run the
> check_command defined? (Please see my previous post to see my
> 
> configuration details, services definitions, etc).
>
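One hypothetical way an OK - WARNING - OK pattern like this can arise is a
freshness threshold shorter than the interval between passive results. The
simulation below is a toy model under that assumption only; the intervals
are invented numbers, not the settings from the earlier post, and it does
not claim to reproduce Nagios's scheduler.

```python
def simulate(passive_interval, freshness_threshold,
             freshness_check_interval, duration):
    """Return the (time, state) transitions seen by the main host.

    Toy model: a passive OK arrives every passive_interval seconds; every
    freshness_check_interval seconds the host forces the dummy WARNING
    check if the newest result is older than freshness_threshold.
    """
    events = []
    state = None
    last_result = 0  # time of the newest result of any kind
    for t in range(0, duration + 1):
        new_state = None
        if t % passive_interval == 0:
            new_state = "OK"
            last_result = t
        elif (t % freshness_check_interval == 0
              and (t - last_result) > freshness_threshold):
            new_state = "WARNING"
            last_result = t  # the forced check's result counts as fresh too
        if new_state is not None and new_state != state:
            state = new_state
            events.append((t, state))
    return events


# Threshold (120 s) shorter than the passive interval (300 s): flapping.
flapping = simulate(300, 120, 60, 900)
# Threshold (600 s) comfortably longer than the passive interval: stable.
stable = simulate(300, 600, 60, 900)
```

In this model the first run alternates between OK and WARNING on every
passive cycle, while the second stays OK for the whole period, which is
why a threshold well above the passive result interval is the first thing
worth checking.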
>> Best regards
>>
>> Jan Gregor
>
>
>
> thank you!
> Antoine
>
> --
> Antoine Reid
> Administrateur Système - System Administrator
>
> __________________________________________________
>
> Logient Inc.
> Solutions de logiciels Internet - Internet Software Solutions
> 417 St-Pierre, Suite #700
> Montréal (Qc) Canada H2Y 2M4
> T. 514-282-4118 ext.32
> F. 514-288-0033
> www.logient.com
>
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue. ::: Messages without supporting info will risk
> being sent to /dev/null
>