passive host checks status instability, bug or configuration error?
Artur D'Assumpção
artur.dassumpcao at di.com.pt
Sat Apr 9 20:42:17 CEST 2005
I am still having the same problem under the same conditions, but I think
I've found a workaround that may help you debug this issue.
I've replaced the service-is-stale plugin with one that returns one of
two possible states, depending on the $HOSTSTATE$ macro:
sr-0 plugins-di # cat service_is_stale
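#!/bin/sh
# $1 is the $HOSTSTATE$ macro passed in by Nagios
if [ "$1" = "DOWN" ]; then
    # host is DOWN: report the stale service as CRITICAL (2)
    /usr/nagios/libexec/check_dummy 2 "Service results not received"
elif [ "$1" = "UP" ]; then
    # host is UP: report the stale service as UNKNOWN (3)
    /usr/nagios/libexec/check_dummy 3 "Service results not received"
fi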
--
# default command used when nsca results for a given host's service
# weren't received
define command {
    command_name    service-is-stale
    command_line    $USER2$/service_is_stale $HOSTSTATE$
}
Now, when the service checks go stale, the status returned depends on the
$HOSTSTATE$ macro. In this specific case, while the host status is DOWN,
the value returned for the stale services is CRITICAL, so the host
status does not change.
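
The script above produces no output and exits 0 when $HOSTSTATE$ is neither
UP nor DOWN (for example UNREACHABLE). A minimal sketch of a variant that
covers that case follows. It is only an illustration: the function name
service_is_stale and the choice to map every other state to UNKNOWN are
assumptions, and it prints the status itself instead of calling check_dummy
so that it runs without the Nagios plugins installed.

```shell
#!/bin/sh
# Sketch of a stale-service handler keyed on the host state.
# Assumption: the status line is printed here directly rather than
# through check_dummy, purely so the example is self-contained.

service_is_stale() {
    msg="Service results not received"
    case "$1" in
        DOWN)
            # Host already DOWN: report CRITICAL (plugin exit code 2)
            echo "CRITICAL: $msg"
            return 2
            ;;
        UP)
            # Host UP but no results arrived: report UNKNOWN (exit code 3)
            echo "UNKNOWN: $msg"
            return 3
            ;;
        *)
            # Assumption: any other value (UNREACHABLE, empty, ...) is UNKNOWN
            echo "UNKNOWN: $msg (host state: ${1:-none})"
            return 3
            ;;
    esac
}

# Demonstration: show the exit code produced for each host state.
for state in DOWN UP UNREACHABLE; do
    rc=0
    service_is_stale "$state" || rc=$?
    echo "exit code for $state: $rc"
done
```

Used as a real plugin, the file would end with a call such as
service_is_stale "$1" followed by exit $? instead of the demonstration loop.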
Anyway, I still have the same question: isn't Nagios supposed to ignore
all the service checks while the host is in the DOWN state?
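
For anyone who wants to compare this with a fresh result, a passive check
can be injected by hand with send_nsca. The sketch below only builds the
tab-delimited lines that send_nsca reads on stdin. The server name, the
config path and the function names are placeholders of mine, not values
from my setup.

```shell
#!/bin/sh
# Sketch: format passive check results for send_nsca.
# NSCA_SERVER and NSCA_CONFIG are hypothetical placeholders.
NSCA_SERVER="${NSCA_SERVER:-nagios.example.org}"
NSCA_CONFIG="${NSCA_CONFIG:-/etc/nagios/send_nsca.cfg}"

# Host check result: host<TAB>return_code<TAB>plugin_output
# (for passive host checks: 0 = UP, 1 = DOWN, 2 = UNREACHABLE)
format_host_result() {
    printf '%s\t%s\t%s\n' "$1" "$2" "$3"
}

# Service check result: host<TAB>service<TAB>return_code<TAB>plugin_output
format_service_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Send whatever arrives on stdin to the NSCA daemon.
submit_results() {
    send_nsca -H "$NSCA_SERVER" -c "$NSCA_CONFIG"
}

# Example (not executed here, it needs a reachable NSCA daemon):
#   format_host_result 'domain.pt_sfci-dr-0' 0 'Host alive' | submit_results
#   format_service_result 'domain.pt_sfci-dr-0' '[SYS] Swap Usage' 0 'OK' | submit_results
```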
AD
Artur D'Assumpção wrote:
> Hi ppl,
>
> I'm getting very strange, unstable results from passive host checks. I
> don't know if I've found a bug or if I am actually doing something
> wrong here.
>
> I'll try to describe the network first, before getting to the actual
> problem:
>
> Well, I have some hosts that are behind firewalled networks, so
> service and host checks have to be submitted passively using send_nsca.
>
> In the main config I have these freshness options:
>
> check_service_freshness=1
> check_host_freshness=1
>
> service_freshness_check_interval=300
> host_freshness_check_interval=60
>
> The status retention options are also disabled.
>
> A generic host in these conditions uses this template configuration:
>
> define host {
> name generic-passive-unreachable-host
>
> active_checks_enabled 0
> passive_checks_enabled 1
>
> obsess_over_host 1
> event_handler_enabled 0
> flap_detection_enabled 0
> process_perf_data 0
> retain_status_information 0
> retain_nonstatus_information 0
>
> check_command host-is-stale
> check_freshness 1
> freshness_threshold 120
> max_check_attempts 1
>
> notifications_enabled 1
> notification_interval 60
> notification_period 24x7
> notification_options d,u,r
>
> contact_groups dummy-contacts
>
> register 0
> }
>
> Analogously for services:
>
> define service {
> name generic-passive-service
>
> active_checks_enabled 0
> passive_checks_enabled 1
>
> obsess_over_service 1
> event_handler_enabled 0
> flap_detection_enabled 0
> process_perf_data 1
> retain_status_information 0
> retain_nonstatus_information 0
> is_volatile 0
>
> check_command service-is-stale
> check_freshness 1
> freshness_threshold 300
> parallelize_check 1
> check_period 24x7
> max_check_attempts 2
> normal_check_interval 5
> retry_check_interval 5
>
> notifications_enabled 1
> notification_interval 60
> notification_period 24x7
> notification_options c,r
>
> contact_groups dummy-contacts
>
> register 0
> }
>
>
> Now to the real problem. The host status keeps flapping between UP
> and DOWN. In my tests only the monitoring server is up; the other
> clients/servers are down. Every time the host freshness threshold
> expires, 'host-is-stale' is run and always returns a DOWN state:
>
> Apr 9 17:52:09 sr-0 nagios: Warning: The results of host
> 'domain.pt_sfci-dr-0' are stale by 60 seconds (threshold=120
> seconds). I'm forcing an immediate check of the host.
>
> This is the expected behavior, so far so good...
>
> The problem starts when the freshness threshold of this host's
> passive services also expires, even though the host is in the DOWN
> state:
>
> Apr 9 17:52:17 sr-0 nagios: Warning: The results of service '[SYS]
> Swap Usage' on host 'domian.pt_sfci-dr-0' are stale by 40 seconds
> (threshold=500 seconds). I'm forcing an immediate check of the service.
>
> Well, when this happens the command 'service-is-stale' gets executed,
> placing the service in an UNKNOWN status, and consequently the host
> status changes to UP.
>
> Now, let me shoot my question: aren't the service checks for a host
> in the DOWN state supposed to be ignored? This is causing the UP/DOWN
> flapping. Keep in mind that there aren't any other distributed
> servers or clients submitting results; NSCA isn't even running at
> this time. Any clues?
>
> I'm running Nagios version 2.0b2. (I know there is a 2.0b3, but since
> I haven't found any changelog references on this subject, I suspect
> a configuration problem.)
>
> Thanks very much,
>
> AD
>
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue. ::: Messages without supporting info will risk
> being sent to /dev/null