nrpe timeouts, dependencies, no connectivity

David Miller nagios at d.sparks.net
Wed Oct 4 14:46:03 CEST 2006


Hi All;

I'm pulling out what little is left of my hair on this one :(

I've got a setup where a nagios host at data center A is monitoring 
services on 30+ hosts at data center B.  The bulk of the monitoring is 
via nrpe.

Once in a blue moon I lose connectivity between the two data 
centers.  I then get 30 hosts * avg_number_services_monitored pages 
about problems, and a similar number of recovery messages.

I'm on debian stable (sarge), running their package (1.3-cvs.200504).  I 
set up a simple service to test connectivity between data centers:

define service{
        use                             generic-service         ; Name of service template to use
        hostgroup_name                  check-inap
        service_description             Check INAP Connection
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              5
        normal_check_interval           5
        retry_check_interval            2
        contact_groups                  dmiller
        notification_interval           120
        notification_period             24x7
        notification_options            w,u,c,r
        check_command                   check-inap
        }

I added it to a hostgroup:

define hostgroup {
        hostgroup_name  check-inap
        alias           Check INAP Connection
        contact_groups  dmiller
        members         css.int
        }

Here is the actual check command:

define command{
        command_name    check-inap
        command_line    /usr/lib/nagios/plugins/check_icmp 192.168.120.100
        }

(yes, that's a valid IP address over our VPN)

This is a typical dependency entry:

define servicedependency{
        host_name                       css.int
        service_description             Check INAP Connection
        dependent_host_name             groupware.int
        dependent_service_description   Check Disk Utilization
        execution_failure_criteria      w,u,c   ; These are the criteria for which check execution will be suppressed
        notification_failure_criteria   w,u,c   ; These are the criteria for which notifications will be suppressed
        }
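
Every dependent service needs its own entry of this form.  As a sketch, 
the entry for a second service on the same host would look like the 
following, where "Check Load" is a made-up service name used only for 
illustration:

```
define servicedependency{
        host_name                       css.int
        service_description             Check INAP Connection
        dependent_host_name             groupware.int
        dependent_service_description   Check Load      ; hypothetical service name, for illustration only
        execution_failure_criteria      w,u,c
        notification_failure_criteria   w,u,c
        }
```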


What's happening is that even if "Check INAP Connection" gets an 
NRPE_timeout, which should be an "unknown" state, the disk utilization 
check for groupware.int is still executed, and the notification is 
still sent.

What am I missing?  Is this something fixed in more recent versions of 
nagios?

Thanks in advance,

--- David
