nrpe timeouts, dependencies, no connectivity
Andreas Ericsson
ae at op5.se
Wed Oct 4 14:57:49 CEST 2006
David Miller wrote:
> Hi All;
>
> I'm pulling out what little is left of my hair on this one :(
>
> I've got a setup where a nagios host at data center A is monitoring
> services on 30+ hosts at data center B. The bulk of the monitoring is
> via nrpe.
>
> Once every blue moon or so I lose connectivity between the two data
> centers. I then get 30 hosts * avg_number_services_monitored pages
> about problems, and a similar number of recovery messages.
>
> I'm on Debian stable (sarge), running their package (1.3-cvs.200504). I
> set up a simple service to test connectivity between data centers:
>
> define service{
>         use                     generic-service ; Name of service template to use
>         hostgroup_name          check-inap
>         service_description     Check INAP Connection
>         is_volatile             0
>         check_period            24x7
>         max_check_attempts      5
>         normal_check_interval   5
>         retry_check_interval    2
>         contact_groups          dmiller
>         notification_interval   120
>         notification_period     24x7
>         notification_options    w,u,c,r
>         check_command           check-inap
>         }
>
> Added it to hostgroups:
>
> define hostgroup {
>         hostgroup_name          check-inap
>         alias                   Check INAP Connection
>         contact_groups          dmiller
>         members                 css.int
>         }
>
> Here is the actual check command:
>
> define command{
>         command_name    check-inap
>         command_line    /usr/lib/nagios/plugins/check_icmp 192.168.120.100
>         }
>
> (yes, that's a valid IP address over our VPN)
>
> This is a typical dependency entry:
>
> define servicedependency{
>         host_name                       css.int
>         service_description             Check INAP Connection
>         dependent_host_name             groupware.int
>         dependent_service_description   Check Disk Utilization
>         ; Criteria for which check execution will be suppressed
>         execution_failure_criteria      w,u,c
>         ; Criteria for which notifications will be suppressed
>         notification_failure_criteria   w,u,c
>         }
>
>
> What's happening is that even if "Check INAP Connection" gets an
> NRPE_timeout, which should be a condition "unknown", the check of disk
> utilization for groupware.int is executed, as is the notification.
>
> What am I missing? Is this something fixed in more recent versions of
> nagios?
>
You're making it overly complicated. Since traffic to the monitored
hosts goes through your nrpe "proxy" at the data center, you'd be
better off setting up parent/child relations that take this into
account, and disabling "unreachable" notifications either globally or
for the hosts and services monitored through the proxy.
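
A minimal sketch of what that could look like, under some assumptions:
"gateway-b.int" is a hypothetical name for whatever device at data
center B all checks pass through, its address is assumed to be the VPN
address your check-inap command pings, and "generic-host" is assumed to
be a host template that already supplies a host check command and the
other required directives. Adjust names to your setup.

define host{
        use                     generic-host     ; assumed template name
        host_name               gateway-b.int    ; hypothetical parent host
        alias                   Data center B gateway
        address                 192.168.120.100  ; assumed: the address check-inap pings
        }

define host{
        ; ... keep the existing directives for this host as they are ...
        host_name               groupware.int
        parents                 gateway-b.int    ; reached only via the gateway
        notification_options    d,r              ; no "u": skip unreachable notifications
        }

With the parents directive in place, a failed host check on
groupware.int while gateway-b.int is also down should be classified as
UNREACHABLE rather than DOWN, and leaving "u" out of
notification_options keeps those states from paging you. Repeat the
parents line for each host behind the link.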
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users