Hierarchical host schedule queuing
Shawn Iverson
shawn at nccsc.k12.in.us
Fri Mar 11 14:57:29 CET 2005
On Thursday, March 10, 2005 10:38 PM, Marc Powell wrote,
>
>On Thursday, March 10, 2005 6:39 PM Shawn Iverson wrote,
>>
>> Greetings!
>>
>> While simulating a network failure to test my nagios setup, I noticed
>> that nagios (using version 1.2) does not hierarchically proceed to check
>> upstream hosts when it concludes that a host is down hard.
>>
<snip>
>
>It sounds to me that you're describing a feature that has been
>in Nagios, and Netsaint prior, for years. If your idea is
>somehow different, can you clarify?
Read below.
>
>http://nagios.sourceforge.net/docs/1_0/networkreachability.html
>
>"Monitoring Remote Hosts
>
>Checking the status of remote hosts is a bit more complicated
>than for local hosts. If Nagios cannot monitor services on a
>remote host, it needs to determine whether the remote host is
>down or whether it is unreachable. Luckily, the <parent_hosts>
>option allows Nagios to do this.
>
>If a host check command for a remote host returns a non-OK
>state, Nagios will "walk" the dependency tree (as shown in the
>figure above) until it reaches the top (or until a parent host
>check results in an OK state). By doing this, Nagios is able
>to determine if a service problem is the result of a down
>host, a down network link, or just a plain old service failure.
It appears that Nagios is not "walking" properly on my setup then. My
notification options are as follows from contacts.cfg:
define contact {
contact_name Shawn_email
alias Shawn Iverson Email
host_notification_period 24x7
service_notification_period 24x7
service_notification_options u,w,c,r
host_notification_options d,r
host_notification_commands host-notify-by-email
service_notification_commands notify-by-email
email shawn at nccsc.k12.in.us
}
define contact {
contact_name Shawn_pager
alias Shawn Iverson Pager
host_notification_period 24x7
service_notification_period 24x7
service_notification_options u,w,c,r
host_notification_options d,r
host_notification_commands host-notify-by-email
service_notification_commands notify-by-email
email ##########@paging.acswireless.com
}
Nagios is not "walking" my dependency tree for some reason. Something
has gone astray for me.
Here is a sample of a parent tree of my hosts.cfg for reference (ip
addresses not included):
<hosts.cfg START>
define host {
name generic-host ; The name of this host template - referenced in other host definitions, used for template recursion/resolution
notifications_enabled 1 ; Host notifications are enabled
event_handler_enabled 1 ; Host event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
max_check_attempts 10
notification_interval 360
notification_period 24x7
notification_options d,r
check_command check-host-alive
}
define host {
use generic-host
host_name 6509
alias 6509 Core Router/Switch
address #.#.#.#
}
define host {
use generic-host
host_name vlan_switch
alias VLAN Radio Switch
address #.#.#.#
parents 6509
}
define host {
use generic-host
host_name eastomniA
alias East Omni A
address #.#.#.#
parents vlan_switch
}
define host {
use generic-host
host_name eastomniB
alias East Omni B
address #.#.#.#
parents vlan_switch
}
define host {
use generic-host
host_name rileyA
alias Riley Radio A
address #.#.#.#
parents eastomniA
}
define host {
use generic-host
host_name rileyB
alias Riley Radio B
address #.#.#.#
parents eastomniB
}
define host {
use generic-host
host_name rileyrouter
alias Riley Router
address #.#.#.#
parents rileyA, rileyB
}
define host {
use generic-host
host_name rileySW2
alias Riley Switch 2
address #.#.#.#
parents rileyrouter
}
define host {
use generic-host
host_name rileyserver
alias Riley Server
address #.#.#.#
parents rileySW2
}
<hosts.cfg END>
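To make the expected "walk" concrete, here is a minimal Python sketch of the parent relationships defined above. It is not Nagios code: the `PARENTS` table is transcribed by hand from the `parents` directives in this hosts.cfg excerpt, and `upstream_levels` is a hypothetical helper that only lists which hosts sit upstream of a given host, nearest first.

```python
# Parent relationships transcribed from the hosts.cfg excerpt above.
# This is an illustration only, not how Nagios stores its configuration.
PARENTS = {
    "6509": [],
    "vlan_switch": ["6509"],
    "eastomniA": ["vlan_switch"],
    "eastomniB": ["vlan_switch"],
    "rileyA": ["eastomniA"],
    "rileyB": ["eastomniB"],
    "rileyrouter": ["rileyA", "rileyB"],
    "rileySW2": ["rileyrouter"],
    "rileyserver": ["rileySW2"],
}


def upstream_levels(host, parents=PARENTS):
    """Return the hosts upstream of `host`, one list per level, nearest first."""
    levels = []
    seen = {host}
    current = parents[host]
    while current:
        # Drop duplicates (two branches may share a parent) and visited hosts.
        level = [h for h in dict.fromkeys(current) if h not in seen]
        if not level:
            break
        levels.append(level)
        seen.update(level)
        current = [p for h in level for p in parents[h]]
    return levels


if __name__ == "__main__":
    for depth, level in enumerate(upstream_levels("rileyserver"), start=1):
        print(depth, ", ".join(level))
```

For rileyserver this lists rileySW2, rileyrouter, rileyA and rileyB, eastomniA and eastomniB, vlan_switch, and finally 6509, which is the chain I would expect a walk to follow until it finds a parent that is UP.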
This is what is happening with my setup. Say that the vlan_switch goes
down and cannot route packets, and the scheduling queue is as follows at
that moment:
Host         Service  Last Check           Next Check           Active Checks
rileyserver  PING     03-11-2005 08:32:35  03-11-2005 08:37:35  ENABLED
rileyrouter  PING     03-11-2005 08:33:35  03-11-2005 08:38:35  ENABLED
rileyB       PING     03-11-2005 08:34:35  03-11-2005 08:39:35  ENABLED
eastomniA    PING     03-11-2005 08:35:35  03-11-2005 08:40:35  ENABLED
rileySW2     PING     03-11-2005 08:36:35  03-11-2005 08:41:35  ENABLED
6509         PING     03-11-2005 08:37:35  03-11-2005 08:42:35  ENABLED
vlan_switch  PING     03-11-2005 08:38:35  03-11-2005 08:43:35  ENABLED
rileyA       PING     03-11-2005 08:39:35  03-11-2005 08:44:35  ENABLED
eastomniB    PING     03-11-2005 08:40:35  03-11-2005 08:45:35  ENABLED
Here is the order of events that is occurring on my box:
1) I receive an email alert that the rileyserver is in a DOWN state.
2) I receive an email alert that the rileyrouter is in a DOWN state.
3) I receive an email alert that the rileyB is in a DOWN state.
4) I receive an email alert that the eastomniA is in a DOWN state.
5) I receive no email alert for rileySW2 because it is in an UNREACHABLE state.
6) The 6509 is UP, so no email is sent.
7) I finally receive an alert, after the previous five, that the vlan_switch is in a DOWN state.
8) I receive no email alert for rileyA because it is in an UNREACHABLE state.
9) I receive no email alert for eastomniB because it is in an UNREACHABLE state.
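For comparison, this is what I would expect from the same failure, written as a small Python sketch. It assumes the rule as I understand it from the documentation (a host that fails its check is DOWN if it has no parents or at least one parent is UP, and UNREACHABLE otherwise), and it assumes every host except the 6509 fails its ping. The names `PARENTS`, `QUEUE` and `classify` are hypothetical and exist only for this illustration; none of this is Nagios' actual implementation.

```python
# Parent relationships transcribed from hosts.cfg (illustration only).
PARENTS = {
    "6509": [],
    "vlan_switch": ["6509"],
    "eastomniA": ["vlan_switch"],
    "eastomniB": ["vlan_switch"],
    "rileyA": ["eastomniA"],
    "rileyB": ["eastomniB"],
    "rileyrouter": ["rileyA", "rileyB"],
    "rileySW2": ["rileyrouter"],
    "rileyserver": ["rileySW2"],
}

# Order in which the checks come up in the scheduling queue shown earlier.
QUEUE = ["rileyserver", "rileyrouter", "rileyB", "eastomniA", "rileySW2",
         "6509", "vlan_switch", "rileyA", "eastomniB"]


def classify(failing, parents=PARENTS):
    """Map every host to UP, DOWN or UNREACHABLE under the assumed rule."""
    state = {}

    def resolve(host):
        if host in state:
            return state[host]
        if host not in failing:
            state[host] = "UP"
        elif not parents[host] or any(resolve(p) == "UP" for p in parents[host]):
            # The host fails its check but is still reachable through a parent.
            state[host] = "DOWN"
        else:
            # Every parent is itself DOWN or UNREACHABLE.
            state[host] = "UNREACHABLE"
        return state[host]

    for host in parents:
        resolve(host)
    return state


# Assumption: with vlan_switch dead, everything except the 6509 fails its ping.
failing = set(PARENTS) - {"6509"}
states = classify(failing)

# With host_notification_options d,r only DOWN hosts should trigger an email.
notified = [host for host in QUEUE if states[host] == "DOWN"]

if __name__ == "__main__":
    for host in QUEUE:
        print(f"{host:12} {states[host]}")
    print("alerts expected for:", ", ".join(notified))
```

Under these assumptions only vlan_switch comes out as DOWN and every host behind it comes out as UNREACHABLE, regardless of where each host sits in the queue, so I would expect one alert instead of five.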
>DOWN vs. UNREACHABLE Notification Types
>
>I get lots of email from people asking why Nagios is sending
>notifications out about hosts that are unreachable. The answer
>is because you configured it to do that. If you want to
>disable UNREACHABLE notifications for hosts, modify the
>notification_options argument of your host definitions to not
>include the u (unreachable) option. More information can be
>found in this FAQ."
>
I am not receiving any alerts for hosts in an UNREACHABLE state, but I
am receiving false alerts for hosts that should be in an UNREACHABLE
state, not a DOWN state.
>--
>Marc
>
--
Shawn