Recovery not being fired off under certain circumstances
srunschke at abit.de
Mon Nov 28 14:45:11 CET 2005
Hi,
Lately I stumbled over a few discrepancies in our network monitoring:
we were getting Warnings but never received a Recovery, even though
it was pretty obvious that the service had recovered.
I was finally able to pin down the reason for it.
Sadly, I am unsure whether it has to be seen as "working as intended" or
whether it really is unexpected behaviour. Personally I'd call it
"broken as intended".
Excerpt from the config that reproduces the problem:
define service {
host_name RMS
use generic-SNMP
service_description RZ_TEMPERATUR
servicegroups SMS-SERVICEGROUP
register 1
check_command check_snmp!abit-management!1.3.6.1.4.1.2769.10.4.1.1.3.1!1!30!35
notification_interval 10
stalking_options c,w,u
notification_options c,w,u,r
}
define serviceescalation {
host_name RMS
service_description RZ_TEMPERATUR
first_notification 1
last_notification 0
contact_groups HOST-CONTACTGROUP-SMS
escalation_period 24x7
escalation_options c,r,u
}
As this is the temperature check of our monitoring system for our main
datacenter, I do want it to mail me on a Warning state, but I do not care
about Warnings enough to want an SMS for them, so the contact groups of
RZ_TEMPERATUR are mail-only groups.
I escalate c,r,u into another contact group, which contains the relevant
contacts with their pagers. Now if the service throws a Warning, we get
the mail. But if it recovers, we get neither mail nor SMS.
The reason is that the recovery falls into the territory of the
escalation, which then checks who received the notification for the
problem in the first place. That check yields no information for the
escalation, so no recovery is fired off at all.
Even IF the check for that info were tweaked, it would still fire the
recovery via SMS, which is not the behaviour I intend.
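To make the sequence concrete, here is a toy model of the behaviour as
described above. It is only a sketch of the observed logic, not Nagios'
actual source: the function recipients(), the flag
require_prior_notification and the group name MAIL-GROUP are made up for
illustration (the real mail-only group names are not shown in the excerpt).

```python
# Toy model of the behaviour described in this mail; NOT Nagios source code.
# "MAIL-GROUP" is a hypothetical stand-in for the mail-only contact groups.
SERVICE_GROUPS = {"MAIL-GROUP"}
ESCALATION_GROUPS = {"HOST-CONTACTGROUP-SMS"}
ESCALATION_OPTIONS = {"c", "r", "u"}  # escalation_options from the config


def recipients(state, notified_for_problem, require_prior_notification=True):
    """Return the contact groups that would be notified for `state`.

    state: one of "w", "c", "u", "r"
    notified_for_problem: groups that received the preceding problem
        notification (only relevant for recoveries)
    require_prior_notification: models the check that a recovery only goes
        to groups that were told about the problem
    """
    # States listed in escalation_options are handled by the escalation,
    # everything else stays with the service's own contact groups.
    if state in ESCALATION_OPTIONS:
        groups = set(ESCALATION_GROUPS)
    else:
        groups = set(SERVICE_GROUPS)

    # A recovery is only sent to groups that saw the problem notification.
    if state == "r" and require_prior_notification:
        groups &= set(notified_for_problem)
    return groups


# Warning: not escalated, so the mail-only group is notified.
warned = recipients("w", set())

# Recovery: escalated to the SMS group, which never saw the Warning.
lost = recipients("r", warned)

# With the check "tweaked" away, the recovery would go out via SMS only.
tweaked = recipients("r", warned, require_prior_notification=False)

print(warned, lost, tweaked)
```

In this model the Warning goes to the mail group, the Recovery ends up with
an empty recipient set, and dropping the check sends the Recovery to the SMS
group instead of the mail group, which matches both complaints above.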
How do you guys see this particular problem?
Should Nagios be able to act in a more differentiated way on these kinds
of problems, or is it my burden to find a hacky-hack solution for this? ;)
I'm up for some insights into this matter.
regards
sash
--------------------------------------------------
Sascha Runschke
Netzwerk Administration
IT-Services
ABIT AG
Robert-Bosch-Str. 1
40668 Meerbusch
Tel.:+49 (0) 2150.9153.226
Mobil:+49 (0) 173.5419665
mailto:SRunschke at abit.de
http://www.abit.net
http://www.abit-epos.net
---------------------------------
Sicherheitshinweis zur E-Mail Kommunikation /
Security note regarding email communication:
http://www.abit.net/sicherheitshinweis.html