Recovery not being fired off under certain circumstances (repost)
srunschke at abit.de
Wed Nov 30 09:44:21 CET 2005
(This is a repost from nagios-devel, as no one answered.)
Hi,
lately I stumbled over a few discrepancies in our network monitoring:
we were getting Warnings but never received a Recovery,
even though it was pretty obvious that the service had recovered.
I was finally able to pin down the reason for it.
Sadly, I am unsure whether this has to be seen as "working as intended" or
whether it really is unexpected behaviour. Personally, I'd call it
"broken as intended".
Excerpt from the config that reproduces the problem:
define service {
        host_name RMS
        use generic-SNMP
        service_description RZ_TEMPERATUR
        servicegroups SMS-SERVICEGROUP
        register 1
        check_command check_snmp!abit-management!1.3.6.1.4.1.2769.10.4.1.1.3.1!1!30!35
        notification_interval 10
        stalking_options c,w,u
        notification_options c,w,u,r
}
define serviceescalation {
        host_name RMS
        service_description RZ_TEMPERATUR
        first_notification 1
        last_notification 0
        contact_groups HOST-CONTACTGROUP-SMS
        escalation_period 24x7
        escalation_options c,r,u
}
As this is the temperature check of our monitoring system for our main
datacenter, I do want it to mail me on a Warning state, but I do not care
about warnings enough to want an SMS yet, so the contact groups of
RZ_TEMPERATUR are mail-only groups.
I escalate c,r,u into another contact group, which has the relevant contacts
with their pagers in it. Now if the service throws a Warning, we get the
mail. But if it recovers, we get neither mail nor SMS.
The reason is that the recovery falls into the territory of the
escalation, which then checks who was notified in the first place about
the problem this recovery belongs to. That check yields no information
for the escalation, so no recovery is fired off at all.
Even if the check for that information were tweaked, it would still send the
recovery via SMS, which is not the behaviour I intend.
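To make the sequence concrete, here is a minimal Python sketch of the behaviour as I understand it. It is only a model of what is described above, not the actual Nagios notification code. The names Service, Escalation and recipients, as well as the MAIL-GROUP contact group, are made up for the example; only HOST-CONTACTGROUP-SMS and the option letters come from the config.

```python
from dataclasses import dataclass, field


@dataclass
class Escalation:
    contacts: set   # contact groups the escalation notifies
    options: set    # states the escalation applies to, e.g. {"c", "r", "u"}


@dataclass
class Service:
    contacts: set   # the service's own (mail-only) contact groups
    options: set    # notification_options of the service
    escalation: Escalation
    # Hypothetical bookkeeping: who was told about the current problem.
    notified: set = field(default_factory=set)


def recipients(service, state):
    """Return the groups notified for `state` under the modeled rules."""
    esc = service.escalation
    if state in esc.options:
        # Assumption: a matching escalation takes over from the service contacts.
        pool = esc.contacts
    elif state in service.options:
        pool = service.contacts
    else:
        return set()

    if state == "r":
        # Assumption: a recovery only reaches groups that heard about the problem.
        result = pool & service.notified
        service.notified = set()
        return result

    service.notified |= pool
    return set(pool)


svc = Service(
    contacts={"MAIL-GROUP"},
    options={"c", "w", "u", "r"},
    escalation=Escalation(contacts={"HOST-CONTACTGROUP-SMS"}, options={"c", "r", "u"}),
)
print(recipients(svc, "w"))  # warning: handled by the service contacts
print(recipients(svc, "r"))  # recovery: handled by the escalation, nobody matches
```

In this model the Warning goes to the mail group, while the following Recovery is claimed by the escalation, whose contacts were never notified, so the result is empty. A Critical followed by a Recovery reaches the SMS group both times.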
How do you guys see this particular problem?
Should Nagios be able to act in a more differentiated way on these kinds of
problems, or is it my burden to find a hacky-hack solution via nested
contacts/escalations for this? ;)
I'd welcome any insights into this matter.
regards
sash
--------------------------------------------------
Sascha Runschke
Netzwerk Administration
IT-Services
ABIT AG
Robert-Bosch-Str. 1
40668 Meerbusch
Tel.:+49 (0) 2150.9153.226
Mobil:+49 (0) 173.5419665
mailto:SRunschke at abit.de
http://www.abit.net
http://www.abit-epos.net
---------------------------------
Sicherheitshinweis zur E-Mail Kommunikation /
Security note regarding email communication:
http://www.abit.net/sicherheitshinweis.html
-------------------------------------------------------