Phantom service checks
Rasmus Plewe
rplewe at ess.nec.de
Thu Dec 5 14:14:12 CET 2002
Hello,
the only mention of this issue I found was in the mailing list
archive from Monday of last week, and it got no response.
Once upon a time I had a service check, which was associated with a
couple of hosts and hostgroups. Now I don't have this service any
more, even the command definition in checkcommands.cfg is deleted.
When doing a recursive grep over the Nagios directory, the only files
where this service name appears are the log files. But every now and
again I get notifications telling me that this service is critical or
up (it being so unreliable was one of the reasons to eliminate it in
the first place). How can I get rid of this?
Another thing:
During a major outage last night, I had the opportunity to
test the "scheduled downtime" functionality. What I think
happened is the following:
- The outage started. Lots of mails were generated, for every host and
service that was configured.
- I scheduled downtime for the time being. Still notifications were
sent out (yes, I restarted Nagios).
- I removed certain email addresses (like "half of the company" - oops)
from getting notifications by setting the *_notification_period
options in contacts.cfg to "none". Restarted Nagios. Still
notifications were sent.
- I changed the email addresses in contacts.cfg so that they no longer
pointed to these email aliases. Restarted Nagios. Still
notifications were sent.
All in all I got the impression that Nagios does not care too much
about changed configurations when it is restarted. But then I can't
swear that I didn't screw it up somehow, since I was pretty much tied
up in the outage and didn't have much time to play with Nagios at the
same time.
Is there anyone who can make sense of this, and preferably has a
solution for getting rid of that phantom check?
Oh, and another thought: I guess there's no possibility to tell Nagios
to "condense" notifications? I mean, in a situation like yesterday it
would be handy to have one notification for all incidents instead of
~150 mails. Something like "upon a failure wait x minutes before
sending a notification; if there's another failure, include it in the
notification and wait another x minutes. But don't wait longer than
y (> x) minutes, counted from the first failure" would be really
cool...
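To make the idea concrete, here is a minimal Python sketch of the
batching rule described above. It is not something Nagios provides, as
far as I know; the class name NotificationBatcher and its methods are
made up for illustration, and timestamps are passed in explicitly as
plain seconds so the logic is easy to follow.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NotificationBatcher:
    """Hypothetical helper: collects failures and releases them as one batch.

    quiet_secs is the "x" above: each new failure restarts this wait.
    max_secs is the "y" above: upper bound counted from the first failure.
    """
    quiet_secs: float
    max_secs: float
    _pending: List[str] = field(init=False, default_factory=list)
    _first_at: Optional[float] = field(init=False, default=None)
    _last_at: Optional[float] = field(init=False, default=None)

    def add(self, message: str, now: float) -> None:
        """Record a failure seen at time `now` (seconds)."""
        if not self._pending:
            self._first_at = now
        self._last_at = now
        self._pending.append(message)

    def deadline(self) -> Optional[float]:
        """Time at which the pending batch should be sent, or None if empty."""
        if not self._pending:
            return None
        # Whichever comes first: x after the latest failure,
        # or y after the first failure.
        return min(self._last_at + self.quiet_secs,
                   self._first_at + self.max_secs)

    def flush_if_due(self, now: float) -> Optional[List[str]]:
        """Return the batch and reset if the deadline has passed, else None."""
        due = self.deadline()
        if due is None or now < due:
            return None
        batch, self._pending = self._pending, []
        self._first_at = self._last_at = None
        return batch


# Example with x = 5 minutes and y = 15 minutes.
batcher = NotificationBatcher(quiet_secs=300, max_secs=900)
batcher.add("host-a: PING CRITICAL", now=0)
batcher.add("host-b: PING CRITICAL", now=200)
print(batcher.flush_if_due(now=400))  # not due yet (deadline is 500)
print(batcher.flush_if_due(now=500))  # both failures in one batch
```

Something would have to call flush_if_due() regularly and mail the
returned list as a single message; that part is left out here.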
Regards,
Rasmus