Phantom service checks

Marc Powell mpowell at ena.com
Thu Dec 5 14:39:30 CET 2002


Sounds to me like you have multiple nagios processes running on that machine. Use the init script to stop nagios, then use ps to verify that no copy of nagios is still running. If none is, try removing the status.sav file after stopping nagios, then restart.
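A minimal sketch of the "check with ps" step, in Python. It assumes a POSIX `ps` that accepts `-e -o pid= -o comm=` and that the daemon's command name is exactly `nagios`. The helper names `nagios_pids` and `running_nagios` are hypothetical. Stopping via the init script and removing status.sav are left as manual steps.

```python
import subprocess
from typing import List


def nagios_pids(ps_output: str, name: str = "nagios") -> List[int]:
    """Return the PIDs whose command name is exactly `name`.

    Expects text in the shape printed by `ps -e -o pid= -o comm=`:
    one process per line, PID first, then the command name.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[1].strip() == name:
            pids.append(int(parts[0]))
    return pids


def running_nagios() -> List[int]:
    """Run ps and return the PIDs of any nagios processes still alive."""
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return nagios_pids(out)
```

After stopping nagios, an empty list from `running_nagios()` means no stray copy is left. A non-empty list points to the duplicate-process situation described above.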

As far as aggregating notifications, I believe there is something in the contrib directory or documentation to help you with that.


--
Marc

Sent from a very tiny wireless device with a very tiny unlit keyboard.


-----Original Message-----
From: Rasmus Plewe <rplewe at ess.nec.de>
To: Nagios users list <nagios-users at lists.sourceforge.net>
Sent: Thu Dec 05 07:14:12 2002
Subject: [Nagios-users] Phantom service checks

Hello,

the only thing I found about this issue was a post in the mailing list
archive from Monday of last week, but it got no response.

Once upon a time I had a service check, which was associated with a
couple of hosts and hostgroups. Now I don't have this service any
more, even the command definition in checkcommands.cfg is deleted. 
When doing a recursive grep over the Nagios directory, the only files
where this service name appears are the log files. But every now and
again I get notifications telling me that this service is critical or
up (it being so unreliable was one of the reasons to eliminate it in
the first place). How can I get rid of this?

Another thing:
During a major outage last night, I had the opportunity to
test the "scheduled downtime" functionality. What I think
happened is the following:
- The outage started. Lots of mails were generated, about every host and
  service that was configured.
- I scheduled downtime for the time being. Notifications were still
  sent out (yes, I restarted Nagios).
- I removed certain email addresses (like "half of the company" - oops)
  from getting notifications by setting the *_notification_periods in
  contacts.cfg to "none". Restarted Nagios. Notifications were still
  sent.
- I changed the email addresses in contacts so that they no longer
  pointed to these email aliases. Restarted Nagios. Notifications were
  still sent.

All in all I got the impression that Nagios does not care much
about changed configurations when it is restarted. But then I can't
swear that I didn't screw it up somehow, since I was pretty much tied
up with the outage and didn't have much time to play with Nagios at the
same time.

Is there anyone who can make sense of this, and ideally has a
solution for getting rid of that phantom check?

Oh, and another thought: I guess there's no way to tell Nagios
to "condense" notifications? I mean, in a situation like yesterday's it
would be handy to get one notification for all incidents instead of
~150 mails. Something like "upon a failure, wait x minutes before
sending a notification; if there's another failure, include it in the
notification and wait another x minutes, but don't wait longer than
y (> x) minutes counted from the first failure" would be really
cool...
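The batching rule Rasmus describes can be sketched in Python. This is a hypothetical illustration of the requested behaviour, not a Nagios feature or anything from the contrib directory. It assumes failures arrive as (minute, description) pairs, and the names `Batch` and `condense` are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class Batch:
    first_failure: float  # minute of the first failure in this batch
    send_at: float        # minute at which the combined notification goes out
    incidents: List[str] = field(default_factory=list)


def condense(failures: Iterable[Tuple[float, str]], x: float, y: float) -> List[Batch]:
    """Group (minute, description) failures into combined notifications.

    A failure that arrives before the pending notification is sent joins
    it and pushes the send time out to x minutes after itself, but never
    past y minutes after the batch's first failure.
    """
    if y <= x:
        raise ValueError("y must be greater than x")
    batches: List[Batch] = []
    current = None
    for t, desc in sorted(failures):
        if current is not None and t <= current.send_at:
            # Still waiting: fold this failure into the pending notification.
            current.incidents.append(desc)
            current.send_at = min(t + x, current.first_failure + y)
        else:
            # Nothing pending (or it was already sent): start a new batch.
            current = Batch(first_failure=t, send_at=t + x, incidents=[desc])
            batches.append(current)
    return batches
```

Take x = 5 and y = 12 with failures at minutes 0, 3, 8 and 30. The result is one mail at minute 12 covering the first three incidents, because the cap of y stops the wait from being extended to minute 13. A second mail goes out at minute 35 for the last incident.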


Regards,
         Rasmus


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users


