More on notifications and reboot monitoring
Ethan Galstad
nagios at nagios.org
Mon Jan 10 07:36:54 CET 2005
On 7 Jan 2005 at 14:37, Andreas Ericsson wrote:
> Carson Gaspar wrote:
> > [ Resending from correct From address ]
> >
> > --On Thursday, January 06, 2005 2:51 PM +0100 Andreas Ericsson
> > <ae at op5.se> wrote:
> >
> >> Ehrm. The idea of scheduled downtime is to do this sort of thing.
> >> If you want to add a script submitting a 5 minute (or something)
> >> downtime whenever you run reboot, then by all means feel free. If
> >> you make it clean I'm sure lots of other users would be interested.
> >> I don't think it's a very good idea to keep that logic in the
> >> Nagios daemon though, as it can never possibly guess if a host has
> >> been shut down or crashed, so I don't quite see the point of this
> >> email. Care to clarify?
> >
> >
> > I'll try again (3rd time lucky? ;-) ).
> >
> > We need:
> >
> > - Alarms when machines reboot unexpectedly
> > - Alarms when machines fail to come back after a reboot
> > - No alarms during normal scheduled reboots
> >
> > Scheduled downtime is great, except for one thing - if any alarms
> > are received during scheduled downtime, no notifications go out.
> > Ever.
>
> This is a bug or a missing feature. It will be fixed.
>
> > Even
> > after downtime has ended. This is a result of the design decision to
> > only check whether notifications are required when a new check
> > result is received.
>
> Nagios handles the host when it comes out of scheduled downtime, so
> there's no real reason it shouldn't check what the status was prior to
> downtime and compare it with the current status when the host exits
> downtime. It's a minor change, and shouldn't be too hard to add.
This isn't really a bug that you want to fix, as fixing it would cause a
lot of not-so-great side effects. When you schedule downtime for a host,
anything that happens during that time is fair game and is ignored
for purposes of notification (that's why it's in downtime). When
downtime ends, Nagios will not notify about a problem that happened
during that downtime - that's what downtime was for. If the problem
continues after downtime (i.e. an active check returns a problem),
then a notification can occur.
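The rule above can be illustrated with a deliberately simplified toy model. This is a hypothetical sketch, not Nagios source code: it ignores soft/hard states, re-notification intervals and notification options, and only captures the idea that notifications are evaluated when a check result arrives and are suppressed while the host is in downtime. The host name `web01` and the event format are made up for illustration.

```python
"""Toy model of downtime vs. notifications (hypothetical, not Nagios code).

Only one rule is modelled: a notification is considered when a check
result arrives, and it is suppressed while the host is in downtime.
Soft/hard states and re-notification intervals are ignored.
"""
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    in_downtime: bool = False


def should_notify(host: Host, check_is_problem: bool) -> bool:
    # OK results never notify in this model.
    if not check_is_problem:
        return False
    # Problems seen during downtime are ignored for notification purposes.
    return not host.in_downtime


def run(events):
    """Replay (kind, value) events; return indexes of notifying checks.

    kind == "downtime": value is True (enter) or False (leave).
    kind == "check":    value is True if the check reports a problem.
    """
    host = Host("web01")  # hypothetical host name
    notified = []
    for i, (kind, value) in enumerate(events):
        if kind == "downtime":
            # Leaving downtime does not re-evaluate earlier problems.
            host.in_downtime = value
        elif kind == "check":
            if should_notify(host, value):
                notified.append(i)
    return notified
```

In this model, a problem that occurs during downtime and clears before downtime ends produces no notification at all, while a problem that is still reported by a check after downtime ends does.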
>
> > As the only "pull" monitor in my environment is Ping, it's the only
> > thing I can safely schedule downtime against (ignoring freshness
> > checks for now). This is only really an issue when trying to get a
> > "failed to reboot" alarm. I finally gave up, and just have the Ping
> > service alarm if a reboot fails (as opposed to a more specific
> > alarm).
> >
> > If you re-read my previous message, the only logic on the central
> > Nagios server is some basic dependency logic to prevent false alarms
> > - all the work is done on the client in an init script (which
> > submits passive check results and schedules downtime via an in-house
> > queueing agent to Nagios' named pipe). It does work, I was just
> > asking for opinions about it (as it seems a bit complex for my
> > tastes).
> >
>
> It was unclear to me that you were simply asking for opinions, which is
> why I responded the way I did. As for my opinion: whatever works.
>
> > And yes, I fully understand freshness checks - they're wonderful for
> > continuously monitored services, but don't really work for reboots
> > (unless you have your agent constantly send "Reboot OK" status msgs
> > while the machine is up), as they are hopefully rare events ;-)
> >
>
> Why not simply set a higher max_check_attempts or retry_interval for
> the ping services? That way you'll get a soft DOWN state when the
> machine is actually down, but no alerts will go out.
>
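The effect of a higher max_check_attempts can be sketched with a small hypothetical model: a host only reaches a HARD DOWN state (the state that can notify) after max_check_attempts consecutive failed checks, so a reboot that recovers sooner stays SOFT. This simplification ignores check scheduling, flapping and notification options.

```python
"""Hypothetical model of soft vs. hard DOWN states (not Nagios code).

A failed check increments a counter; the state only becomes HARD DOWN
once max_check_attempts consecutive failures have been seen. Any OK
result resets the counter.
"""


def classify(results, max_check_attempts):
    """Return the state label after each result (True means ping OK)."""
    states = []
    failures = 0
    for ok in results:
        if ok:
            failures = 0
            states.append("UP")
            continue
        failures += 1
        if failures >= max_check_attempts:
            states.append("HARD DOWN")
        else:
            states.append("SOFT DOWN")
    return states
```

With max_check_attempts set to 3, two failed pings during a quick reboot give `["UP", "SOFT DOWN", "SOFT DOWN", "UP"]` and no alert, whereas a host that stays down reaches HARD DOWN on the third failure. The retries happen roughly every retry interval, so the grace period is approximately (max_check_attempts - 1) retry intervals after the first failed check.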
As Andreas suggested, I would use active checks for host
availability. Passive-only checks can be troublesome to implement
reliably.
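Combining active host checks with Andreas's earlier idea of a script that submits a short downtime whenever a reboot is run could look roughly like the sketch below. Several things here are assumptions, not taken from the thread: the command file path `/usr/local/nagios/var/rw/nagios.cmd`, the host name, and the `SCHEDULE_HOST_DOWNTIME` field layout (host;start;end;fixed;trigger_id;duration;author;comment) documented for Nagios 2.x and later. Older releases may use a different layout, so check the external command documentation for your version.

```python
"""Sketch of a pre-reboot hook that schedules fixed host downtime.

Assumptions: the command file path, the host name, and the Nagios
2.x-style SCHEDULE_HOST_DOWNTIME field layout (with trigger_id).
The active ping check keeps running, so a host that fails to come
back is still caught once the downtime window has ended.
"""
import time

COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"  # hypothetical path


def downtime_command(host, minutes, author, comment, now=None):
    """Build one SCHEDULE_HOST_DOWNTIME external command line."""
    start = int(now if now is not None else time.time())
    end = start + minutes * 60
    # Fields: host;start;end;fixed;trigger_id;duration;author;comment
    # fixed=1 means the window is exactly start..end; trigger_id=0 means
    # not triggered by another downtime. Duration matters only for
    # flexible (fixed=0) downtime, but the field must still be present.
    fields = [host, str(start), str(end), "1", "0",
              str(minutes * 60), author, comment]
    return "[%d] SCHEDULE_HOST_DOWNTIME;%s\n" % (start, ";".join(fields))


def submit(line, path=COMMAND_FILE):
    # The command file is a named pipe read by Nagios; write the whole
    # command line in one call.
    with open(path, "a") as pipe:
        pipe.write(line)
```

A reboot wrapper would call something like `submit(downtime_command("web01", 5, "init-script", "Scheduled reboot"))` just before rebooting. The design choice is to let the downtime cover only the expected outage: if the host is still down when the window ends, its next active check can generate a notification, which gives the "failed to come back" alarm without any passive-only logic.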
Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null