More on notifications and reboot monitoring
Andreas Ericsson
ae at op5.se
Fri Jan 7 14:37:33 CET 2005
Carson Gaspar wrote:
> [ Resending from correct From address ]
>
> --On Thursday, January 06, 2005 2:51 PM +0100 Andreas Ericsson
> <ae at op5.se> wrote:
>
>> Ehrm. The idea of scheduled downtime is to do this sort of thing. If you
>> want to add a script submitting a 5 minute (or something) downtime
>> whenever you run reboot, then by all means feel free. If you make it
>> clean I'm sure lots of other users would be interested. I don't think
>> it's a very good idea to keep that logic in the Nagios daemon though, as
>> it can never possibly guess if a host has been shut down or crashed, so I
>> don't quite see the point of this email. Care to clarify?
>
>
> I'll try again (3rd time lucky? ;-) ).
>
> We need:
>
> - Alarms when machines reboot unexpectedly
> - Alarms when machines fail to come back after a reboot
> - No alarms during normal scheduled reboots
>
> Scheduled downtime is great, except for one thing - if any alarms are
> received during scheduled downtime, no notifications go out. Ever.
This is a bug or a missing feature. It will be fixed.
> Even
> after downtime has ended. This is a result of the design decision to
> only see if notifications are required when receiving a new check
> result.
Nagios already handles the host when it comes out of scheduled downtime,
so there's no real reason it shouldn't compare the status from before the
downtime with the current status at that point. It's a minor change and
shouldn't be too hard to add.
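
To make the idea concrete, here is a minimal sketch of that comparison. It
is not Nagios code: the Host class and the enter_downtime/exit_downtime
functions are hypothetical names made up for illustration, and a real fix
would live in the daemon's downtime handling. The numeric values follow the
usual Nagios host states (0 = UP, 1 = DOWN, 2 = UNREACHABLE).

```python
from dataclasses import dataclass
from typing import Optional

# Host states as Nagios numbers them.
UP, DOWN, UNREACHABLE = 0, 1, 2


@dataclass
class Host:
    # Hypothetical stand-in for the daemon's host record.
    name: str
    state: int = UP
    state_before_downtime: Optional[int] = None


def enter_downtime(host: Host) -> None:
    """Remember the state the host had when downtime started."""
    host.state_before_downtime = host.state


def exit_downtime(host: Host) -> Optional[str]:
    """Compare the saved state with the current one when downtime ends.

    Returns "PROBLEM" or "RECOVERY" if a notification looks warranted,
    or None if nothing changed (or no state was saved).
    """
    previous = host.state_before_downtime
    host.state_before_downtime = None
    if previous is None or previous == host.state:
        return None
    return "RECOVERY" if host.state == UP else "PROBLEM"
```

With this, a host that was UP before a scheduled reboot and is still DOWN
when the downtime window closes would produce a "PROBLEM" result at that
moment, instead of waiting for the next check result.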
> As the only "pull" monitor in my environment is Ping, it's the
> only thing I can safely schedule downtime against (ignoring freshness
> checks for now). This is only really an issue when trying to get a
> "failed to reboot" alarm. I finally gave up, and just have the Ping
> service alarm if a reboot fails (as opposed to a more specific alarm).
>
> If you re-read my previous message, the only logic on the central nagios
> server is some basic dependency logic to prevent false alarms - all the
> work is done on the client in an init script (which submits passive
> check results and schedules downtime via an in-house queueing agent to
> Nagios' named pipe). It does work, I was just asking for opinions about
> it (as it seems a bit complex for my tastes).
>
It was unclear to me that you were simply asking for opinions, which is
why I responded the way I did. As for my opinion: whatever works.
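
For anyone who wants to try the client-side approach described above, the
two things the init script has to produce are external command lines for
Nagios' command pipe. The sketch below only formats them. It assumes the
PROCESS_SERVICE_CHECK_RESULT and SCHEDULE_SVC_DOWNTIME field layouts as I
know them from Nagios 2.x and later (older versions may not have the
trigger_id field, so check the documentation for your version). The host
and service names and the pipe path are placeholders, not anything from
Carson's setup.

```python
import time
from typing import Optional


def _timestamp(now: Optional[int]) -> int:
    """Use the caller's timestamp if given, else the current time."""
    return int(time.time()) if now is None else int(now)


def passive_service_result(host: str, service: str, return_code: int,
                           output: str, now: Optional[int] = None) -> str:
    """Format a passive service check result command line."""
    ts = _timestamp(now)
    return (f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}\n")


def schedule_svc_downtime(host: str, service: str, duration_s: int,
                          author: str, comment: str,
                          now: Optional[int] = None) -> str:
    """Format a fixed downtime window starting now for one service.

    Assumed layout: fixed=1 and trigger_id=0 (no triggering downtime).
    """
    ts = _timestamp(now)
    start, end = ts, ts + duration_s
    return (f"[{ts}] SCHEDULE_SVC_DOWNTIME;{host};{service};"
            f"{start};{end};1;0;{duration_s};{author};{comment}\n")


def submit(command_line: str, pipe_path: str) -> None:
    """Write one command line to the command pipe (path is site-specific)."""
    with open(pipe_path, "w") as pipe:
        pipe.write(command_line)
```

A shutdown script could call schedule_svc_downtime("web1", "PING", 300,
"init", "scheduled reboot") and pass the result to submit(), then send a
passive "Reboot OK" result once the machine is back up.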
> And yes, I fully understand freshness checks - they're wonderful for
> continuously monitored services, but don't really work for reboots
> (unless you have your agent constantly send "Reboot OK" status msgs
> while the machine is up), as they are hopefully rare events ;-)
>
Why not simply set a higher max_check_attempts or retry_interval for the
ping services? That way the service goes into a soft problem state while
the machine is rebooting, but no alerts go out unless it stays down long
enough to reach a hard state.
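
As a rough guide for picking the values, the window during which a failing
service stays in a soft state is about (max_check_attempts - 1) retry
intervals, counted from the first failed check. The helper below is just
that arithmetic. It ignores check latency and execution time, and it
assumes the default interval_length of 60 seconds per interval unit. Note
that older Nagios versions spell the directive retry_check_interval.

```python
def seconds_until_hard_state(max_check_attempts: int,
                             retry_interval: float,
                             interval_length: int = 60) -> float:
    """Approximate time from the first failed check to the hard state.

    The first failure counts as attempt 1, so only the remaining
    (max_check_attempts - 1) rechecks are spaced by retry_interval.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    return (max_check_attempts - 1) * retry_interval * interval_length


# Example: 10 attempts, rechecked every 1 interval unit (60 s each).
window = seconds_until_hard_state(10, 1)
```

In that example the window is 540 seconds, so a reboot that completes
within nine minutes of the first failed ping would not trigger an alert.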
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Lead Developer