More on notifications and reboot monitoring
Carson Gaspar
carson+nagiosusers at taltos.org
Thu Jan 6 23:09:40 CET 2005
[ Resending from correct From address ]
--On Thursday, January 06, 2005 2:51 PM +0100 Andreas Ericsson <ae at op5.se>
wrote:
> Ehrm. The idea of scheduled downtime is to do this sort of thing. If you
> want to add a script submitting a 5 minute (or something) downtime
> whenever you run reboot, then by all means feel free. If you make it
> clean I'm sure lots of other users would be interested. I don't think
> it's a very good idea to keep that logic in the Nagios daemon though, as
> it can never possibly guess if a host has been shut down or crashed, so I
> don't quite see the point of this email. Care to clarify?
I'll try again (3rd time lucky? ;-) ).
We need:
- Alarms when machines reboot unexpectedly
- Alarms when machines fail to come back after a reboot
- No alarms during normal scheduled reboots
Scheduled downtime is great, except for one thing: if any alarms are
received during scheduled downtime, no notifications go out. Ever. Not even
after the downtime has ended. This is a result of the design decision to
check whether notifications are required only when a new check result is
received. As the only "pull" monitor in my environment is Ping, it's the only
thing I can safely schedule downtime against (ignoring freshness checks for
now). This is only really an issue when trying to get a "failed to reboot"
alarm. I finally gave up, and now just have the Ping service alarm if a
reboot fails (as opposed to a more specific alarm).
If you re-read my previous message, the only logic on the central Nagios
server is some basic dependency logic to prevent false alarms. All the
work is done on the client in an init script, which submits passive check
results and schedules downtime via an in-house queueing agent to Nagios'
named pipe. It does work; I was just asking for opinions on it, as it
seems a bit complex for my tastes.
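For readers wondering what the init script actually has to write into the
named pipe, here is a minimal sketch that builds the two Nagios external
commands involved (SCHEDULE_HOST_DOWNTIME and PROCESS_SERVICE_CHECK_RESULT)
and writes them to the command file. It is not the in-house agent described
above: the queueing/transport step from client to server is left out, and the
command file path, the "Reboot" service name and the helper function names are
all hypothetical.

```python
import time

# Hypothetical path: where the Nagios command file (named pipe) lives
# depends on how Nagios was built and configured.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def schedule_host_downtime(host, duration_s, author, comment, now=None):
    """Build a fixed SCHEDULE_HOST_DOWNTIME line starting now."""
    now = int(time.time()) if now is None else int(now)
    start, end = now, now + duration_s
    # Fields: host;start;end;fixed;trigger_id;duration;author;comment
    # fixed=1 means the window is exactly start..end; the duration field
    # only matters for flexible downtime but must still be present.
    return (f"[{now}] SCHEDULE_HOST_DOWNTIME;{host};{start};{end};"
            f"1;0;{duration_s};{author};{comment}\n")


def passive_service_result(host, service, return_code, output, now=None):
    """Build a PROCESS_SERVICE_CHECK_RESULT line (0=OK, 1=WARN, 2=CRIT)."""
    now = int(time.time()) if now is None else int(now)
    return (f"[{now}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};"
            f"{return_code};{output}\n")


def submit(line, command_file=COMMAND_FILE):
    """Write one command line to the Nagios command file.

    Opening a FIFO for writing blocks until Nagios has it open for reading.
    """
    with open(command_file, "w") as fh:
        fh.write(line)
```

In this sketch a shutdown hook would submit a short downtime plus a passive
"reboot in progress" result, and a startup hook would submit a passive OK,
e.g. `submit(passive_service_result("web1", "Reboot", 0, "Reboot OK"))`.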
And yes, I fully understand freshness checks: they're wonderful for
continuously monitored services, but they don't really work for reboots
(unless you have your agent constantly send "Reboot OK" status messages
while the machine is up), as reboots are hopefully rare events ;-)
--
Carson
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users