shutting down machines with Nagios

Eli Stair estair at ilm.com
Thu Jun 8 21:19:27 CEST 2006


Yep, I'd recommend having the event handler that fires on an overheat
condition correlate _several_ sources before shutting down large numbers of
systems.  If you look hard, you'll surely find a number of good sources for
temperature correlation (NetBotz, switch/router SNMP, management processors,
chillers, cooling towers, lm_sensors, etc.).
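
As a rough illustration of the correlation idea above, here is a minimal
Python sketch.  Everything in it is an assumption made for the example: the
function name, the source names, the 35 C threshold, and the "at least three
sources, 60% of them over the limit" rule are hypothetical, not anything
Nagios provides.  The event handler would gather the readings however it can
and pass them in.

```python
from statistics import median

def should_shut_down(readings, threshold_c=35.0, min_sources=3, quorum=0.6):
    """Decide whether independent temperature sources agree on an overheat.

    readings: dict mapping a source name to a temperature in Celsius,
              or None when that source could not be read.
    All thresholds are illustrative defaults, not recommendations.
    """
    # Ignore sources that failed to report; a missing reading is not evidence.
    valid = {name: t for name, t in readings.items() if t is not None}

    # With too few working sources, refuse to act automatically.
    if len(valid) < min_sources:
        return False

    # Require both a quorum of sources over the limit and a hot median,
    # so a single wild sensor cannot trigger a mass shutdown.
    over = [name for name, t in valid.items() if t >= threshold_c]
    return len(over) / len(valid) >= quorum and median(valid.values()) >= threshold_c
```

With this rule, one sensor reporting 90 C while two others report 22 C and
23 C does not trigger anything, whereas three sources all reading around
40 C do.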

Having a per-host shutdown based on local lm_sensors/management info is
usually fine (just beware of bugs in your temp reference...); e.g., if you're
checking CPU temps, check both temp and fan status.  Large-scale cluster
power-offs are tender, though: you may even want to avoid having them
handled automatically, and just have an easily accessible way of shutting
down a room/datacenter manually from remote, even if you do correlate
everything.
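
For the per-host case, a small Python sketch of "check temp and fan status"
might look like the following.  The function name, the limits, and the
plausible-range check are all hypothetical values chosen for illustration,
and the policy (shut down only when both signals agree) is just one possible
reading of the advice above.

```python
def host_action(cpu_temp_c, fan_rpm, temp_limit_c=85.0, min_fan_rpm=500,
                plausible_c=(5.0, 120.0)):
    """Return 'shutdown', 'alert', 'suspect-sensor', or 'ok' for one host.

    cpu_temp_c: CPU temperature in Celsius, or None if unreadable.
    fan_rpm:    fan speed in RPM, or None if unreadable.
    All limits are illustrative assumptions, not vendor values.
    """
    low, high = plausible_c
    # A missing or physically implausible reading points at a sensor or
    # driver bug, so do not act on it automatically.
    if cpu_temp_c is None or not (low <= cpu_temp_c <= high):
        return "suspect-sensor"

    hot = cpu_temp_c >= temp_limit_c
    fan_failed = fan_rpm is not None and fan_rpm < min_fan_rpm

    # Two independent local signals agreeing is the only case that powers off.
    if hot and fan_failed:
        return "shutdown"
    # A single signal is worth a notification but not a power-off.
    if hot or fan_failed:
        return "alert"
    return "ok"
```

A reading of 255 C with healthy fans comes back as "suspect-sensor" rather
than "shutdown", which is the kind of bug in the temp reference to guard
against.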

Turning something off is the easy part; it's determining that you _really_
want to that's pointy.

IMO

/eli


On 6/8/06 11:34 AM, "Johnston Michael J Contr AFRL/DES"
<michael.johnston at kirtland.af.mil> wrote:

> 
> Does anyone use anything that will go out and shutdown computers in
> instances where a room is over heating or too many errors start occurring?
> We've recently had a problem with heat in a server room.  I got messages
> that the room was overheating, but by the time I got there the room was
> really hot and all the machines were running.  I'm looking for something
> that takes steps to save machines if a threshold is ever met or exceeded.
> 
> Thanks for the help!
> 
> 
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting
> any issue. 
> ::: Messages without supporting info will risk being sent to /dev/null
> 


