Disk failures
Dan Stromberg
strombrg at dcs.nac.uci.edu
Sat Feb 5 01:34:02 CET 2005
On Fri, 2005-02-04 at 16:11 -0800, Jason Martin wrote:
> On Fri, Feb 04, 2005 at 04:07:08PM -0800, Edward Smith wrote:
> > Is it possible to set up nagios to detect disk failures? How
> > about getting the load on a cpu? Would something like MRTG be
> > better for this? Thanks.
> If there is a logfile, snmp mib, or command that can be accessed
> to determine that a disk has failed then yes. If it is via some
> command then you might have to write a special plugin for it.
>
> CPU load is monitorable by check_load, however if you want
> graphs over time then MRTG would be a good adjunct.
>
> -Jason Martin
Another option for load checking is to enable rpc.rstatd, and use the
following plugin:
#!/usr/bin/python
import sys
import os
import re
import string

host=sys.argv[1]
pipe=os.popen('/usr/bin/maxtime 10 /dcs/etc/rup '+host+' 2>&1','r')
line = pipe.readline()
#meter.eng up 154 days, 4:40, load average: 2.73 4.21 4.25
r = re.compile('^.*load average: ([0-9\.]*) ([0-9\.]*) ([0-9\.]*).*$')
m = r.match(line)
if not m:
    print 'service unavailable'
    sys.exit(2)
one_min = string.atof(m.group(1))
five_min = string.atof(m.group(2))
if one_min > 16.0 or five_min > 12.0:
    print 'load critical:',m.group(1), m.group(2), m.group(3)
    sys.exit(2)
if one_min > 12.0 or five_min > 8.0:
    print 'load warning:',m.group(1), m.group(2), m.group(3)
    sys.exit(1)
else:
    print 'load:',m.group(1), m.group(2), m.group(3)
    sys.exit(0)

maxtime is available from:
http://dcs.nac.uci.edu/~strombrg/maxtime.html
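
For anyone on Python 3, the same check might be sketched roughly as
below. This is only an illustration, not a tested replacement for the
plugin above. It keeps the same thresholds and the site-specific
/dcs/etc/rup path. The function names (parse_load, classify,
check_host) are made up for this example, and it assumes that the
timeout argument of subprocess.run can stand in for maxtime. The
pattern also accepts comma-separated load values, in case the local
rup prints them that way.

```python
import re
import subprocess

# Same thresholds as the plugin above: (one-minute, five-minute) limits.
CRITICAL = (16.0, 12.0)
WARNING = (12.0, 8.0)

# Load values may be separated by spaces and/or commas.
LOAD_RE = re.compile(
    r"load average:\s*([0-9.]+)[, ]+([0-9.]+)[, ]+([0-9.]+)"
)


def parse_load(line):
    """Return the three load averages as floats, or None if no match."""
    m = LOAD_RE.search(line)
    if m is None:
        return None
    return tuple(float(g) for g in m.groups())


def classify(loads):
    """Map load averages to a Nagios-style (exit_code, message) pair."""
    one_min, five_min, _ = loads
    text = " ".join("%.2f" % v for v in loads)
    if one_min > CRITICAL[0] or five_min > CRITICAL[1]:
        return 2, "load critical: " + text
    if one_min > WARNING[0] or five_min > WARNING[1]:
        return 1, "load warning: " + text
    return 0, "load: " + text


def check_host(host, rup_path="/dcs/etc/rup", timeout=10):
    """Run rup against host and return (exit_code, message)."""
    try:
        # Argument list, no shell: the host name is passed as-is to rup.
        result = subprocess.run(
            [rup_path, host],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return 2, "service unavailable"
    loads = parse_load(result.stdout)
    if loads is None:
        return 2, "service unavailable"
    return classify(loads)
```

To use it as a plugin, a small wrapper would call
check_host(sys.argv[1]), print the message, and pass the code to
sys.exit(). Passing the command as a list also keeps the host name
away from the shell, which the os.popen version does not.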