SMART hard-disk monitoring
Derek Olsen
derek.olsen at qsent.com
Thu Aug 31 21:27:07 CEST 2006
Andy.
If there is a problem, the output will look like this. The
notification only includes the devices that are in a down state.
***** Nagios *****
Notification Type: PROBLEM
Service: DiskDrives
Host: the.name.of.host
Address: the.name.of.host
State: CRITICAL
Date/Time: Thu Aug 17 09:55:02 PDT 2006
Documentation: https://where.the.docs.be
Additional Info:
DOWN=(/dev/sdg)
I believe this plugin can only detect when a drive is down and won't do
much to predict that a failure is about to happen.
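To make the criteria concrete: a drive is counted as down if SMART
cannot be switched on for it, or if the health report lacks the
"SMART Health Status: OK" line. Below is a minimal sketch of that
decision, not the attached script itself. The names classify_drive and
summarize are made up for illustration, and the sample strings are
canned so it runs without smartctl or sudo.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the attached script): classify one drive
# from the two pieces of text the script collects.
#   $enable_out : result of the "smartctl -s on" step ("MISSING\n" on failure)
#   $health_out : output of "smartctl -H"
sub classify_drive {
    my ($enable_out, $health_out) = @_;
    return 'DOWN' if $enable_out eq "MISSING\n";
    return 'UP'   if $health_out =~ /SMART Health Status: OK/;
    return 'DOWN';
}

# Hypothetical helper: build the status line and the Nagios exit code
# from a hash of drive => 'UP' | 'DOWN'.
sub summarize {
    my (%state) = @_;
    my @down = sort { $a cmp $b } grep { $state{$_} eq 'DOWN' } keys %state;
    my @up   = sort { $a cmp $b } grep { $state{$_} eq 'UP' }   keys %state;
    return ("DOWN=(@down)", 2) if @down;   # any down drive is CRITICAL
    return ("UP=(@up)", 0);                # otherwise OK
}

# Canned sample text standing in for real smartctl output.
my %state = (
    '/dev/sda' => classify_drive('', "SMART Health Status: OK\n"),
    '/dev/sdg' => classify_drive("MISSING\n", ''),
);
my ($line, $code) = summarize(%state);
print "$line (exit code $code)\n";
```

With the sample data this reports only /dev/sdg as down, which is why
the notification lists just the failed device. Note that the pattern is
the SCSI wording; smartctl words the health line differently for ATA
devices, so the match would probably need adjusting there.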
Hope this helps.
Deet.
> Hi Deet,
>
> Thanks very much for this script, had to do a minor touch of hacking,
> but it also proves your script will work on SATA drives as well (at
> least those SATA drives that Linux emulates as SCSI.)
>
> All I've touched is:
> my $scsi_disks = `/usr/bin/sudo /sbin/sfdisk -s |/bin/grep -i sd[a-z] |/bin/cut -f1 -d:`;
>
> /usr/bin/grep and /usr/bin/cut are in /bin/grep and /bin/cut on my
> system (Fedora 5.)
>
> $val = `/usr/bin/sudo /usr/sbin/smartctl -d ata -s on $drive &> /dev/null || /bin/echo MISSING`;
>
> In the above line I had to add the "-d ata" argument to smartctl to
> read the SATA drives as ATA drives, not SCSIs.
>
> The script outputs "UP=(/dev/sda /dev/sdb)".
>
> Can I just ask what the criteria are for the script to class a drive as
> failed/failing according to SMART?
>
> Many thanks again for sharing, it's extremely helpful!
>
> Regards
>
> Andy.
>
> PS. I couldn't reply to the list as I've got a problem with my DNS
> server, and Sourceforge's server is bouncing any mail I send :( If
> you could post what I've done to get SATA drives working, it may come
> in handy for somebody too.
>
> ---
>
> Derek Olsen wrote:
>>
>> Andy.
>> I've attached the check_smart we use. I think it's a barely modified
>> version of the one that comes with the nagios plugins. In the
>> script we use the output of /sbin/sfdisk -s to find out which scsi
>> disks are on the local box because we ran into problems using the
>> output of scsiinfo. So our sudoers file is configured to allow the
>> nagios user to run /sbin/sfdisk -s and /usr/sbin/smartctl.
>>
>> This works for us. Hope it helps.
>> Deet.
>>> Has anyone got a check plugin working for monitoring SMART hard disk
>>> status thresholds?
>>>
>>> The only one I found on nagiosexchange (check_smartmon) needs to be
>>> run as root to get permission to read the drive stats, and also
>>> doesn't work - it causes the below Python trace-back:
>>>
>>> Traceback (most recent call last):
>>>   File "./check_smartmon", line 254, in ?
>>>     (healthStatus, temperature) = parseOutput(healthStatusOutput, temperatureOutput)
>>>   File "./check_smartmon", line 163, in parseOutput
>>>     healthStatus = parts[-1]
>>> IndexError: list index out of range
>>>
>>>
>>> I've just run smartctl and it appears you do need to be root, so if
>>> I can find a working plugin I can just sudo the nagios user.
>>>
>>> Any ideas?
>>>
>>> Thanks
>>>
>>> Andy.
>>>
>>> _______________________________________________
>>> Nagios-users mailing list
>>> Nagios-users at lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>>> ::: Please include Nagios version, plugin version (-v) and OS when
>>> reporting any issue. ::: Messages without supporting info will risk
>>> being sent to /dev/null
>>>
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> #!/usr/bin/perl -w
>>
>> #
>> # This script checks the hard drives on a system for S.M.A.R.T. health
>> # indicators. Only supports SCSI right now.
>> #
>> use strict;
>>
>> my $debug = 0;
>> my @disk_up;
>> my @disk_down;
>> my @disks;
>>
>> # List the disks with sfdisk and keep only the /dev/sdX device names.
>> my $scsi_disks = `/usr/bin/sudo /sbin/sfdisk -s | /usr/bin/grep -i 'sd[a-z]' | /usr/bin/cut -f1 -d:`;
>>
>> push @disks, split(' ', $scsi_disks);
>>
>> unless ( scalar @disks ) {
>>     print "0 No disks to monitor\n";
>>     exit 0;
>> }
>>
>> print "Monitoring: @disks\n" if $debug;
>>
>> for my $drive ( @disks ) {
>>     next unless $drive =~ /\/dev\/sd/;
>>
>>     # Try to enable SMART; if smartctl fails, the shell echoes MISSING.
>>     # "> /dev/null 2>&1" is the portable spelling of bash's "&>".
>>     my $val = `/usr/bin/sudo /usr/sbin/smartctl -s on $drive > /dev/null 2>&1 || /bin/echo MISSING`;
>>     if ( $val eq "MISSING\n" ) {
>>         push @disk_down, $drive;
>>         next;
>>     }
>>
>>     # Ask for the overall health verdict and look for the OK line.
>>     $val = `/usr/bin/sudo /usr/sbin/smartctl -H $drive`;
>>     if ( $val =~ /SMART Health Status: OK/ ) {
>>         print "$drive is OK\n" if $debug;
>>         push @disk_up, $drive;
>>     } else {
>>         print "$drive is BAD\n" if $debug;
>>         push @disk_down, $drive;
>>     }
>> }
>>
>> # Any down drive makes the check CRITICAL (exit code 2).
>> if ( scalar @disk_down ) {
>>     print "DOWN=(@disk_down)\n";
>>     exit 2;
>> }
>> print "UP=(@disk_up) " if ( scalar @disk_up );
>> print "\n";
>>
>> exit 0;
>>
>>
>>
>