Information Required
Lewis Getschel
lgetschel at denver.westerngeco.slb.com
Fri Jan 28 20:51:47 CET 2005
Another user wrote: "You should be looking at SNMP tools from your
hardware vendor...Nagios ain't for this kinda monitoring activity"
Which is basically right. However, since I WANTED Nagios to monitor it
for ME, I wrote a 'plugin' for myself to check the DELL Windows Servers
external PERC disk arrays.
As this other user said, SNMP was needed, so it runs from my Nagios
server, and checks my Windows systems. (I'm working on a version to
check my Linux systems too, eventually. I'm having to learn much more
about snmpd than I wanted <smile>).
[If anyone wants to use it on other HW, you'll have to find the proper
OIDs in the MIBs for your hardware, it gets THAT specific!]
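A rough sketch of the value-extraction step, for anyone adapting this to other OIDs. The plugin below takes the last whitespace-separated field of each reply, which works for INTEGER replies but not for quoted strings such as "Disk 2". The helper name snmp_value is made up for illustration and is not part of the plugin; it assumes the reply has the net-snmp "OID = TYPE: value" shape seen in the example output further down.

```shell
#!/bin/bash
# Hypothetical helper (not part of the plugin): pull the value out of ONE
# line of snmpget/snmpwalk output of the form "OID = TYPE: value", e.g.
#   SNMPv2-SMI::enterprises.674.10893.1.1.130.4.1.4.1 = INTEGER: 3
#   SNMPv2-SMI::enterprises.674.10893.1.1.140.2.1.2.1 = STRING: "Disk 2"
snmp_value() {
    local line=$1
    local value=${line#* = }   # drop everything up to and including " = "
    value=${value#*: }         # drop the type tag, e.g. "INTEGER: "
    value=${value#\"}          # strip a leading double quote, if present
    value=${value%\"}          # strip a trailing double quote, if present
    printf '%s\n' "$value"
}

# Possible use against a live host (host name taken from the examples below):
#   snmp_value "$(/usr/bin/snmpget -v1 -c public m010:161 1.3.6.1.4.1.674.10893.1.1.140.2.1.2.1)"
```

Pure parameter expansion keeps the helper free of extra processes; whether that matters depends on how many disks get polled per check.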
Remember, I wrote this for myself, so I'm leaving all my internal
comments etc. intact here; some may prove useful if anyone finds this
script helpful.
Lewis
----- cut here -----
#!/bin/bash
#-x
# Script to check the Windows Dell-PERC for current status
#
# Written by: Lewis Getschel
# Date: 12/29/04
# Parameters: 1 - the ip address of the system to check
# Operation:
# Limitation: It seems that Nagios will NOT run a /bin/tcsh script at all!!
#             I had to change the script to /bin/sh (bash) to get it to even run a 3 line script
#             that was just echo $1 into the /tmp/file.
#             According to Nagios Plugin recommendations, I tried to use absolute
#             paths to all commands
#
# Version History:
# 12/29/2004 First try, Turned out VERY good. Keeping a temp file seemed the best way to go on this.
#            This allows seeing changes. I initially didn't show the number of Global/Dedicated
#            HotSpares, but after a few minutes of monitoring, I realized that since each of the 15 servers
#            "at-that-time-purchased" group had different standards for how they were configured
#            I needed to see the actual numbers of spares
#
# Notes: The "baseline" (the temp file) is never actually replaced anywhere in this code. If
#        a new baseline is desired, then simply delete the appropriate temp file. This routine
#        will create a NEW baseline (/tmp) file, and use that onward.
#
# Example:
# Using the oid for the perc adapter (from arymgr.mib)
# This retrieves the "Disks name as represented in Array Manager"
# /usr/bin/snmpget -v1 -c public m010:161 1.3.6.1.4.1.674.10893.1.1.140.2.1.2.1
# SNMPv2-SMI::enterprises.674.10893.1.1.140.2.1.2.1 = STRING: "Disk 2"
# dvws001(dets05)12/29 10:23 /usr/lib/nagios/plugins> /usr/bin/snmpget -v1 -c public m010:161 1.3.6.1.4.1.674.10893.1.1.140.2.1.2.2
# SNMPv2-SMI::enterprises.674.10893.1.1.140.2.1.2.2 = STRING: "Disk 0"
# dvws001(dets05)12/29 10:23 /usr/lib/nagios/plugins> /usr/bin/snmpget -v1 -c public m010:161 1.3.6.1.4.1.674.10893.1.1.140.2.1.2.3
# SNMPv2-SMI::enterprises.674.10893.1.1.140.2.1.2.3 = STRING: "Disk 1"
#
# Useful OID's (as _I_ see it <smile>)
# 1.3.6.1.4.1.674.10893.1.1.130.1.1.5.x "Status of this controllers subsystem (which includes any devices connected to it)"
# Problem that I see, shows status at THAT moment, shows 6:Degraded while rebuild occurs
# otherwise it shows 1:Ready (before rebuild, and after rebuild)
#
# 1.3.6.1.4.1.674.10893.1.1.110.1.0 "Global health information for the subsystem"
# 1.3.6.1.4.1.674.10893.1.1.110.2.0 "Previous Global health information for the subsystem"
# Problem is that I don't know how previous it is (seems to be until rebooted, because on M010
# it showed 2:Warning until I rebooted and ran Diags, then it showed 1:Normal)
#
# I've been thinking that if I keep an array of integers (i.e. "222222222222222222222222222223")
# that represent the current "status of the array disk as a spare"
# 1.3.6.1.4.1.674.10893.1.1.130.4.1.22.x (1-30). This way I can tell:
# 1) whether HotSpares are in correct place
# 2) when they change positions
# Problem is that I'd need to write this out somewhere to keep for compares (use OID
# 1.3.6.1.4.1.674.10893.1.1.130.3.1.7.x (1-3) "enclosure ID (i.e. serial Number)")
# Problem is that the internal enclosure is "Null"
#
# 1.3.6.1.4.1.674.10893.1.1.140.2.1.4.x (1-3) "Current state of the Disk"
# 1.3.6.1.4.1.674.10893.1.1.130.4.1.4.x (1-30) "Current state of the (individual) array disk"
#
# This next one is the 1st of several OID's that show the disk name
# 1.3.6.1.4.1.674.10893.1.1.130.4.1.2.x (1-30) "Name of the array disk represented in Array Manager"
#
# =================================== Script starts below ================================
#
hostnam=$1
# echo $1 >> /tmp/nagios_event_debug.txt
# echo --- `date` --- >> /tmp/nagios_event_debug.txt
if [ "$#" -eq "0" ]; then
echo "Unknown - No parameter specified"
exit 3
fi
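currentsystemstatus=`/usr/bin/snmpget -v1 -c public $hostnam\:161 1.3.6.1.4.1.674.10893.1.1.130.1.1.5.2 | awk '{print $NF}'`
# NOTE: this reads the same OID as the line above, so the two values only differ
# if the controller status changes between the two queries.
previoussystemstatus=`/usr/bin/snmpget -v1 -c public $hostnam\:161 1.3.6.1.4.1.674.10893.1.1.130.1.1.5.2 | awk '{print $NF}'`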
total_drives=`snmpwalk -c public -v 1 $hostnam 1.3.6.1.4.1.674.10893.1.1.130.4.1.1 | tail -1 | awk '{print $NF}'`
for ((a=1; a <= total_drives ; a++)) # Double parentheses, and "total_drives" with no "$".
do
current_disks_state[${a}]=`/usr/bin/snmpget -v1 -c public $hostnam\:161 1.3.6.1.4.1.674.10893.1.1.130.4.1.4.${a} | awk '{print $NF}'`
done # A construct borrowed from 'ksh93'.
# current_disks_state=`snmpwalk -c public -v 1 $hostnam 1.3.6.1.4.1.674.10893.1.1.130.4.1.4 | awk '{print $NF}'`
system_serial_number=`snmpwalk -v 1 -c public $hostnam .1.3.6.1.4.1.674.10892.1.300.10.1.11 | awk '{print $NF}' | sed 's/\"//g'`
# === if there is a previousdata file for previous run, read it in.
if [ -e /tmp/${hostnam}_$system_serial_number.txt ]; then
for ((a=1; a <= total_drives ; a++)) # Double parentheses, and "total_drives" with no "$".
do
previous_disks_state[${a}]=`/bin/sed -ne ${a}p /tmp/${hostnam}_$system_serial_number.txt`
done
previousdata=1
else # no previous file data, make it now from current (or should I make it manually as 4 3 3 3 1 ..??)
currentdrive=1
previousdata=0
/bin/touch /tmp/${hostnam}_$system_serial_number.txt
while [ $currentdrive -le $total_drives ]
do
echo ${current_disks_state[$currentdrive]} >> /tmp/${hostnam}_$system_serial_number.txt
currentdrive=`expr $currentdrive + 1`
done
echo "WARNING - PERC array wrote first status file on dvws001 /tmp/${hostnam}_$system_serial_number.txt"
exit 1
fi
# =========== If current status = previous status then it's OK ===================
if [ $currentsystemstatus -eq $previoussystemstatus ]; then
totalhotspares=`/usr/bin/snmpwalk -c public -v 1 $hostnam 1.3.6.1.4.1.674.10893.1.1.130.4.1.22 | awk '{print $NF}'| awk 'BEGIN {x=0} /3/ {++x} END {print x}'`
totaldedicatedspares=`/usr/bin/snmpwalk -c public -v 1 $hostnam 1.3.6.1.4.1.674.10893.1.1.130.4.1.22 | awk '{print $NF}'| awk 'BEGIN {x=0} /4/ {++x} END {print x}'`
echo "OK - PERC Array Status, Global HotSpares=$totalhotspares, DedicatedSpares=$totaldedicatedspares"
exit 0
fi
# ========= If current status != previous status then it's Broken, figure out where =============
# except for the FIRST time this script runs, this code only runs because of a mismatch in states
# it seems safe to assume that I should check each array position for where the problem is.
currentdrive=1
while [ $currentdrive -le $total_drives ]
do
if [ ${current_disks_state[$currentdrive]} -ne ${previous_disks_state[$currentdrive]} ]; then
# @ currentdrive = $currentdrive + 1
#else # HERE is where they differ
echo -n `/usr/bin/snmpget -v1 -c public $hostnam\:161 1.3.6.1.4.1.674.10893.1.1.130.4.1.2.$currentdrive | awk -F\" '{print $2}'`" "
case "${current_disks_state[$currentdrive]}" in
"0" )
echo -n "Unknown";;
"1" )
echo -n "Ready"
case "`/usr/bin/snmpget -v1 -c public $hostnam\:161 1.3.6.1.4.1.674.10893.1.1.130.4.1.22.$currentdrive | awk '{print $NF}'`" in
"1" )
echo -n "-member of virtual disk";;
"2" )
echo -n "-member of disk group";;
"3" )
echo -n "-global hot spare";;
"4" )
echo -n "-dedicated hot spare";;
* )
echo -n "Bad_ERROR_Code";;
esac;;
"2" )
echo -n "Failed";;
"3" )
echo -n "Online";;
"4" )
echo -n "Offline";;
"6" )
echo -n "Degraded";;
"7" )
echo -n "Recovering";;
"11" )
echo -n "Removed";;
"15" )
echo -n "Resyncing";;
"24" )
echo -n "Rebuild";;
"25" )
echo -n "No Media";;
"26" )
echo -n "Formatting";;
"28" )
echo -n "Diagnostics";;
"35" )
echo -n "Initializing";;
* )
echo -n "Bad_ERROR_Code";;
esac
echo -n " was: "
case "${previous_disks_state[$currentdrive]}" in
"0" )
echo -n "Unknown";;
"1" )
echo -n "Ready";;
"2" )
echo -n "Failed";;
"3" )
echo -n "Online";;
"4" )
echo -n "Offline";;
"6" )
echo -n "Degraded";;
"7" )
echo -n "Recovering";;
"11" )
echo -n "Removed";;
"15" )
echo -n "Resyncing";;
"24" )
echo -n "Rebuild";;
"25" )
echo -n "No Media";;
"26" )
echo -n "Formatting";;
"28" )
echo -n "Diagnostics";;
"35" )
echo -n "Initializing";;
* )
echo -n "Bad_ERROR_Code";;
esac
fi
currentdrive=`expr $currentdrive + 1`
done
echo ""
exit 2
# echo values for debug/sanity
#echo CurrentSystemStatus = $currentsystemstatus
#echo PreviousSystemStatus = $previoussystemstatus
#echo Total Disk Drives = $total_drives
#echo Total Virtual Disks = $total_virtual_disks
#echo current_ disks_state = $current_disks_state
#echo current_disks_state-14 = $current_disks_state[14]
#echo system_serial_number = $system_serial_number
#echo previous_disks_state = $previous_disks_state
----- cut here -----
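The script above spells out the same state-code table twice, once for the current state and once for the previous one. A possible tidy-up, sketched here and not part of the plugin as posted, is to put the table in one function. The names perc_state_name and describe_change are made up for illustration; the code-to-name pairs are copied from the script, and the extra spare-role suffix printed for "Ready" disks is left out to keep the sketch short.

```shell
#!/bin/bash
# Hypothetical refactoring sketch: one lookup function instead of two
# identical case blocks. Codes and names are taken from the script above.
perc_state_name() {
    case "$1" in
        0)  echo "Unknown" ;;
        1)  echo "Ready" ;;
        2)  echo "Failed" ;;
        3)  echo "Online" ;;
        4)  echo "Offline" ;;
        6)  echo "Degraded" ;;
        7)  echo "Recovering" ;;
        11) echo "Removed" ;;
        15) echo "Resyncing" ;;
        24) echo "Rebuild" ;;
        25) echo "No Media" ;;
        26) echo "Formatting" ;;
        28) echo "Diagnostics" ;;
        35) echo "Initializing" ;;
        *)  echo "Bad_ERROR_Code" ;;
    esac
}

# describe_change DISK_NAME CURRENT_CODE PREVIOUS_CODE
# Prints one "<name> <current> was: <previous>" line, the same shape as the
# text the script builds up with echo -n.
describe_change() {
    printf '%s %s was: %s\n' "$1" "$(perc_state_name "$2")" "$(perc_state_name "$3")"
}
```

Inside the comparison loop this would be called with the disk name from the 130.4.1.2.x OID and the two array entries for that drive.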
Saiprasad Sanzgiri wrote:
>Hi,
>
>Need to know whether does Nagios monitor the RAID of a server. If Yes,
>what are the parameters and how do I do it ?
>
>Would await someone's kind reply.
>
>Regards,
>Sai
>
>
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users