Information Required

Lewis Getschel lgetschel at denver.westerngeco.slb.com
Fri Jan 28 20:51:47 CET 2005
Another user wrote: "You should be looking at SNMP tools from your
hardware vendor...Nagios ain't for this kinda monitoring activity"

Which is basically right. However, since I WANTED Nagios to monitor it
for ME, I wrote a 'plugin' for myself to check the external PERC disk
arrays on my DELL Windows servers.

As this other user said, SNMP was needed, so it runs from my Nagios 
server, and checks my Windows systems. (I'm working on a version to 
check my Linux systems too, eventually. I'm having to learn much more 
about snmpd than I wanted <smile>).
[If anyone wants to use it on other HW, you'll have to find the proper
OIDs in the MIBs for your hardware. It gets THAT specific!]
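
When hunting for OIDs on other hardware, it helps to see the bare value
that comes back for each one. Here is a minimal sketch of stripping a
Net-SNMP reply line down to its value. It assumes the default
"OID = TYPE: value" output format; snmp_value is a made-up helper name,
not part of the plugin, and the sample reply is the one quoted in the
script's comments.

```shell
#!/bin/bash
# snmp_value: hypothetical helper, not part of the original plugin.
# Reads Net-SNMP reply lines ("OID = TYPE: value") on stdin and prints
# only the value, with any surrounding double quotes removed.
# Assumes the default Net-SNMP output format.
snmp_value() {
    sed -e 's/^[^=]*= [A-Za-z0-9-]*: //' -e 's/^"//' -e 's/"$//'
}

# Sample reply copied from the script comments, so no network is needed.
echo 'SNMPv2-SMI::enterprises.674.10893.1.1.140.2.1.2.1 = STRING: "Disk 2"' | snmp_value
```

Against a live system the input would come from something like
snmpget or snmpwalk piped into snmp_value, with your own OIDs.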

Remember, I wrote this for myself, so I'm leaving all my internal
comments, etc. intact here. Some may prove useful if anyone finds this
script helpful.
Lewis


----- cut here -----
#!/bin/bash 
#-x
# Script to check the Windows Dell-PERC for current status
#              
# Written by:   Lewis Getschel
# Date:         12/29/04
# Parameters:   1 - the ip address of the system to check
# Operation:   
# Limitation:   It seems that Nagios will NOT run a /bin/tcsh script at all!!
#               I had to change the script to /bin/sh (bash) to get it to even
#               run a 3 line script that was just echo $1 into the /tmp/file.
#               According to Nagios Plugin recommendations, I tried to use
#               absolute paths to all commands.
#               #
# Version History:
# 12/29/2004   First try, Turned out VERY good. Keeping a temp file seemed the
#               best way to go on this. This allows seeing changes. I initially
#               didn't show the number of Global/Dedicated HotSpares, but after
#               a few minutes of monitoring, I realized that since each of the
#               15 servers "at-that-time-purchased" group had different
#               standards for how they were configured,
#               I needed to see the actual numbers of spares.
#
# Notes:        The "baseline" (the temp file) is never actually replaced
#               anywhere in this code. If a new baseline is desired, then
#               simply delete the appropriate temp file. This routine
#               will create a NEW baseline (/tmp) file, and use that onward.
#
#  Example:
#  Using the oid for the perc adapter (from arymgr.mib) 
#  This retrieves the "Disks name as represented in Array Manager"
#   /usr/bin/snmpget -v1 -c public m010:161 1.3.6.1.4.1.674.10893.1.1.140.2.1.2.1
# SNMPv2-SMI::enterprises.674.10893.1.1.140.2.1.2.1 = STRING: "Disk 2"
#   dvws001(dets05)12/29 10:23 /usr/lib/nagios/plugins> /usr/bin/snmpget -v1 -c public m010:161 1.3.6.1.4.1.674.10893.1.1.140.2.1.2.2
# SNMPv2-SMI::enterprises.674.10893.1.1.140.2.1.2.2 = STRING: "Disk 0"
#   dvws001(dets05)12/29 10:23 /usr/lib/nagios/plugins> /usr/bin/snmpget -v1 -c public m010:161 1.3.6.1.4.1.674.10893.1.1.140.2.1.2.3
# SNMPv2-SMI::enterprises.674.10893.1.1.140.2.1.2.3 = STRING: "Disk 1"
#
# Useful OID's (as _I_ see it <smile>)
#    1.3.6.1.4.1.674.10893.1.1.130.1.1.5.x "Status of this controllers subsystem (which includes any devices connected to it)"
#       Problem that I see, shows status at THAT moment, shows 6:Degraded while rebuild occurs
#       otherwise it shows 1:Ready (before rebuild, and after rebuild)
#
#    1.3.6.1.4.1.674.10893.1.1.110.1.0 "Global health information for the subsystem"
#    1.3.6.1.4.1.674.10893.1.1.110.2.0 "Previous Global health information for the subsystem"
#       Problem is that I don't know how previous it is (seems to be until rebooted, because on M010
#       it showed 2:Warning until I rebooted and ran Diags, then it showed 1:Normal)
#
# I've been thinking that if I keep an array of integers (i.e. "222222222222222222222222222223")
# that represent the current "status of the array disk as a spare"
# 1.3.6.1.4.1.674.10893.1.1.130.4.1.22.x (1-30). This way I can tell:
#       1) whether HotSpares are in correct place
#       2) when they change positions
#       Problem is that I'd need to write this out somewhere to keep for compares (use OID
#               1.3.6.1.4.1.674.10893.1.1.130.3.1.7.x (1-3) "enclosure ID (i.e. serial Number)")
#       Problem is that the internal enclosure is "Null"
#
# 1.3.6.1.4.1.674.10893.1.1.140.2.1.4.x (1-3) "Current state of the Disk"
# 1.3.6.1.4.1.674.10893.1.1.130.4.1.4.x (1-30) "Current state of the (individual) array disk"
#
# This next one is the 1st of several OID's that show the disk name
# 1.3.6.1.4.1.674.10893.1.1.130.4.1.2.x (1-30) "Name of the array disk represented in Array Manager"
#
# =================================== Script starts below ================================
#
hostnam=$1
# echo $1 >> /tmp/nagios_event_debug.txt
# echo --- `date` --- >> /tmp/nagios_event_debug.txt

if [ "$#" -eq "0" ]; then
   echo "Unknown - No parameter specified"
   exit 3
fi

currentsystemstatus=`/usr/bin/snmpget -v1 -c public $hostnam\:161 1.3.6.1.4.1.674.10893.1.1.130.1.1.5.2 | awk '{print $NF}'`
# NOTE: this reads the same OID as currentsystemstatus above, so the two values
# only differ if the controller status changes between the two queries.
previoussystemstatus=`/usr/bin/snmpget -v1 -c public $hostnam\:161 1.3.6.1.4.1.674.10893.1.1.130.1.1.5.2 | awk '{print $NF}'`
total_drives=`/usr/bin/snmpwalk -c public -v 1 $hostnam 1.3.6.1.4.1.674.10893.1.1.130.4.1.1 | tail -1 | awk '{print $NF}'`

for ((a=1; a <= total_drives ; a++))  # Double parentheses, and "total_drives" with no "$".
do
   current_disks_state[${a}]=`/usr/bin/snmpget -v1 -c public $hostnam\:161 1.3.6.1.4.1.674.10893.1.1.130.4.1.4.${a} | awk '{print $NF}'`
done                           # A construct borrowed from 'ksh93'.
# current_disks_state=`snmpwalk -c public -v 1 $hostnam 1.3.6.1.4.1.674.10893.1.1.130.4.1.4 | awk '{print $NF}'`
system_serial_number=`snmpwalk -v 1 -c  public $hostnam .1.3.6.1.4.1.674.10892.1.300.10.1.11 | awk '{print $NF}' | sed 's/\"//g'`

# === if there is a previousdata file for previous run, read it in.
if [ -e /tmp/${hostnam}_$system_serial_number.txt ]; then
   for ((a=1; a <= total_drives ; a++))  # Double parentheses, and "total_drives" with no "$".
   do
      previous_disks_state[${a}]=`/bin/sed -ne ${a}p /tmp/${hostnam}_$system_serial_number.txt`
   done
   previousdata=1
else # no previous file data, make it now from current (or should I make it manually as 4 3 3 3 1 ..??)
   currentdrive=1
   previousdata=0
   /bin/touch /tmp/${hostnam}_$system_serial_number.txt
   while [ $currentdrive -le $total_drives ]
   do
      echo ${current_disks_state[$currentdrive]} >> /tmp/${hostnam}_$system_serial_number.txt
      currentdrive=`expr $currentdrive + 1`
   done
   echo "WARNING - PERC array wrote first status file on dvws001 /tmp/${hostnam}_$system_serial_number"
   exit 1
fi

# =========== If current status = previous status then it's OK ===================
if [ $currentsystemstatus -eq $previoussystemstatus ]; then
   totalhotspares=`/usr/bin/snmpwalk -c public -v 1 $hostnam 1.3.6.1.4.1.674.10893.1.1.130.4.1.22 | awk '{print $NF}'| awk 'BEGIN {x=0} /3/ {++x} END {print x}'`
   totaldedicatedspares=`/usr/bin/snmpwalk -c public -v 1 $hostnam 1.3.6.1.4.1.674.10893.1.1.130.4.1.22 | awk '{print $NF}'| awk 'BEGIN {x=0} /4/ {++x} END {print x}'`
   echo "OK - PERC Array Status, Global HotSpares=$totalhotspares, DedicatedSpares=$totaldedicatedspares"
   exit 0
fi
# ========= If current status != previous status then it's Broken, figure out where =============
# except for the FIRST time this script runs, this code only runs because of a mismatch in states
# it seems safe to assume that I should check each array position for where the problem is.
currentdrive=1
while [ $currentdrive -le $total_drives ]
do
   if [ ${current_disks_state[$currentdrive]} -ne ${previous_disks_state[$currentdrive]} ]; then
     # @ currentdrive = $currentdrive + 1
   #else   # HERE is where they differ
      echo -n `/usr/bin/snmpget -v1 -c public $hostnam\:161 1.3.6.1.4.1.674.10893.1.1.130.4.1.2.$currentdrive | awk -F\" '{print $2}'`" "
      case "${current_disks_state[$currentdrive]}" in
         "0" )
            echo -n "Unknown";;
         "1" )
            echo -n "Ready"
            case "`/usr/bin/snmpget -v1 -c public $hostnam\:161 1.3.6.1.4.1.674.10893.1.1.130.4.1.22.$currentdrive | awk '{print $NF}'`" in
               "1" )
                  echo -n "-member of virtual disk";;
               "2" )
                  echo -n "-member of disk group";;
               "3" )
                  echo -n "-global hot spare";;
               "4" )
                  echo -n "-dedicated hot spare";;
                * )
                  echo -n "Bad_ERROR_Code";;
            esac;;
         "2" )
            echo -n "Failed";;
         "3" )
            echo -n "Online";;
         "4" )
            echo -n "Offline";;
         "6" )
            echo -n "Degraded";;
         "7" )
            echo -n "Recovering";;
         "11" )
            echo -n "Removed";;
         "15" )
            echo -n "Resyncing";;
         "24" )
            echo -n "Rebuild";;
         "25" )
            echo -n "No Media";;
         "26" )
            echo -n "Formatting";;
         "28" )
            echo -n "Diagnostics";;
         "35" )
            echo -n "Initializing";;
         * )
            echo -n "Bad_ERROR_Code";;
      esac
      echo -n " was: "
      case "${previous_disks_state[$currentdrive]}" in
         "0" )
            echo -n "Unknown";;
         "1" )
            echo -n "Ready";;
         "2" )
            echo -n "Failed";;
         "3" )
            echo -n "Online";;
         "4" )
            echo -n "Offline";;
         "6" )
            echo -n "Degraded";;
         "7" )
            echo -n "Recovering";;
         "11" )
            echo -n "Removed";;
         "15" )
            echo -n "Resyncing";;
         "24" )
            echo -n "Rebuild";;
         "25" )
            echo -n "No Media";;
         "26" )
            echo -n "Formatting";;
         "28" )
            echo -n "Diagnostics";;
         "35" )
            echo -n "Initializing";;
         * )
            echo -n "Bad_ERROR_Code";;
      esac
   fi
   currentdrive=`expr $currentdrive + 1`
done
echo ""
exit 2

# echo values for debug/sanity
#echo CurrentSystemStatus = $currentsystemstatus
#echo PreviousSystemStatus = $previoussystemstatus
#echo Total Disk Drives = $total_drives
#echo Total Virtual Disks = $total_virtual_disks
#echo current_ disks_state  = $current_disks_state
#echo current_disks_state-14 = $current_disks_state[14]
#echo system_serial_number = $system_serial_number
#echo previous_disks_state = $previous_disks_state
----- cut here -----
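
The two case statements that turn state codes into text are identical,
and the baseline logic can be tried out without any SNMP at all. What
follows is a rough sketch of both ideas, not a replacement for the
script. The names disk_state_name and compare_baseline are made up for
illustration, the state table is copied from the script, and the states
in the demo are invented. Note one simplification: it compares the disk
states directly against the baseline file, instead of first comparing
the controller status as the script does.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of the original plugin.

# disk_state_name CODE: translate an array-disk state code to text,
# using the same code table as the case statements in the script.
disk_state_name() {
    case "$1" in
        0)  echo "Unknown" ;;
        1)  echo "Ready" ;;
        2)  echo "Failed" ;;
        3)  echo "Online" ;;
        4)  echo "Offline" ;;
        6)  echo "Degraded" ;;
        7)  echo "Recovering" ;;
        11) echo "Removed" ;;
        15) echo "Resyncing" ;;
        24) echo "Rebuild" ;;
        25) echo "No Media" ;;
        26) echo "Formatting" ;;
        28) echo "Diagnostics" ;;
        35) echo "Initializing" ;;
        *)  echo "Bad_ERROR_Code" ;;
    esac
}

# compare_baseline FILE STATE...: compare current disk states (one
# argument per disk) with the baseline file (one state per line).
# Returns 1 after writing a first baseline, 2 on any change, 0 otherwise,
# matching the exit codes the script uses for those cases.
compare_baseline() {
    local file=$1
    shift
    local -a current=("$@")
    if [ ! -e "$file" ]; then
        printf '%s\n' "${current[@]}" > "$file"
        echo "WARNING - wrote first status file $file"
        return 1
    fi
    local i=0 previous changes=""
    while IFS= read -r previous; do
        if [ "${current[$i]}" != "$previous" ]; then
            changes="$changes disk$((i + 1)) $(disk_state_name "${current[$i]}") was: $(disk_state_name "$previous");"
        fi
        i=$((i + 1))
    done < "$file"
    if [ -n "$changes" ]; then
        echo "CRITICAL -$changes"
        return 2
    fi
    echo "OK - all disks match baseline"
    return 0
}

# Demo with made-up states in a throwaway directory (no SNMP involved).
demo_dir=$(mktemp -d)
compare_baseline "$demo_dir/baseline.txt" 3 3 3 1 || echo "(exit code $?)"
compare_baseline "$demo_dir/baseline.txt" 3 3 3 1 || echo "(exit code $?)"
compare_baseline "$demo_dir/baseline.txt" 3 2 3 1 || echo "(exit code $?)"
rm -rf "$demo_dir"
```

The first call writes the baseline, the second matches it, and the
third reports that disk 2 went from Online to Failed. As in the script,
deleting the baseline file is the way to start over.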



Saiprasad Sanzgiri wrote:

>Hi,
>
>Need to know whether does Nagios monitor the RAID of a server. If Yes,
>what are the parameters and how do I do it ?
>
>Would await someone's kind reply.
>
>Regards,
>Sai
>  
>


