reload appears to cause force of DOWN; SOFT; x to DOWN; HARD; 1
Sean McKell
mckell at us.ibm.com
Wed Jun 19 00:34:15 CEST 2013
> Do you have this in nagios.cfg?
> retain_state_information=1
Yes, I do have that set.
From: nagios-users-request at lists.sourceforge.net
To: nagios-users at lists.sourceforge.net,
Date: 06/18/2013 01:56 PM
Subject: Nagios-users Digest, Vol 85, Issue 6
Send Nagios-users mailing list submissions to
nagios-users at lists.sourceforge.net
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.sourceforge.net/lists/listinfo/nagios-users
or, via email, send a message with subject or body 'help' to
nagios-users-request at lists.sourceforge.net
You can reach the person managing the list at
nagios-users-owner at lists.sourceforge.net
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Nagios-users digest..."
Today's Topics:
1. reload appears to cause force of DOWN; SOFT; x to DOWN; HARD; 1 (Sean McKell)
2. Re: reload appears to cause force of DOWN; SOFT; x to DOWN; HARD; 1 (Travis Runyard)
3. Re: Issues with NEB modules breaking after restart (Andrew Widdersheim)
4. Functions to do Availability in reporting (omar saddiki)
5. Fwd: Functions to do Availability in reporting (omar saddiki)
6. Wmi (martin Rodriguez)
7. Re: Wmi (Sunil Sankar)
8. check_ntp_time offset unknown (Bennett, Jan)
9. Re: check_ntp_time offset unknown (Holger Weiß)
10. Re: check_ntp_time offset unknown (Giles Coochey)
11. Problem with check_openmanage plugin and storage (Nic Bernstein)
----------------------------------------------------------------------
Message: 1
Date: Thu, 13 Jun 2013 17:31:44 -0600
From: Sean McKell <mckell at us.ibm.com>
Subject: [Nagios-users] reload appears to cause force of DOWN; SOFT; x
to DOWN; HARD; 1
To: nagios-users at lists.sourceforge.net
Message-ID:
<OF17CEA331.79DB0522-ON87257B89.0080C0E1-87257B89.0081405C at us.ibm.com>
Content-Type: text/plain; charset="us-ascii"
Running Nagios 3.4.1:
I'm seeing a strange anomaly: a host check is in the middle of its retries,
before reaching max_check_attempts, but after a server reload the next
check is forced straight to DOWN;HARD;1, as seen here:
[2013-06-04 08:40:21] HOST ALERT: 5gt4;DOWN;SOFT;1;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''.
[2013-06-04 08:47:18] HOST ALERT: 5gt4;DOWN;SOFT;2;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''.
[2013-06-04 08:54:03] HOST ALERT: 5gt4;DOWN;SOFT;3;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''.
(reload happens here)
[2013-06-04 09:00:52] HOST ALERT: 5gt4;DOWN;HARD;1;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''.
Why is it skipping the remaining attempts and going straight to
DOWN;HARD after the reload?
It seems like a bug to me.
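To make the expected sequence concrete, here is a simplified model of the soft/hard logic as the Nagios documentation describes it: each consecutive non-UP result bumps the attempt counter until it reaches max_check_attempts, and only then does the state become HARD. It is a sketch, not Nagios 3.4.1's code; the names are hypothetical, and max_check_attempts=5 is an assumption since the thread doesn't state the host's setting.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Simplified host status: current state, state type and attempt counter."""
    state: str = "UP"
    state_type: str = "HARD"
    attempt: int = 1


def process_result(host: HostState, result: str, max_check_attempts: int) -> HostState:
    """Apply one check result using simplified soft/hard rules (illustrative only)."""
    if result == "UP":
        # Recovery: back to a hard UP state with the counter reset.
        return HostState("UP", "HARD", 1)
    if host.state == "UP":
        # First failure starts a soft problem at attempt 1, unless only
        # one attempt is allowed, in which case it is immediately HARD.
        state_type = "HARD" if max_check_attempts == 1 else "SOFT"
        return HostState(result, state_type, 1)
    if host.state_type == "HARD":
        # Already in a hard problem state: stay HARD, counter unchanged.
        return HostState(result, "HARD", host.attempt)
    # Soft problem state: count the retry and go HARD at the limit.
    attempt = host.attempt + 1
    state_type = "HARD" if attempt >= max_check_attempts else "SOFT"
    return HostState(result, state_type, attempt)


if __name__ == "__main__":
    host = HostState()
    for _ in range(5):
        host = process_result(host, "DOWN", max_check_attempts=5)
        print(f"5gt4;{host.state};{host.state_type};{host.attempt}")
```

Under this model, with max_check_attempts above 4, the check after the reload would have been logged as DOWN;SOFT;4. A DOWN;HARD;1 line doesn't fit that sequence, which is why it looks as if the soft-state attempt counter was not carried across the reload.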
------------------------------
Message: 2
Date: Thu, 13 Jun 2013 21:39:48 -0700
From: Travis Runyard <travisrunyard at gmail.com>
Subject: Re: [Nagios-users] reload appears to cause force of DOWN;
SOFT; x to DOWN; HARD; 1
To: Nagios Users List <nagios-users at lists.sourceforge.net>
Message-ID:
<CANCZ1yG6CYiE2GYL3j5W3Gj9WjrTz4SmGONnaZUxbL5piUB=zA at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
Do you have this in nagios.cfg?
retain_state_information=1
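To check the directive Travis asks about on a given install, a small parser like the one below can read the retention settings out of nagios.cfg. It's a sketch: the sample text is made up, and the file location varies by install (/usr/local/nagios/etc/nagios.cfg is only the usual default for source builds).

```python
def parse_nagios_cfg(text: str) -> dict[str, str]:
    """Parse simple key=value lines from nagios.cfg, skipping comments and blanks.

    Repeated keys (e.g. cfg_file) keep only their last value in this sketch.
    """
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def retention_enabled(settings: dict[str, str]) -> bool:
    """True if state retention is switched on (value '1')."""
    return settings.get("retain_state_information") == "1"


# Hypothetical excerpt for illustration, not taken from the thread.
SAMPLE = """
retain_state_information=1
state_retention_file=/usr/local/nagios/var/retention.dat
retention_update_interval=60
"""

if __name__ == "__main__":
    cfg = parse_nagios_cfg(SAMPLE)
    print("retention enabled:", retention_enabled(cfg))
    print("retention file:", cfg.get("state_retention_file"))
```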
------------------------------
Message: 3
Date: Fri, 14 Jun 2013 13:03:56 -0400
From: Andrew Widdersheim <awiddersheim at hotmail.com>
Subject: Re: [Nagios-users] Issues with NEB modules breaking after
restart
To: "nagios-users at lists.sourceforge.net"
<nagios-users at lists.sourceforge.net>
Message-ID: <SNT143-W535DE68DF8CC060F587EF0DD800 at phx.gbl>
Content-Type: text/plain; charset="iso-8859-1"
To answer my own question... I'm pretty sure two nagios instances
were spawned at once. The nagios init script that comes with nagios-core
is the best at handling this situation.
------------------------------
Message: 4
Date: Mon, 17 Jun 2013 15:21:37 +0000
From: omar saddiki <omar.saddiki at gmail.com>
Subject: [Nagios-users] Functions to do Availability in reporting
To: Nagios Users List <nagios-users at lists.sourceforge.net>
Message-ID:
<CAN5T1CHYs_w4t0=muvDosc+KsjsLf5yW305X3-K1ZrkVtPNGgQ at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
Hi,
Could someone please tell me which function Nagios uses in the Reporting
tab to compute availability between two times?
Regards
SADDIKI
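Nagios Core computes these figures in its availability CGI (avail.cgi) from the archived state history. As a rough illustration of the arithmetic only, the sketch below clips each state interval to a reporting window and returns the percentage of time spent in each state. It is not avail.cgi's algorithm (which also deals with initial states, scheduled downtime and soft states), and the input format, a time-sorted list of (timestamp, state) changes, is an assumption.

```python
from collections import defaultdict


def availability(changes: list[tuple[int, str]], start: int, end: int) -> dict[str, float]:
    """Percentage of [start, end) spent in each state.

    `changes` is a time-sorted list of (unix_timestamp, state) pairs; each state
    lasts until the next change. Time before the first change is ignored, so the
    percentages can sum to less than 100.
    """
    if end <= start:
        raise ValueError("end must be after start")
    totals: dict[str, float] = defaultdict(float)
    for i, (t, state) in enumerate(changes):
        t_next = changes[i + 1][0] if i + 1 < len(changes) else end
        # Clip the interval [t, t_next) to the reporting window.
        lo, hi = max(t, start), min(t_next, end)
        if hi > lo:
            totals[state] += hi - lo
    window = end - start
    return {state: 100.0 * secs / window for state, secs in totals.items()}


if __name__ == "__main__":
    # Hypothetical history: UP, then DOWN for 30 minutes, then UP again.
    history = [(0, "UP"), (3600, "DOWN"), (5400, "UP")]
    print(availability(history, 0, 7200))  # {'UP': 75.0, 'DOWN': 25.0}
```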
------------------------------
Message: 5
Date: Mon, 17 Jun 2013 15:42:17 +0000
From: omar saddiki <omar.saddiki at gmail.com>
Subject: [Nagios-users] Fwd: Functions to do Availability in reporting
To: Nagios Users List <nagios-users at lists.sourceforge.net>
Message-ID:
<CAN5T1CHOYvGnu8Z8Q_bbrtJe8A7=phdCNErWmN9cAjX59eU8wA at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
Hi,
Could someone please tell me which function Nagios uses in the Reporting
tab to compute availability between two times?
Regards
SADDIKI
------------------------------
Message: 6
Date: Mon, 17 Jun 2013 15:14:24 -0300
From: martin Rodriguez <maestin at gmail.com>
Subject: [Nagios-users] Wmi
To: nagios-users at lists.sourceforge.net
Message-ID:
<CACrJBAsbWM8wVuPasjJQp0VumJZw5aj_qN6DGS+OHeZTMfmEXg at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
Hi, I am installing Nagios 3.4.3 on Ubuntu and I cannot configure the
check_wmi_plus plugin (check_wmi_plus.conf). Has anyone had experience with this?
------------------------------
Message: 7
Date: Tue, 18 Jun 2013 00:14:07 +0530
From: Sunil Sankar <sunil at sunil.cc>
Subject: Re: [Nagios-users] Wmi
To: Nagios Users List <nagios-users at lists.sourceforge.net>
Message-ID:
<CAPqUM3W+mo5bRRoi6dxAwSdLPs87poqqQZHiJdQWVDh-7c5QhA at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
What is the error you are getting?
--
Regards
Sunil Sankar
------------------------------
Message: 8
Date: Fri, 14 Jun 2013 14:10:43 +0000
From: "Bennett, Jan" <JBennett at ntta.org>
Subject: [Nagios-users] check_ntp_time offset unknown
To: "'nagios-users at lists.sourceforge.net'"
<nagios-users at lists.sourceforge.net>
Message-ID:
<E11B0F59D3334D469B36FCA07490BA8C186E67EF at NTTAEXMB01.ntta.local>
Content-Type: text/plain; charset="us-ascii"
We have implemented an NTP sync check in all of the NRDS checks that we are
rolling out right now, but I've run into a bit of a snag.
I am getting 'Offset Unknown' results on all clients. It only seems to
last for a short period (30 minutes or so), then it clears itself up for a
while, but the issue always returns.
From the client that is reporting the unknown offset, I can run the
following:
# ./check_ntp_time -H localhost
NTP CRITICAL: Offset unknown|
# ./check_ntp_time -V
check_ntp_time v1.4.16 (nagios-plugins 1.4.16)
# ntpdc -p
remote local st poll reach delay offset disp
=======================================================================
=LOCAL(0) 127.0.0.1 10 64 17 0.00000 0.000000 0.96858
*timeserver1 xxx.xxx.xxx.xxx 2 64 17 0.00098 4.956048 0.00580
# /usr/local/nagios/libexec/check_ntp_time -v -H localhost
sending request to peer 0
response from peer 0: offset -2.777669579e-07
sending request to peer 0
response from peer 0: offset -2.161832526e-07
sending request to peer 0
response from peer 0: offset -4.009343684e-07
sending request to peer 0
response from peer 0: offset -1.987209544e-07
discarding peer 0: stratum=0
overall average offset: 0
NTP CRITICAL: Offset unknown|
In my searches, I noticed a number of people reporting the same issue, with
the suggested solution being to update the Nagios plugins to 1.4.13. I
have done so, and am now running 1.4.16, with no change in the service
check result.
Also, I am unable to check a remote NTP server from these clients as they
do not have access to the outside world.
It has been suggested that the stratum=0 may be the culprit, but I'm not
sure of my options here.
Any help would be greatly appreciated.
Jan
------------------------------
Message: 9
Date: Tue, 18 Jun 2013 17:24:50 +0200
From: Holger Weiß <holger at cis.fu-berlin.de>
Subject: Re: [Nagios-users] check_ntp_time offset unknown
To: Nagios Users <nagios-users at lists.sourceforge.net>
Message-ID: <20130618152450.GA678632 at zedat.fu-berlin.de>
Content-Type: text/plain; charset=iso-8859-1
* Bennett, Jan <JBennett at ntta.org> [2013-06-14 14:10]:
> # ./check_ntp_time -H localhost
> NTP CRITICAL: Offset unknown|
Could you please run "ntpq -c rv" when this happens and post the output?
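`ntpq -c rv` prints the local ntpd's system variables as comma-separated name=value pairs, including `stratum` and `leap`. If it helps when collecting that output, the parser below pulls those pairs into a dictionary. The sample text is invented for illustration, and the "unsynchronised" test (stratum 16 or leap=11) is a heuristic, not a complete health check.

```python
import re

# name=value, where value is either a double-quoted string or a run of
# characters containing no commas or whitespace.
_PAIR = re.compile(r'(\w+)=("[^"]*"|[^,\s]+)')


def parse_ntpq_rv(text: str) -> dict[str, str]:
    """Extract name=value pairs from `ntpq -c rv` output (quotes stripped)."""
    return {name: value.strip('"') for name, value in _PAIR.findall(text)}


def looks_unsynchronised(vars_: dict[str, str]) -> bool:
    """Heuristic: ntpd reports stratum 16 / leap=11 while unsynchronised."""
    return vars_.get("stratum") == "16" or vars_.get("leap") == "11"


# Hypothetical output, not taken from the thread.
SAMPLE_RV = """associd=0 status=c016 leap_alarm, sync_unspec, 1 event, restart,
version="ntpd 4.2.6p5", processor="x86_64", system="Linux",
leap=11, stratum=16, precision=-23, rootdelay=0.000, refid=INIT,
offset=0.000"""

if __name__ == "__main__":
    rv = parse_ntpq_rv(SAMPLE_RV)
    print(rv["stratum"], rv["leap"], looks_unsynchronised(rv))
```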
> It has been suggested that the stratum=0 may be the culprit, but I'm not
> sure of my options here.
Yes, stratum=0 is the culprit. An NTP server wouldn't usually report
such a stratum value.
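The `-v` output above shows the plugin throwing away its only peer and then having nothing left to average. RFC 5905 reserves packet stratum 0 for "unspecified or invalid", which ntpd typically sends while it is not yet synchronised. The sketch below is a simplified model of that behaviour, inferred from the verbose output in this thread rather than from the plugin's C source. The assumption is that strata 1 to 15 count as usable; the names and thresholds are illustrative.

```python
from statistics import mean
from typing import NamedTuple, Optional


class PeerResponse(NamedTuple):
    peer: int
    stratum: int
    offset: float  # seconds


def average_offset(responses: list[PeerResponse]) -> Optional[float]:
    """Average offset over responses with a usable stratum, or None if none remain.

    Assumption: strata 1-15 are usable; 0 ("unspecified or invalid" on the wire)
    and 16+ are discarded, mirroring the 'discarding peer' line in the -v output.
    """
    usable = [r.offset for r in responses if 1 <= r.stratum <= 15]
    return mean(usable) if usable else None


def ntp_status(offset: Optional[float], warn: float = 60.0, crit: float = 120.0) -> str:
    """Map an offset to a Nagios-style status line (thresholds are illustrative)."""
    if offset is None:
        return "NTP CRITICAL: Offset unknown"
    magnitude = abs(offset)
    if magnitude >= crit:
        return f"NTP CRITICAL: Offset {offset} secs"
    if magnitude >= warn:
        return f"NTP WARNING: Offset {offset} secs"
    return f"NTP OK: Offset {offset} secs"


if __name__ == "__main__":
    # The four offsets from the thread, all from a peer advertising stratum 0.
    offsets = (-2.777669579e-07, -2.161832526e-07, -4.009343684e-07, -1.987209544e-07)
    seen = [PeerResponse(0, 0, o) for o in offsets]
    print(ntp_status(average_offset(seen)))  # NTP CRITICAL: Offset unknown
```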
Holger
--
Holger Weiß | Freie Universität Berlin
holger at zedat.fu-berlin.de | Zentraleinrichtung für Datenverarbeitung (ZEDAT)
Phone: +49 30 838-55949 | Fabeckstraße 32, 14195 Berlin (Germany)
Fax: +49 30 838455949 | https://www.zedat.fu-berlin.de/
------------------------------
Message: 10
Date: Tue, 18 Jun 2013 16:35:03 +0100
From: Giles Coochey <giles at coochey.net>
Subject: Re: [Nagios-users] check_ntp_time offset unknown
To: nagios-users at lists.sourceforge.net
Message-ID: <51C07E27.7000400 at coochey.net>
Content-Type: text/plain; charset="iso-8859-1"
On 14/06/2013 15:10, Bennett, Jan wrote:
> I am getting returns of 'Offset Unknown' on all clients. It appears
> to only happen for a short period of time (30 min or so) and then it
> will clear its self up for a bit but the issue will always return.
I get this shortly after an NTP client has booted up. Once NTP has been
running for a while, it goes away.
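This observation fits the `reach` column in Jan's `ntpdc -p` output. `reach` is an 8-bit shift register printed in octal, one bit per poll, with the newest poll in the rightmost bit. A value of 17 (binary 00001111) means only the last four polls were answered, which is consistent with ntpd having started just a few poll intervals (here 64 s each) earlier. The helper below decodes the column; it is an illustration, not part of any Nagios plugin.

```python
def decode_reach(reach_octal: str) -> tuple[int, str]:
    """Decode the ntpq/ntpdc 'reach' column (octal, 8-bit shift register).

    Returns the number of answered polls among the last eight and the register
    as an 8-character bit string (rightmost bit = most recent poll).
    """
    value = int(reach_octal, 8)
    if not 0 <= value <= 0o377:
        raise ValueError(f"reach out of range: {reach_octal}")
    bits = format(value, "08b")
    return bits.count("1"), bits


if __name__ == "__main__":
    for column in ("17", "377"):
        answered, bits = decode_reach(column)
        print(f"reach={column}: {answered}/8 recent polls answered ({bits})")
```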
--
Regards,
Giles Coochey, CCNP, CCNA, CCNAS
NetSecSpec Ltd
+44 (0) 7983 877438
http://www.coochey.net
http://www.netsecspec.co.uk
giles at coochey.net
------------------------------
Message: 11
Date: Tue, 18 Jun 2013 11:03:32 -0500
From: Nic Bernstein <nic at onlight.com>
Subject: [Nagios-users] Problem with check_openmanage plugin and
storage
To: nagios-users at lists.sourceforge.net
Message-ID: <51C084D4.8020104 at onlight.com>
Content-Type: text/plain; charset="utf-8"
We've recently been experimenting with Trond Hasle Amundsen's
check_openmanage on a large network of about a hundred Dell servers of
various ages and capabilities, mostly PE-2950, R210, R410 and R720.
Many thanks to Trond for all his great work on Nagios plugins and other
projects, by the way.
We've hit a wall, however, with the storage monitoring aspects of this
plugin.
Here is a specific example: a new PE R720, run in debug mode:
onlight@monitor:~$ check_openmanage -H host -C secret -d
System: PowerEdge R720 OMSA version: 7.1.0
ServiceTag: ####### Plugin version: 3.7.9
BIOS/date: 1.2.6 05/10/2012 Checking mode: SNMPv2c UDP/IPv4
-----------------------------------------------------------------------------
Storage Components
=============================================================================
STATE | ID | MESSAGE TEXT
---------+----------+--------------------------------------------------------
OK | 0 | Controller 0 [PERC H310 Mini] is Ready
WARNING | 0:0:1:0 | Physical Disk 0:1:0 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online, Not Certified
WARNING | 0:0:1:1 | Physical Disk 0:1:1 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online, Not Certified
OK | 0:0 | Logical Drive '/dev/sda' [RAID-1, 1862.50 GB] is Ready
OK | 0:0 | Connector 0 [SAS] on controller 0 is Ready
OK | 0:1 | Connector 1 [SAS] on controller 0 is Ready
OK | 0:0:1 | Enclosure 0:0:1 [Backplane] on controller 0 is Ready
-----------------------------------------------------------------------------
Chassis Components
=============================================================================
STATE | ID | MESSAGE TEXT
---------+------+------------------------------------------------------------
OK | 0 | Memory module 0 [DIMM_A1, 4096 MB] is Ok
OK | 1 | Memory module 1 [DIMM_A2, 4096 MB] is Ok
OK | 2 | Memory module 2 [DIMM_A3, 4096 MB] is Ok
OK | 3 | Memory module 3 [DIMM_A4, 4096 MB] is Ok
OK | 0 | Chassis fan 0 [System Board Fan1 RPM] reading: 1200 RPM
OK | 1 | Chassis fan 1 [System Board Fan2 RPM] reading: 1080 RPM
OK | 2 | Chassis fan 2 [System Board Fan3 RPM] reading: 1200 RPM
OK | 3 | Chassis fan 3 [System Board Fan4 RPM] reading: 1080 RPM
OK | 4 | Chassis fan 4 [System Board Fan5 RPM] reading: 1080 RPM
OK | 5 | Chassis fan 5 [System Board Fan6 RPM] reading: 1080 RPM
OK | 0 | Power Supply 0 [AC]: Presence detected
OK | 0 | Temperature Probe 0 [System Board Inlet Temp] reads 26 C (min=3/-7, max=42/47)
OK | 1 | Temperature Probe 1 [System Board Exhaust Temp] reads 33 C (min=8/3, max=70/75)
OK | 2 | Temperature Probe 2 [CPU1 Temp] reads 49 C (min=8/3, max=83/88)
OK | 0 | Processor 0 [Intel Xeon E5-2603 0 1.80GHz] is Present
OK | 0 | Voltage sensor 0 [CPU1 VCORE PG] is Good
OK | 1 | Voltage sensor 1 [System Board 3.3V PG] is Good
OK | 2 | Voltage sensor 2 [System Board 5V PG] is Good
OK | 3 | Voltage sensor 3 [CPU1 PLL PG] is Good
OK | 4 | Voltage sensor 4 [System Board 1.1V PG] is Good
OK | 5 | Voltage sensor 5 [CPU1 M23 VDDQ PG] is Good
OK | 6 | Voltage sensor 6 [CPU1 M23 VTT PG] is Good
OK | 7 | Voltage sensor 7 [System Board FETDRV PG] is Good
OK | 8 | Voltage sensor 8 [CPU1 VSA PG] is Good
OK | 9 | Voltage sensor 9 [CPU1 M01 VDDQ PG] is Good
OK | 10 | Voltage sensor 10 [System Board NDC PG] is Good
OK | 11 | Voltage sensor 11 [CPU1 VTT PG] is Good
OK | 12 | Voltage sensor 12 [System Board 1.5V PG] is Good
OK | 13 | Voltage sensor 13 [PS2 PG Fail] is Good
OK | 14 | Voltage sensor 14 [System Board PS1 PG Fail] is Good
OK | 15 | Voltage sensor 15 [System Board BP1 5V PG] is Good
OK | 16 | Voltage sensor 16 [CPU1 M01 VTT PG] is Good
OK | 17 | Voltage sensor 17 [PS1 Voltage 1] reads 114 V
OK | 0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected
OK | 0 | Amperage probe 0 [PS1 Current 1] reads 0.6 A
OK | 1 | Amperage probe 1 [System Board Pwr Consumption] reads 56 W
OK | 0 | Chassis intrusion 0 detection: Ok (Not Breached)
OK | 0 | SD Card 0 [vFlash] is Absent
-----------------------------------------------------------------------------
Other messages
=============================================================================
STATE | MESSAGE TEXT
---------+-------------------------------------------------------------------
OK | ESM log health is Ok (less than 80% full)
OK | Chassis Service Tag is sane
This run exits with 1 (WARNING).
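The exit code follows the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and a plugin like this one typically returns the worst state among the components it reports. The sketch below shows that aggregation on the storage rows above. It is a generic illustration of the convention, not code from check_openmanage, and ranking UNKNOWN between WARNING and CRITICAL is an assumption (plugins differ on this).

```python
from enum import IntEnum


class State(IntEnum):
    """Standard Nagios plugin exit codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


# Assumed severity ranking for "worst state wins"; not taken from check_openmanage.
SEVERITY = {State.OK: 0, State.WARNING: 1, State.UNKNOWN: 2, State.CRITICAL: 3}


def worst_state(states: list[State]) -> State:
    """Return the most severe state, or OK if the list is empty."""
    return max(states, key=SEVERITY.__getitem__, default=State.OK)


if __name__ == "__main__":
    # Storage rows from the first R720 run: two WARNING disks, the rest OK.
    storage = [State.OK, State.WARNING, State.WARNING, State.OK,
               State.OK, State.OK, State.OK]
    result = worst_state(storage)
    print(result.name, int(result))  # WARNING 1
```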
We're not sure we agree with treating a disk that is not Dell Certified
as a WARNING, but we can at least understand the reasoning.
So, what if we exclude storage, with --no-storage?
onlight@monitor:~$ check_openmanage -H host -C secret -d --no-storage
System: PowerEdge R720 OMSA version: 7.1.0
ServiceTag: ####### Plugin version: 3.7.9
BIOS/date: 1.2.6 05/10/2012 Checking mode: SNMPv2c UDP/IPv4
-----------------------------------------------------------------------------
Chassis Components
=============================================================================
STATE | ID | MESSAGE TEXT
---------+------+------------------------------------------------------------
OK | 0 | Memory module 0 [DIMM_A1, 4096 MB] is Ok
OK | 1 | Memory module 1 [DIMM_A2, 4096 MB] is Ok
OK | 2 | Memory module 2 [DIMM_A3, 4096 MB] is Ok
OK | 3 | Memory module 3 [DIMM_A4, 4096 MB] is Ok
OK | 0 | Chassis fan 0 [System Board Fan1 RPM] reading: 1080 RPM
OK | 1 | Chassis fan 1 [System Board Fan2 RPM] reading: 1080 RPM
OK | 2 | Chassis fan 2 [System Board Fan3 RPM] reading: 1200 RPM
OK | 3 | Chassis fan 3 [System Board Fan4 RPM] reading: 1080 RPM
OK | 4 | Chassis fan 4 [System Board Fan5 RPM] reading: 1080 RPM
OK | 5 | Chassis fan 5 [System Board Fan6 RPM] reading: 1080 RPM
OK | 0 | Power Supply 0 [AC]: Presence detected
OK | 0 | Temperature Probe 0 [System Board Inlet Temp] reads 26 C (min=3/-7, max=42/47)
OK | 1 | Temperature Probe 1 [System Board Exhaust Temp] reads 33 C (min=8/3, max=70/75)
OK | 2 | Temperature Probe 2 [CPU1 Temp] reads 49 C (min=8/3, max=83/88)
OK | 0 | Processor 0 [Intel Xeon E5-2603 0 1.80GHz] is Present
OK | 0 | Voltage sensor 0 [CPU1 VCORE PG] is Good
OK | 1 | Voltage sensor 1 [System Board 3.3V PG] is Good
OK | 2 | Voltage sensor 2 [System Board 5V PG] is Good
OK | 3 | Voltage sensor 3 [CPU1 PLL PG] is Good
OK | 4 | Voltage sensor 4 [System Board 1.1V PG] is Good
OK | 5 | Voltage sensor 5 [CPU1 M23 VDDQ PG] is Good
OK | 6 | Voltage sensor 6 [CPU1 M23 VTT PG] is Good
OK | 7 | Voltage sensor 7 [System Board FETDRV PG] is Good
OK | 8 | Voltage sensor 8 [CPU1 VSA PG] is Good
OK | 9 | Voltage sensor 9 [CPU1 M01 VDDQ PG] is Good
OK | 10 | Voltage sensor 10 [System Board NDC PG] is Good
OK | 11 | Voltage sensor 11 [CPU1 VTT PG] is Good
OK | 12 | Voltage sensor 12 [System Board 1.5V PG] is Good
OK | 13 | Voltage sensor 13 [PS2 PG Fail] is Good
OK | 14 | Voltage sensor 14 [System Board PS1 PG Fail] is Good
OK | 15 | Voltage sensor 15 [System Board BP1 5V PG] is Good
OK | 16 | Voltage sensor 16 [CPU1 M01 VTT PG] is Good
OK | 17 | Voltage sensor 17 [PS1 Voltage 1] reads 112 V
OK | 0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected
OK | 0 | Amperage probe 0 [PS1 Current 1] reads 0.6 A
OK | 1 | Amperage probe 1 [System Board Pwr Consumption] reads 56 W
OK | 0 | Chassis intrusion 0 detection: Ok (Not Breached)
OK | 0 | SD Card 0 [vFlash] is Absent
-----------------------------------------------------------------------------
Other messages
=============================================================================
STATE | MESSAGE TEXT
---------+-------------------------------------------------------------------
OK | ESM log health is Ok (less than 80% full)
OK | Chassis Service Tag is sane
OOPS! Something is wrong with this server, but I don't know what. The global system health status is WARNING, but every component check is OK. This may be a bug in the Nagios plugin, please file a bug report.
This yields exit code 3 (UNKNOWN).
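The OOPS message suggests an extra consistency check: the plugin compares the server's global health status with the states of the components it actually examined, and returns UNKNOWN when the global status is bad but nothing it checked explains it. With --no-storage the non-certified disks are no longer examined, yet they presumably still pull the global status down to WARNING. The sketch below is a hypothetical reconstruction of that rule from the output in this thread, not the plugin's actual Perl code.

```python
from enum import IntEnum


class State(IntEnum):
    """Standard Nagios plugin exit codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def final_state(component_states: list[State], global_status: State) -> State:
    """Hypothetical reconstruction of the 'OOPS' rule seen in the debug output.

    If every examined component is OK but the global status is not, the
    discrepancy cannot be explained, so UNKNOWN is returned. Otherwise the
    worst examined component state wins (simple numeric max in this sketch).
    """
    worst = max(component_states, default=State.OK)
    if worst == State.OK and global_status != State.OK:
        return State.UNKNOWN
    return worst


if __name__ == "__main__":
    chassis_only = [State.OK] * 5  # --no-storage: only chassis rows examined
    print(final_state(chassis_only, State.WARNING).name)  # UNKNOWN
```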
Now, just for argument's sake, let's say we bypass the check for
certified drives by commenting out the "workaround for OMSA 7.1.0
bug" code (just a handy little shortcut). Here's what we get then:
onlight@monitor:~$ check_openmanage -H host -C secret -d
System: PowerEdge R720 OMSA version: 7.1.0
ServiceTag: ####### Plugin version: 3.7.9
BIOS/date: 1.2.6 05/10/2012 Checking mode: SNMPv2c UDP/IPv4
-----------------------------------------------------------------------------
Storage Components
=============================================================================
STATE | ID | MESSAGE TEXT
---------+----------+--------------------------------------------------------
OK | 0 | Controller 0 [PERC H310 Mini] is Ready
WARNING | 0:0:1:0 | Physical Disk 0:1:0 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online
WARNING | 0:0:1:1 | Physical Disk 0:1:1 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online
OK | 0:0 | Logical Drive '/dev/sda' [RAID-1, 1862.50 GB] is Ready
OK | 0:0 | Connector 0 [SAS] on controller 0 is Ready
OK | 0:1 | Connector 1 [SAS] on controller 0 is Ready
OK | 0:0:1 | Enclosure 0:0:1 [Backplane] on controller 0 is Ready
-----------------------------------------------------------------------------
Chassis Components
=============================================================================
STATE | ID | MESSAGE TEXT
---------+------+------------------------------------------------------------
OK | 0 | Memory module 0 [DIMM_A1, 4096 MB] is Ok
OK | 1 | Memory module 1 [DIMM_A2, 4096 MB] is Ok
OK | 2 | Memory module 2 [DIMM_A3, 4096 MB] is Ok
OK | 3 | Memory module 3 [DIMM_A4, 4096 MB] is Ok
OK | 0 | Chassis fan 0 [System Board Fan1 RPM] reading: 1080 RPM
OK | 1 | Chassis fan 1 [System Board Fan2 RPM] reading: 1200 RPM
OK | 2 | Chassis fan 2 [System Board Fan3 RPM] reading: 1200 RPM
OK | 3 | Chassis fan 3 [System Board Fan4 RPM] reading: 1080 RPM
OK | 4 | Chassis fan 4 [System Board Fan5 RPM] reading: 1080 RPM
OK | 5 | Chassis fan 5 [System Board Fan6 RPM] reading: 1200 RPM
OK | 0 | Power Supply 0 [AC]: Presence detected
OK | 0 | Temperature Probe 0 [System Board Inlet Temp] reads 26 C (min=3/-7, max=42/47)
OK | 1 | Temperature Probe 1 [System Board Exhaust Temp] reads 33 C (min=8/3, max=70/75)
OK | 2 | Temperature Probe 2 [CPU1 Temp] reads 48 C (min=8/3, max=83/88)
OK | 0 | Processor 0 [Intel Xeon E5-2603 0 1.80GHz] is Present
OK | 0 | Voltage sensor 0 [CPU1 VCORE PG] is Good
OK | 1 | Voltage sensor 1 [System Board 3.3V PG] is Good
OK | 2 | Voltage sensor 2 [System Board 5V PG] is Good
OK | 3 | Voltage sensor 3 [CPU1 PLL PG] is Good
OK | 4 | Voltage sensor 4 [System Board 1.1V PG] is Good
OK | 5 | Voltage sensor 5 [CPU1 M23 VDDQ PG] is Good
OK | 6 | Voltage sensor 6 [CPU1 M23 VTT PG] is Good
OK | 7 | Voltage sensor 7 [System Board FETDRV PG] is Good
OK | 8 | Voltage sensor 8 [CPU1 VSA PG] is Good
OK | 9 | Voltage sensor 9 [CPU1 M01 VDDQ PG] is Good
OK | 10 | Voltage sensor 10 [System Board NDC PG] is Good
OK | 11 | Voltage sensor 11 [CPU1 VTT PG] is Good
OK | 12 | Voltage sensor 12 [System Board 1.5V PG] is Good
OK | 13 | Voltage sensor 13 [PS2 PG Fail] is Good
OK | 14 | Voltage sensor 14 [System Board PS1 PG Fail] is Good
OK | 15 | Voltage sensor 15 [System Board BP1 5V PG] is Good
OK | 16 | Voltage sensor 16 [CPU1 M01 VTT PG] is Good
OK | 17 | Voltage sensor 17 [PS1 Voltage 1] reads 114 V
OK | 0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected
OK | 0 | Amperage probe 0 [PS1 Current 1] reads 0.6 A
OK | 1 | Amperage probe 1 [System Board Pwr Consumption] reads 56 W
OK | 0 | Chassis intrusion 0 detection: Ok (Not Breached)
OK | 0 | SD Card 0 [vFlash] is Absent
-----------------------------------------------------------------------------
Other messages
=============================================================================
STATE | MESSAGE TEXT
---------+-------------------------------------------------------------------
OK | ESM log health is Ok (less than 80% full)
OK | Chassis Service Tag is sane
Again, as with the original case, the exit code is 1 (WARNING).
Is there any way around this? Should I be disabling the global health
check? Here's a run that blacklists all physical disks instead, and it works:
onlight@monitor:~$ check_openmanage -H host -C secret -b pdisk=all
OK - System: 'PowerEdge R720', SN: '#######', 16 GB ram (4 dimms), 1 logical drives, 2 physical drives
Interestingly, when combining the blacklist with debug ("-d -b
pdisk=all"), the exit code is 3 (UNKNOWN), but with debug off, it's 0
(OK).
So what I'm wondering is: why do we need to blacklist the physical
disks (pdisk) rather than use --no-storage? Shouldn't --no-storage
also cause globalstatus to be ignored?
I can furnish SNMP walk output if that's useful.
Cheers,
-nic
--
Nic Bernstein nic at onlight.com
Onlight, Inc. www.onlight.com
219 N. Milwaukee St., Suite 2a v. 414.272.4477
Milwaukee, Wisconsin 53202
------------------------------
------------------------------------------------------------------------------
This SF.net email is sponsored by Windows:
Build for Windows Store.
http://p.sf.net/sfu/windows-dev2dev
------------------------------
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
End of Nagios-users Digest, Vol 85, Issue 6
*******************************************
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null