Nagios sometimes shows wrong status

Michael Prochaska michael at prochas.net
Wed May 27 16:04:24 CEST 2009


Hi,

Sorry, here is the right snippet from nagios.log:

[1243412547] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243412545
[1243412553] SERVICE ALERT: acgweb1;BASIC_SVM;OK;SOFT;3;CRITICAL - One or more disks are in maintenance state.
[1243412558] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243412556
[1243412565] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243412563
[1243412579] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243412577
[1243412583] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL - One or more disks are in maintenance state.
[1243412590] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243412588
[1243412593] SERVICE ALERT: acgweb1;BASIC_SVM;OK;SOFT;2;CRITICAL - One or more disks are in maintenance state.
[1243412625] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243412623
[1243412828] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243412826
[1243412838] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243412836
[1243412854] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243412851
[1243412869] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243412866
[1243412875] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243412866
...
[1243413483] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL - One or more disks are in maintenance state.
...
[1243413603] SERVICE ALERT: acgweb1;BASIC_SVM;OK;SOFT;2;CRITICAL - One or more disks are in maintenance state.
[1243413903] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL - One or more disks are in maintenance state.
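
To line these entries up with /tmp/svm.debug, the epoch timestamps in
nagios.log can be turned into UTC dates. This is only a minimal sketch: it
assumes GNU date (for "-d @SECONDS", which not every date implementation
supports), and the function name nagios_log_to_utc is made up for
illustration.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the leading [epoch] field of each nagios.log
# line read from stdin as a UTC date. Assumes GNU date for "-d @SECONDS".
nagios_log_to_utc() {
    while IFS= read -r line; do
        case "$line" in
            \[[0-9]*\]*)
                ts=${line#\[}        # drop the leading "["
                ts=${ts%%\]*}        # keep everything before the first "]"
                rest=${line#*\] }    # text after "] "
                printf '%s %s\n' \
                    "$(date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S')" "$rest"
                ;;
            *)
                # Lines without a timestamp are passed through unchanged.
                printf '%s\n' "$line"
                ;;
        esac
    done
}
```

Run as "nagios_log_to_utc < nagios.log". The entry stamped [1243412547]
comes out as 2009-05-27 08:22:27.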


and the clocks of the two systems are in sync :-)

# date && ssh acgweb1 date
Wednesday, May 27, 2009  2:02:20 PM GMT
Wed May 27 14:02:20 GMT 2009
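
With the clocks in agreement, what remains to be pinned down is the exit
status each run actually hands back to its caller. One way to record that
is a small wrapper around the check. This is a sketch only, assuming a POSIX
shell; the function name run_and_log and the log file argument are
hypothetical and not part of the plugin.

```shell
#!/bin/sh
# Hypothetical wrapper: run a check command, append its exit status and
# output to a log file, then pass both through unchanged to the caller.
run_and_log() {
    logfile=$1
    shift
    output=$("$@" 2>&1)   # capture stdout and stderr of the check
    rc=$?                 # exit status of the check itself
    printf '%s pid=%s rc=%s output=%s\n' \
        "$(date -u '+%Y-%m-%d %H:%M:%S')" "$$" "$rc" "$output" >> "$logfile"
    printf '%s\n' "$output"
    return "$rc"
}
```

Called as "run_and_log /tmp/svm.wrapper.log /path/to/check", it returns the
same status the check returned, so it can sit between the caller and the
plugin without changing the result.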


regards,
michael

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 27/05/09 04:52 AM, Michael Prochaska wrote:
>> Hi!
>>
>> I've seen a strange behavior of nagios with a very simple check script.
>>
>> the relevant part of the script:
>> #########################################################################
>> MAINTCNT="`/usr/sbin/metastat |grep -i maint |wc -l`"
>> RESYNCNT="`/usr/sbin/metastat |grep -i resync |wc -l`"
>>
>> NOTOK=0
>> status=$STATE_UNKNOWN
>>
>> if [ $RESYNCNT -gt 0 ]; then
>>         NOTOK=1
>>         TEXT="WARNING - One or more disks are in resync state. "
>>         status=$STATE_WARNING
>> fi
>>
>> if [ $MAINTCNT -gt 0 ]; then
>>         NOTOK=1
>>         TEXT="CRITICAL - One or more disks are in maintenance state."
>>         status=$STATE_CRITICAL
>> fi
>>
>>
>> if [ $NOTOK -eq 1 ]; then
>>         echo $TEXT
>>         datum=`date`
>>         echo $datum $status >> /tmp/svm.debug
>>         exit $status
>> fi
>>
>> echo "OK - There is no maintenance necessary!"
>> exit $STATE_OK
>>
>> #########################################################################
>>
>> When I execute the script from the command line, the return code is
>> always 2 and the output is always "CRITICAL - One or more disks are in
>> maintenance state." (because there is one dead disk) => that's OK
>>
>> When Nagios executes the script, the output is always "CRITICAL - One or
>> more disks are in maintenance state." but the return code is sometimes 0
>> and sometimes 2 => that's not good
>>
>> snippet from nagios.log:
>> [1243410051] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL - One or more disks are in maintenance state.
>> [1243410063] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243410061
>> [1243410071] SERVICE ALERT: acgweb1;BASIC_SVM;OK;SOFT;2;CRITICAL - One or more disks are in maintenance state.
>> [1243410083] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243410081
>> [1243410091] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL - One or more disks are in maintenance state.
>> [1243410124] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243410122
>> [1243410131] SERVICE ALERT: acgweb1;BASIC_SVM;OK;SOFT;2;CRITICAL - One or more disks are in maintenance state.
>> [1243411031] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL - One or more disks are in maintenance state.
>> [1243411316] SERVICE ALERT: acgweb1;BASIC_SVM;OK;SOFT;2;CRITICAL - One or more disks are in maintenance state.
>> [1243411323] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411320
>> [1243411326] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL - One or more disks are in maintenance state.
>> [1243411363] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411361
>> [1243411366] SERVICE ALERT: acgweb1;BASIC_SVM;OK;SOFT;2;CRITICAL - One or more disks are in maintenance state.
>> [1243411370] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411368
>> [1243411376] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL - One or more disks are in maintenance state.
>> [1243411391] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411389
>> [1243411396] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;2;CRITICAL - One or more disks are in maintenance state.
>> [1243411398] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411396
>> [1243411406] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;3;CRITICAL - One or more disks are in maintenance state.
>> [1243411407] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411405
>>
>>
>>
>> /tmp/svm.debug confirms the command line result:
>>> cat /tmp/svm.debug
>> Wed May 27 08:21:33 GMT 2009 2
>> Wed May 27 08:22:28 GMT 2009 2
>> Wed May 27 08:22:39 GMT 2009 2
>> Wed May 27 08:22:46 GMT 2009 2
>> Wed May 27 08:23:00 GMT 2009 2
>> Wed May 27 08:23:11 GMT 2009 2
>> Wed May 27 08:23:46 GMT 2009 2
>> Wed May 27 08:24:01 GMT 2009 2
>> Wed May 27 08:27:09 GMT 2009 2
>> Wed May 27 08:27:19 GMT 2009 2
>> Wed May 27 08:27:35 GMT 2009 2
>> Wed May 27 08:27:50 GMT 2009 2
>> Wed May 27 08:27:56 GMT 2009 2
>> Wed May 27 08:29:01 GMT 2009 2
>> Wed May 27 08:32:55 GMT 2009 2
>> Wed May 27 08:34:01 GMT 2009 2
>> Wed May 27 08:37:55 GMT 2009 2
>> Wed May 27 08:39:01 GMT 2009 2
>> Wed May 27 08:39:55 GMT 2009 2
>> Wed May 27 08:44:01 GMT 2009 2
>> Wed May 27 08:44:55 GMT 2009 2
>
> The times in your nagios log are between Wed May 27 07:40:51 2009 and
> Wed May 27 08:03:27 2009. Could you send matching logs?
>
> - --
> Thomas
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFKHSdg6dZ+Kt5BchYRAsDmAKDhynEcZ5WwKoIU8VIxLbUm1IFaIACgmh9q
> NKYXWWjnmdR/wTG77YmD22Y=
> =mVtr
> -----END PGP SIGNATURE-----
>
>

