Nagios sometimes shows wrong status
Michael Prochaska
michael at prochas.net
Wed May 27 10:52:32 CEST 2009
Hi!
I've seen strange behavior from Nagios with a very simple check script.
The relevant part of the script:
#########################################################################
MAINTCNT="`/usr/sbin/metastat |grep -i maint |wc -l`"
RESYNCNT="`/usr/sbin/metastat |grep -i resync |wc -l`"
NOTOK=0
status=$STATE_UNKNOWN
if [ $RESYNCNT -gt 0 ]; then
    NOTOK=1
    TEXT="WARNING - One or more disks are in resync state. "
    status=$STATE_WARNING
fi
if [ $MAINTCNT -gt 0 ]; then
    NOTOK=1
    TEXT="CRITICAL - One or more disks are in maintenance state."
    status=$STATE_CRITICAL
fi
if [ $NOTOK -eq 1 ]; then
    echo $TEXT
    datum=`date`
    echo $datum $status >> /tmp/svm.debug
    exit $status
fi
echo "OK - There is no maintenance necessary!"
exit $STATE_OK
#########################################################################
When I execute the script from the command line, the return code is always 2
and the output is always "CRITICAL - One or more disks are in maintenance
state." (because there is one dead disk) => that's OK.
When Nagios executes the script, the output is always "CRITICAL - One or
more disks are in maintenance state.", but the return code is sometimes 0
and sometimes 2 => that's not good.
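One way to narrow this down would be to put a small wrapper between Nagios and the check, so that the exit status the parent process actually receives gets logged, not just the value the script intends to return. The following is only a sketch: the function name run_and_log, the log path, and the plugin path in the usage comment are hypothetical and not part of my setup.

```shell
#!/bin/sh
# Hypothetical helper: run a check command, record the exit status that the
# caller really receives, print the plugin output unchanged, and return the
# same status.
run_and_log() {
    logfile=$1
    shift
    # Capture stdout and stderr of the check; $? is its real exit status.
    output=`"$@" 2>&1`
    rc=$?
    # PATH is logged because the environment may differ from an interactive shell.
    echo "`date` rc=$rc PATH=$PATH output=$output" >> "$logfile"
    echo "$output"
    return $rc
}

# Usage (both paths are placeholders):
#   run_and_log /tmp/svm.wrapper.log /path/to/check_svm
#   exit $?
```

If the wrapper log always shows rc=2 while Nagios still reports OK, the status would be getting lost after the plugin exits; if it sometimes shows rc=0, the problem would be inside the script or its environment.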
Snippet from nagios.log:
[1243410051] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL - One or more disks are in maintenance state.
[1243410063] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243410061
[1243410071] SERVICE ALERT: acgweb1;BASIC_SVM;OK;SOFT;2;CRITICAL - One or more disks are in maintenance state.
[1243410083] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243410081
[1243410091] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL - One or more disks are in maintenance state.
[1243410124] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243410122
[1243410131] SERVICE ALERT: acgweb1;BASIC_SVM;OK;SOFT;2;CRITICAL - One or more disks are in maintenance state.
[1243411031] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL - One or more disks are in maintenance state.
[1243411316] SERVICE ALERT: acgweb1;BASIC_SVM;OK;SOFT;2;CRITICAL - One or more disks are in maintenance state.
[1243411323] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411320
[1243411326] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL - One or more disks are in maintenance state.
[1243411363] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411361
[1243411366] SERVICE ALERT: acgweb1;BASIC_SVM;OK;SOFT;2;CRITICAL - One or more disks are in maintenance state.
[1243411370] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411368
[1243411376] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL - One or more disks are in maintenance state.
[1243411391] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411389
[1243411396] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;2;CRITICAL - One or more disks are in maintenance state.
[1243411398] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411396
[1243411406] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;3;CRITICAL - One or more disks are in maintenance state.
[1243411407] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411405
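To pick the inconsistent entries out of a longer log, something like the following awk filter could be used. It is a rough sketch that assumes each entry is on a single line in the real nagios.log, that the fields of a SERVICE ALERT are separated by ";" as in the snippet above, and that the plugin output starts with the state word (as my script's output does). The function name find_mismatches is made up for this example.

```shell
#!/bin/sh
# Print SERVICE ALERT entries whose state field (3rd ";"-separated field)
# differs from the first word of the plugin output (6th field).
find_mismatches() {
    awk -F';' '/SERVICE ALERT:/ {
        split($6, words, " ")
        if ($3 != words[1]) print
    }' "$@"
}

# Usage (the log path depends on the installation):
#   find_mismatches /path/to/nagios.log
```

Applied to the snippet above, this would print only the entries logged as OK while the text still says CRITICAL.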
/tmp/svm.debug confirms the command-line result:
> cat /tmp/svm.debug
Wed May 27 08:21:33 GMT 2009 2
Wed May 27 08:22:28 GMT 2009 2
Wed May 27 08:22:39 GMT 2009 2
Wed May 27 08:22:46 GMT 2009 2
Wed May 27 08:23:00 GMT 2009 2
Wed May 27 08:23:11 GMT 2009 2
Wed May 27 08:23:46 GMT 2009 2
Wed May 27 08:24:01 GMT 2009 2
Wed May 27 08:27:09 GMT 2009 2
Wed May 27 08:27:19 GMT 2009 2
Wed May 27 08:27:35 GMT 2009 2
Wed May 27 08:27:50 GMT 2009 2
Wed May 27 08:27:56 GMT 2009 2
Wed May 27 08:29:01 GMT 2009 2
Wed May 27 08:32:55 GMT 2009 2
Wed May 27 08:34:01 GMT 2009 2
Wed May 27 08:37:55 GMT 2009 2
Wed May 27 08:39:01 GMT 2009 2
Wed May 27 08:39:55 GMT 2009 2
Wed May 27 08:44:01 GMT 2009 2
Wed May 27 08:44:55 GMT 2009 2
and so on...
Any ideas what is going wrong here?
best regards,
michael