Trouble with passive checks and freshness
Christopher Odenbach
odenbach at uni-paderborn.de
Fri Sep 1 13:48:17 CEST 2006
Hi,
we have the following setup:
Nagios 2.5
73 hosts
335 active checks
765 passive checks
Each host submits its passive check results into the Nagios command
file every 5 minutes. The freshness threshold is set to 2000 seconds,
so the service stays passive as long as everything runs as it should.
The check_command (check_disk) is defined to run check_dummy, which
generates the message "No data from host" when executed actively.
This works fine for nearly every host. But one host, which is
configured no differently from the others, causes trouble. The data
arrives every 5 minutes, but Nagios keeps flipping between active
and passive mode:
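To make the submission step concrete, here is a minimal sketch of how one passive result line can be built and written. The line format is the standard PROCESS_SERVICE_CHECK_RESULT external command; the function names and the command file path passed to submit() are hypothetical and not taken from our actual scripts.

```python
import time


def format_passive_result(host, service, return_code, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT line for the command file."""
    # Nagios expects a Unix timestamp in square brackets at the start.
    timestamp = int(time.time() if now is None else now)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, output)


def submit(command_file, line):
    """Write one command line to the Nagios command file (a named pipe)."""
    # command_file is an assumption: pass whatever path your nagios.cfg
    # sets for the command_file option.
    with open(command_file, "w") as pipe:
        pipe.write(line)


example = format_passive_result("rana", "Local disk", 0, "DISK OK", now=1157111344)
print(example, end="")
```

Only format_passive_result() is exercised here; submit() needs a running Nagios with external commands enabled.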
root@giedi3[nagios]# tail -10000 nagios.log | grep rana | grep disk | naglog.pl | cut -c-100
[01.09. 12:29:04] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
[01.09. 12:31:49] SERVICE ALERT: rana;Local disk;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce n
[01.09. 12:34:24] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
[01.09. 12:34:29] SERVICE ALERT: rana;Local disk;OK;HARD;1;DISK OK - free space:
[01.09. 12:39:44] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
[01.09. 12:41:50] SERVICE ALERT: rana;Local disk;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce n
[01.09. 12:45:04] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
[01.09. 12:45:10] SERVICE ALERT: rana;Local disk;OK;HARD;1;DISK OK - free space:
[01.09. 12:50:29] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
[01.09. 12:51:49] SERVICE ALERT: rana;Local disk;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce n
[01.09. 12:55:49] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
[01.09. 12:55:59] SERVICE ALERT: rana;Local disk;OK;HARD;1;DISK OK - free space:
[01.09. 13:01:09] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
[01.09. 13:01:49] SERVICE ALERT: rana;Local disk;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce n
[01.09. 13:06:29] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
[01.09. 13:06:39] SERVICE ALERT: rana;Local disk;OK;HARD;1;DISK OK - free space:
[01.09. 13:11:50] SERVICE ALERT: rana;Local disk;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce n
[01.09. 13:11:50] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
[01.09. 13:11:50] SERVICE ALERT: rana;Local disk;OK;HARD;1;DISK OK - free space:
[01.09. 13:17:09] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
[01.09. 13:21:49] SERVICE ALERT: rana;Local disk;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce n
[01.09. 13:22:30] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
[01.09. 13:22:39] SERVICE ALERT: rana;Local disk;OK;HARD;1;DISK OK - free space:
[01.09. 13:27:54] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
[01.09. 13:31:49] SERVICE ALERT: rana;Local disk;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce n
[01.09. 13:33:14] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
[01.09. 13:33:19] SERVICE ALERT: rana;Local disk;OK;HARD;1;DISK OK - free space:
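To put numbers on this, the age of the last passive result at each UNKNOWN alert can be computed from the log. This is a rough sketch that assumes the timestamp layout printed by naglog.pl above; the function name stale_gaps and the year argument are made up for illustration.

```python
import re
from datetime import datetime

# Matches lines such as "[01.09. 12:29:04] SERVICE ALERT: rana;Local disk;..."
LINE = re.compile(
    r"^\[(\d{2})\.(\d{2})\. (\d{2}):(\d{2}):(\d{2})\] "
    r"(EXTERNAL COMMAND|SERVICE ALERT): (.*)$")


def stale_gaps(lines, year=2006):
    """Return seconds between each UNKNOWN alert and the last passive result."""
    last_passive = None
    gaps = []
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue
        day, month, hour, minute, second = (int(g) for g in match.groups()[:5])
        when = datetime(year, month, day, hour, minute, second)
        kind, rest = match.group(6), match.group(7)
        if kind == "EXTERNAL COMMAND" and rest.startswith("PROCESS_SERVICE_CHECK_RESULT;"):
            last_passive = when
        elif kind == "SERVICE ALERT" and ";UNKNOWN;" in rest and last_passive is not None:
            gaps.append(int((when - last_passive).total_seconds()))
    return gaps


sample = [
    "[01.09. 12:29:04] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s",
    "[01.09. 12:31:49] SERVICE ALERT: rana;Local disk;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce n",
    "[01.09. 12:34:24] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s",
    "[01.09. 12:34:29] SERVICE ALERT: rana;Local disk;OK;HARD;1;DISK OK - free space:",
    "[01.09. 12:39:44] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s",
    "[01.09. 12:41:50] SERVICE ALERT: rana;Local disk;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce n",
]
gaps = stale_gaps(sample)
print(gaps)  # [165, 126]
```

For the first two UNKNOWN alerts the last passive result was only 165 and 126 seconds old, far below the 2000 second threshold.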
All other hosts work just fine:
root@giedi3[nagios]# tail -10000 nagios.log | grep etamin | grep disk | naglog.pl | cut -c-100
[01.09. 12:30:49] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
[01.09. 12:36:09] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
[01.09. 12:41:29] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
[01.09. 12:46:50] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
[01.09. 12:52:09] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
[01.09. 12:57:29] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
[01.09. 13:02:49] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
[01.09. 13:08:09] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
[01.09. 13:13:34] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
[01.09. 13:18:54] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
[01.09. 13:24:14] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
[01.09. 13:29:34] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
[01.09. 13:34:54] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
The configuration is:
define service {
    use                     check_local_disk_templ
    host_name               rana
}
define service{
    use                     passive_templ
    name                    check_local_disk_templ
    service_description     Local disk
    servicegroups           local-disk-services
    check_command           check_disk!-w 15% -c 10% -x /afs -e
    notifications_enabled   1
    register                0
}
define service{
    name                    passive_templ
    register                0
    max_check_attempts      1
    normal_check_interval   10
    retry_check_interval    1
    active_checks_enabled   0
    passive_checks_enabled  1
    check_freshness         1
    freshness_threshold     2000
    check_period            always
    notification_interval   0
    notification_period     always
    notification_options    w,c,r,f
    notifications_enabled   0
    contact_groups          Server
    process_perf_data       1
}
define command{
    command_name            check_disk
    command_line            $USER1$/check_dummy 3 "No data from host - nsce not running?"
}
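For comparison with the log, the time-related values above can be converted to seconds. The sketch below is a simplified model, not Nagios code: it assumes the default interval_length of 60 seconds (that setting is not shown here) and treats a result as stale only once its age exceeds freshness_threshold. The helper names are made up for illustration.

```python
INTERVAL_LENGTH = 60  # assumption: Nagios default, not shown in this post


def check_interval_seconds(normal_check_interval, interval_length=INTERVAL_LENGTH):
    """Convert normal_check_interval (in time units) to seconds."""
    return normal_check_interval * interval_length


def is_stale(age_seconds, freshness_threshold):
    """Simplified freshness test: stale only if older than the threshold."""
    return age_seconds > freshness_threshold


def hms(text):
    """Convert "HH:MM:SS" to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Times of the UNKNOWN alerts for rana, copied from the log above.
unknown_alerts = ["12:31:49", "12:41:50", "12:51:49", "13:01:49",
                  "13:11:50", "13:21:49", "13:31:49"]
spacing = [hms(b) - hms(a) for a, b in zip(unknown_alerts, unknown_alerts[1:])]

print(check_interval_seconds(10))  # 600
print(spacing)                     # [601, 599, 600, 601, 599, 600]
print(is_stale(165, 2000))         # False
```

Under that assumption the UNKNOWN alerts are about 600 seconds apart, the same as normal_check_interval 10, while a 165 second old result is nowhere near stale. That is only an observation from the numbers, not a diagnosis.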
Any ideas what is going wrong here? Why does Nagios flip the service
to active when data arrived less than 2000 seconds ago?
Thanks,
Christopher
--
======================================================
Dipl.-Ing. Christopher Odenbach
Zentrum fuer Informations- und Medientechnologien
Universitaet Paderborn
Raum N5.110
odenbach at uni-paderborn.de
Tel.: +49 5251 60 5315
======================================================