Trouble with passive checks and freshness
Arno Lehmann
al at its-lehmann.de
Mon Sep 4 21:57:12 CEST 2006
Hi,
On 9/4/2006 2:24 PM, Christopher Odenbach wrote:
> Hi,
>
>
>>>This works fine for nearly every host. But there is one host, no
>>>different from the others, that causes trouble. The data comes in
>>>every 5 minutes, but Nagios keeps flipping between active and
>>>passive mode:
>>
>>Perhaps some individual configuration that crept into your system?
>>I'd recommend checking the objects.cache file to see whether this host
>>is actually set up identically to the others.
>>
>>Hope this helps,
>
>
> I just checked the objects.cache file. The host and service entries for
> rana and another host are completely identical:
>
> define host {
> host_name rana
> check_command check-host-alive
> contact_groups No-Alarm
> notification_period always
> check_interval 0
> max_check_attempts 3
> active_checks_enabled 1
Try disabling active checks in the configuration.
> passive_checks_enabled 1
> obsess_over_host 1
> event_handler_enabled 1
> low_flap_threshold 0.000000
> high_flap_threshold 0.000000
> flap_detection_enabled 1
> freshness_threshold 0
> check_freshness 0
> notification_options d,u,r
> notifications_enabled 1
> notification_interval 0
> stalking_options n
> process_perf_data 1
> failure_prediction_enabled 1
> retain_status_information 1
> retain_nonstatus_information 1
> }
>
> define service {
> host_name rana
> service_description Local disk
> check_period always
> check_command check_disk!-w 15% -c 10% -x /afs -e
> contact_groups Server
> notification_period always
> normal_check_interval 10
> retry_check_interval 1
> max_check_attempts 1
> is_volatile 0
> parallelize_check 1
> active_checks_enabled 0
Or rather, in the web front-end... I guess you overlooked this
difference :-)
Arno
> passive_checks_enabled 1
> obsess_over_service 1
> event_handler_enabled 1
> low_flap_threshold 0.000000
> high_flap_threshold 0.000000
> flap_detection_enabled 1
> freshness_threshold 2000
> check_freshness 1
> notification_options w,c,r,f
> notifications_enabled 1
> notification_interval 0
> stalking_options n
> process_perf_data 1
> failure_prediction_enabled 1
> retain_status_information 1
> retain_nonstatus_information 1
> }
>
> But still all passive checks are flapping. Let me show you the log file:
>
> root at giedi3[nagios]# tail -2000 nagios.log | grep rana | naglog.pl
> [...] (naglog.pl just makes the timestamps readable)
>
> Here the passive check results come in - everything ok:
>
> [04.09. 13:17:16] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;4upgrades;1;PKG WARNING - Upgrade:
> base-config, libc6, libc6-sparc64, libgnutls11, libsasl2,
> libsasl2-modules, login, passwd, perl, perl-base, perl-doc,
> perl-modules, perl-suid
> [04.09. 13:17:16] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free
> space:| /=282MB;1297;1374;0;1527 /dev/shm=0MB;427;452;0;503 /var=118MB;205;217;0;242 /boot=10MB;39;42;0;47 /var/log=83MB;630;667;0;742 /var/cache/openafs=32MB;420;445;0;495 /tmp=0MB;398;422;0;469
> [04.09. 13:17:16] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;Local swap;0;SWAP OK - 100% free (512
> MB out of 512 MB) |swap=511MB;102;51;0;511
> [04.09. 13:17:16] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: bosserver;0;PROCS OK: 1 process
> with command name 'bosserver'
> [04.09. 13:17:16] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: cfexecd;0;PROCS OK: 2 processes
> with args '/usr/sbin/cfexecd'
> [04.09. 13:17:16] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: cron;0;PROCS OK: 1 process with
> args '/usr/sbin/cron'
> [04.09. 13:17:16] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: klogd;0;PROCS OK: 1 process
> with args '/sbin/klogd'
> [04.09. 13:17:16] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: ntpd;0;PROCS OK: 1 process with
> args '/usr/sbin/ntpd'
> [04.09. 13:17:16] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: nullmailer-send;0;PROCS OK: 1
> process with args '/usr/sbin/nullmailer-send'
> [04.09. 13:17:16] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: ptserver;0;PROCS OK: 1 process
> with command name 'ptserver'
> [04.09. 13:17:16] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: syslogd;0;PROCS OK: 1 process
> with args '/sbin/syslogd'
> [04.09. 13:17:16] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: vlserver;0;PROCS OK: 1 process
> with command name 'vlserver'
> [04.09. 13:17:16] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;System load;0;OK - load average:
> 0.33, 0.15, 0.10|load1=0.330;3.000;5.000;0;
> load5=0.150;9999.000;9999.000;0; load15=0.100;9999.000;9999.000;0;
>
> Five seconds later Nagios updates the service states:
>
> [04.09. 13:17:21] SERVICE ALERT: rana;4upgrades;WARNING;HARD;1;PKG
> WARNING - Upgrade: base-config, libc6, libc6-sparc64, libgnutls11,
> libsasl2, libsasl2-modules, login, passwd, perl, perl-base, perl-doc,
> perl-modules, perl-suid
> [04.09. 13:17:21] SERVICE ALERT: rana;Local disk;OK;HARD;1;DISK OK -
> free space:
> [04.09. 13:17:21] SERVICE ALERT: rana;Proc: cron;OK;HARD;1;PROCS OK: 1
> process with args '/usr/sbin/cron'
> [04.09. 13:17:21] SERVICE ALERT: rana;Proc: klogd;OK;HARD;1;PROCS OK: 1
> process with args '/sbin/klogd'
> [04.09. 13:17:21] SERVICE ALERT: rana;Proc: syslogd;OK;HARD;1;PROCS OK:
> 1 process with args '/sbin/syslogd'
> [04.09. 13:17:21] SERVICE ALERT: rana;Proc: vlserver;OK;HARD;1;PROCS
> OK: 1 process with command name 'vlserver'
>
> 20 seconds later some services drop to UNKNOWN state (which happens when
> they are switched to active). This should not happen, because correct
> data arrived just a few lines above!
>
> [04.09. 13:17:41] SERVICE ALERT: rana;Local
> swap;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
> [04.09. 13:18:12] SERVICE ALERT: rana;Proc:
> ntpd;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
> [04.09. 13:20:31] SERVICE ALERT: rana;Proc:
> nullmailer-send;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not
> running?
> [04.09. 13:20:41] SERVICE ALERT: rana;System
> load;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
> [04.09. 13:20:51] SERVICE ALERT: rana;Proc:
> bosserver;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
> [04.09. 13:21:32] SERVICE ALERT: rana;Proc:
> ptserver;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
> [04.09. 13:21:32] SERVICE ALERT: rana;Proc:
> cfexecd;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
>
> After five minutes the same thing. Fresh data comes in:
>
> [04.09. 13:22:41] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;4upgrades;1;PKG WARNING - Upgrade:
> base-config, libc6, libc6-sparc64, libgnutls11, libsasl2,
> libsasl2-modules, login, passwd, perl, perl-base, perl-doc,
> perl-modules, perl-suid
> [04.09. 13:22:41] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free
> space:| /=282MB;1297;1374;0;1527 /dev/shm=0MB;427;452;0;503 /var=118MB;205;217;0;242 /boot=10MB;39;42;0;47 /var/log=83MB;630;667;0;742 /var/cache/openafs=32MB;420;445;0;495 /tmp=0MB;398;422;0;469
> [04.09. 13:22:41] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;Local swap;0;SWAP OK - 100% free (512
> MB out of 512 MB) |swap=511MB;102;51;0;511
> [04.09. 13:22:41] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: bosserver;0;PROCS OK: 1 process
> with command name 'bosserver'
> [04.09. 13:22:41] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: cfexecd;0;PROCS OK: 2 processes
> with args '/usr/sbin/cfexecd'
> [04.09. 13:22:41] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: cron;0;PROCS OK: 1 process with
> args '/usr/sbin/cron'
> [04.09. 13:22:41] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: klogd;0;PROCS OK: 1 process
> with args '/sbin/klogd'
> [04.09. 13:22:41] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: ntpd;0;PROCS OK: 1 process with
> args '/usr/sbin/ntpd'
> [04.09. 13:22:41] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: nullmailer-send;0;PROCS OK: 1
> process with args '/usr/sbin/nullmailer-send'
> [04.09. 13:22:41] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: ptserver;0;PROCS OK: 1 process
> with command name 'ptserver'
> [04.09. 13:22:41] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: syslogd;0;PROCS OK: 1 process
> with args '/sbin/syslogd'
> [04.09. 13:22:41] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: vlserver;0;PROCS OK: 1 process
> with command name 'vlserver'
> [04.09. 13:22:41] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;rana;System load;0;OK - load average:
> 0.24, 0.12, 0.10|load1=0.240;3.000;5.000;0;
> load5=0.120;9999.000;9999.000;0; load15=0.100;9999.000;9999.000;0;
>
> The services toggle to OK again:
>
> [04.09. 13:22:51] SERVICE ALERT: rana;Local swap;OK;HARD;1;SWAP OK -
> 100% free (512 MB out of 512 MB)
> [04.09. 13:22:51] SERVICE ALERT: rana;Proc: bosserver;OK;HARD;1;PROCS
> OK: 1 process with command name 'bosserver'
> [04.09. 13:22:51] SERVICE ALERT: rana;Proc: cfexecd;OK;HARD;1;PROCS OK:
> 2 processes with args '/usr/sbin/cfexecd'
> [04.09. 13:22:51] SERVICE ALERT: rana;Proc: ntpd;OK;HARD;1;PROCS OK: 1
> process with args '/usr/sbin/ntpd'
> [04.09. 13:22:51] SERVICE ALERT: rana;Proc:
> nullmailer-send;OK;HARD;1;PROCS OK: 1 process with args
> '/usr/sbin/nullmailer-send'
> [04.09. 13:22:51] SERVICE ALERT: rana;Proc: ptserver;OK;HARD;1;PROCS
> OK: 1 process with command name 'ptserver'
> [04.09. 13:22:51] SERVICE ALERT: rana;System load;OK;HARD;1;OK - load
> average: 0.24, 0.12, 0.10
>
> Then the services which should still be ok fall to unknown state:
>
> [04.09. 13:24:31] SERVICE ALERT: rana;Proc:
> cron;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
> [04.09. 13:24:41] SERVICE ALERT: rana;Proc:
> syslogd;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
> [04.09. 13:24:41] SERVICE ALERT: rana;4upgrades;UNKNOWN;HARD;1;UNKNOWN:
> No data from host - nsce not running?
> [04.09. 13:25:41] SERVICE ALERT: rana;Proc:
> vlserver;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
> [04.09. 13:25:41] SERVICE ALERT: rana;Local
> disk;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
> root at giedi3[nagios]#
>
> What is going on here?
>
> Thanks,
>
> Christopher
>
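
To put numbers on the log above, here is a rough Python sketch that measures how long after the last passive result each UNKNOWN alert was logged and compares that gap with the freshness_threshold of 2000 seconds from the quoted service definition. The timestamps are copied from the log excerpt; the names (seconds_between, events, gaps) are made up for illustration, and I am assuming the other services use the same threshold as "Local disk", which is the only service definition shown.

```python
from datetime import datetime

# From the quoted "Local disk" service definition; assumed (not shown in
# the mail) to apply to the other services on rana as well.
FRESHNESS_THRESHOLD = 2000  # seconds


def parse(ts):
    """Parse a naglog.pl-style timestamp such as '04.09. 13:17:16'."""
    return datetime.strptime(ts, "%d.%m. %H:%M:%S")


def seconds_between(result_ts, alert_ts):
    """Seconds from the last passive result to the UNKNOWN alert."""
    return int((parse(alert_ts) - parse(result_ts)).total_seconds())


# (service, last passive result received, UNKNOWN alert logged)
events = [
    ("Local swap", "04.09. 13:17:16", "04.09. 13:17:41"),
    ("Proc: cron", "04.09. 13:22:41", "04.09. 13:24:31"),
    ("Local disk", "04.09. 13:22:41", "04.09. 13:25:41"),
]

gaps = {svc: seconds_between(res, alert) for svc, res, alert in events}

for svc, gap in gaps.items():
    verdict = "stale" if gap > FRESHNESS_THRESHOLD else "still fresh"
    print(f"{svc}: {gap} s since last result ({verdict})")
```

If a 2000 second threshold were really what decides staleness here, none of these results should have counted as stale, so it seems worth looking at what else could be scheduling the check_command for these services.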
--
IT-Service Lehmann al at its-lehmann.de
Arno Lehmann http://www.its-lehmann.de