Trouble with passive checks and freshness
Christopher Odenbach
odenbach at uni-paderborn.de
Mon Sep 4 14:24:28 CEST 2006
Hi,
> > This works fine for nearly every host. But there is one host, which
> > is not different from the others, that makes trouble. The data is
> > coming in every 5 minutes, but Nagios keeps flipping between active
> > and passive mode:
>
> Perhaps some individual configuration that crept into your system?
> I'd recommend to check the objects.cache file and see if this host is
> actually set up identical to the others.
>
> Hope this helps,
I just checked the objects.cache file. The host and service entries for
rana and another host are completely identical:
define host {
    host_name rana
    check_command check-host-alive
    contact_groups No-Alarm
    notification_period always
    check_interval 0
    max_check_attempts 3
    active_checks_enabled 1
    passive_checks_enabled 1
    obsess_over_host 1
    event_handler_enabled 1
    low_flap_threshold 0.000000
    high_flap_threshold 0.000000
    flap_detection_enabled 1
    freshness_threshold 0
    check_freshness 0
    notification_options d,u,r
    notifications_enabled 1
    notification_interval 0
    stalking_options n
    process_perf_data 1
    failure_prediction_enabled 1
    retain_status_information 1
    retain_nonstatus_information 1
}
define service {
    host_name rana
    service_description Local disk
    check_period always
    check_command check_disk!-w 15% -c 10% -x /afs -e
    contact_groups Server
    notification_period always
    normal_check_interval 10
    retry_check_interval 1
    max_check_attempts 1
    is_volatile 0
    parallelize_check 1
    active_checks_enabled 0
    passive_checks_enabled 1
    obsess_over_service 1
    event_handler_enabled 1
    low_flap_threshold 0.000000
    high_flap_threshold 0.000000
    flap_detection_enabled 1
    freshness_threshold 2000
    check_freshness 1
    notification_options w,c,r,f
    notifications_enabled 1
    notification_interval 0
    stalking_options n
    process_perf_data 1
    failure_prediction_enabled 1
    retain_status_information 1
    retain_nonstatus_information 1
}
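For reference, the freshness rule that check_freshness and freshness_threshold feed into can be sketched roughly as follows. This is a simplification written for illustration, not Nagios's actual code; the function name is_stale is hypothetical, and the two timestamps are taken from the log excerpt in this message (a passive result and the first UNKNOWN alert after it).

```python
from datetime import datetime, timedelta


def is_stale(last_result, now, freshness_threshold):
    """Simplified rule (an assumption, not Nagios source code): a result is
    stale once it is older than freshness_threshold seconds."""
    return (now - last_result) > timedelta(seconds=freshness_threshold)


# Timestamps from the log excerpt: passive result, then the UNKNOWN alert.
last_result = datetime(2006, 9, 4, 13, 17, 16)
alert = datetime(2006, 9, 4, 13, 17, 41)

# With freshness_threshold 2000 the result should not yet count as stale.
print(is_stale(last_result, alert, 2000))

# Earliest moment at which this simplified rule would call the result stale.
expires = last_result + timedelta(seconds=2000)
print(expires.time().isoformat())
```

Under this simplified rule, a result received at 13:17:16 would stay fresh until 13:50:36, well beyond the next passive result five minutes later.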
But all the passive checks are still flapping. Let me show you the log file:
root at giedi3[nagios]# tail -2000 nagios.log | grep rana | naglog.pl
[...] (naglog.pl just converts the timestamps to a readable format)
Here the passive check results come in - everything ok:
[04.09. 13:17:16] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;4upgrades;1;PKG WARNING - Upgrade:
base-config, libc6, libc6-sparc64, libgnutls11, libsasl2,
libsasl2-modules, login, passwd, perl, perl-base, perl-doc,
perl-modules, perl-suid
[04.09. 13:17:16] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free
space:| /=282MB;1297;1374;0;1527 /dev/shm=0MB;427;452;0;503 /var=118MB;205;217;0;242 /boot=10MB;39;42;0;47 /var/log=83MB;630;667;0;742 /var/cache/openafs=32MB;420;445;0;495 /tmp=0MB;398;422;0;469
[04.09. 13:17:16] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;Local swap;0;SWAP OK - 100% free (512
MB out of 512 MB) |swap=511MB;102;51;0;511
[04.09. 13:17:16] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: bosserver;0;PROCS OK: 1 process
with command name 'bosserver'
[04.09. 13:17:16] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: cfexecd;0;PROCS OK: 2 processes
with args '/usr/sbin/cfexecd'
[04.09. 13:17:16] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: cron;0;PROCS OK: 1 process with
args '/usr/sbin/cron'
[04.09. 13:17:16] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: klogd;0;PROCS OK: 1 process
with args '/sbin/klogd'
[04.09. 13:17:16] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: ntpd;0;PROCS OK: 1 process with
args '/usr/sbin/ntpd'
[04.09. 13:17:16] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: nullmailer-send;0;PROCS OK: 1
process with args '/usr/sbin/nullmailer-send'
[04.09. 13:17:16] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: ptserver;0;PROCS OK: 1 process
with command name 'ptserver'
[04.09. 13:17:16] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: syslogd;0;PROCS OK: 1 process
with args '/sbin/syslogd'
[04.09. 13:17:16] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: vlserver;0;PROCS OK: 1 process
with command name 'vlserver'
[04.09. 13:17:16] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;System load;0;OK - load average:
0.33, 0.15, 0.10|load1=0.330;3.000;5.000;0;
load5=0.150;9999.000;9999.000;0; load15=0.100;9999.000;9999.000;0;
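For readers unfamiliar with the entries above: each one is a PROCESS_SERVICE_CHECK_RESULT external command of the form host;service;return code;plugin output. A minimal sketch of building such a line in the format Nagios expects in its external command file is given below. The function name passive_result_line and the epoch value are illustrative assumptions; how the results are actually submitted on this setup is not shown in the message.

```python
def passive_result_line(host, service, return_code, output, timestamp):
    """Build one PROCESS_SERVICE_CHECK_RESULT line.

    Format: [epoch] PROCESS_SERVICE_CHECK_RESULT;host;service;code;output
    The helper itself is hypothetical; only the line format is Nagios's.
    """
    return "[{0}] PROCESS_SERVICE_CHECK_RESULT;{1};{2};{3};{4}\n".format(
        int(timestamp), host, service, return_code, output
    )


# Example epoch value chosen for illustration only.
line = passive_result_line("rana", "Local swap", 0, "SWAP OK - 100% free", 1157368636)
print(line, end="")
```

The return code follows the usual plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), which matches the 0 and 1 values visible in the log.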
Five seconds later Nagios updates the service states:
[04.09. 13:17:21] SERVICE ALERT: rana;4upgrades;WARNING;HARD;1;PKG
WARNING - Upgrade: base-config, libc6, libc6-sparc64, libgnutls11,
libsasl2, libsasl2-modules, login, passwd, perl, perl-base, perl-doc,
perl-modules, perl-suid
[04.09. 13:17:21] SERVICE ALERT: rana;Local disk;OK;HARD;1;DISK OK -
free space:
[04.09. 13:17:21] SERVICE ALERT: rana;Proc: cron;OK;HARD;1;PROCS OK: 1
process with args '/usr/sbin/cron'
[04.09. 13:17:21] SERVICE ALERT: rana;Proc: klogd;OK;HARD;1;PROCS OK: 1
process with args '/sbin/klogd'
[04.09. 13:17:21] SERVICE ALERT: rana;Proc: syslogd;OK;HARD;1;PROCS OK:
1 process with args '/sbin/syslogd'
[04.09. 13:17:21] SERVICE ALERT: rana;Proc: vlserver;OK;HARD;1;PROCS
OK: 1 process with command name 'vlserver'
Twenty seconds later some services drop to UNKNOWN state (which happens
when Nagios switches them to an active check). This should not happen,
because valid data arrived just a few lines above!
[04.09. 13:17:41] SERVICE ALERT: rana;Local
swap;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
[04.09. 13:18:12] SERVICE ALERT: rana;Proc:
ntpd;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
[04.09. 13:20:31] SERVICE ALERT: rana;Proc:
nullmailer-send;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not
running?
[04.09. 13:20:41] SERVICE ALERT: rana;System
load;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
[04.09. 13:20:51] SERVICE ALERT: rana;Proc:
bosserver;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
[04.09. 13:21:32] SERVICE ALERT: rana;Proc:
ptserver;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
[04.09. 13:21:32] SERVICE ALERT: rana;Proc:
cfexecd;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
Five minutes later the same thing happens. Fresh data comes in:
[04.09. 13:22:41] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;4upgrades;1;PKG WARNING - Upgrade:
base-config, libc6, libc6-sparc64, libgnutls11, libsasl2,
libsasl2-modules, login, passwd, perl, perl-base, perl-doc,
perl-modules, perl-suid
[04.09. 13:22:41] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free
space:| /=282MB;1297;1374;0;1527 /dev/shm=0MB;427;452;0;503 /var=118MB;205;217;0;242 /boot=10MB;39;42;0;47 /var/log=83MB;630;667;0;742 /var/cache/openafs=32MB;420;445;0;495 /tmp=0MB;398;422;0;469
[04.09. 13:22:41] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;Local swap;0;SWAP OK - 100% free (512
MB out of 512 MB) |swap=511MB;102;51;0;511
[04.09. 13:22:41] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: bosserver;0;PROCS OK: 1 process
with command name 'bosserver'
[04.09. 13:22:41] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: cfexecd;0;PROCS OK: 2 processes
with args '/usr/sbin/cfexecd'
[04.09. 13:22:41] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: cron;0;PROCS OK: 1 process with
args '/usr/sbin/cron'
[04.09. 13:22:41] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: klogd;0;PROCS OK: 1 process
with args '/sbin/klogd'
[04.09. 13:22:41] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: ntpd;0;PROCS OK: 1 process with
args '/usr/sbin/ntpd'
[04.09. 13:22:41] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: nullmailer-send;0;PROCS OK: 1
process with args '/usr/sbin/nullmailer-send'
[04.09. 13:22:41] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: ptserver;0;PROCS OK: 1 process
with command name 'ptserver'
[04.09. 13:22:41] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: syslogd;0;PROCS OK: 1 process
with args '/sbin/syslogd'
[04.09. 13:22:41] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: vlserver;0;PROCS OK: 1 process
with command name 'vlserver'
[04.09. 13:22:41] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;rana;System load;0;OK - load average:
0.24, 0.12, 0.10|load1=0.240;3.000;5.000;0;
load5=0.120;9999.000;9999.000;0; load15=0.100;9999.000;9999.000;0;
The services toggle to OK again:
[04.09. 13:22:51] SERVICE ALERT: rana;Local swap;OK;HARD;1;SWAP OK -
100% free (512 MB out of 512 MB)
[04.09. 13:22:51] SERVICE ALERT: rana;Proc: bosserver;OK;HARD;1;PROCS
OK: 1 process with command name 'bosserver'
[04.09. 13:22:51] SERVICE ALERT: rana;Proc: cfexecd;OK;HARD;1;PROCS OK:
2 processes with args '/usr/sbin/cfexecd'
[04.09. 13:22:51] SERVICE ALERT: rana;Proc: ntpd;OK;HARD;1;PROCS OK: 1
process with args '/usr/sbin/ntpd'
[04.09. 13:22:51] SERVICE ALERT: rana;Proc:
nullmailer-send;OK;HARD;1;PROCS OK: 1 process with args
'/usr/sbin/nullmailer-send'
[04.09. 13:22:51] SERVICE ALERT: rana;Proc: ptserver;OK;HARD;1;PROCS
OK: 1 process with command name 'ptserver'
[04.09. 13:22:51] SERVICE ALERT: rana;System load;OK;HARD;1;OK - load
average: 0.24, 0.12, 0.10
Then the services that should still be OK fall to UNKNOWN state:
[04.09. 13:24:31] SERVICE ALERT: rana;Proc:
cron;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
[04.09. 13:24:41] SERVICE ALERT: rana;Proc:
syslogd;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
[04.09. 13:24:41] SERVICE ALERT: rana;4upgrades;UNKNOWN;HARD;1;UNKNOWN:
No data from host - nsce not running?
[04.09. 13:25:41] SERVICE ALERT: rana;Proc:
vlserver;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
[04.09. 13:25:41] SERVICE ALERT: rana;Local
disk;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
root at giedi3[nagios]#
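To quantify the problem, the gap between each passive result and the following UNKNOWN alert can be computed from the log and compared with freshness_threshold. The sketch below assumes the naglog.pl output format shown above, with the wrapped lines joined back into one line per entry; the function names and the embedded sample are illustrative only.

```python
import re
from datetime import datetime

FRESHNESS_THRESHOLD = 2000  # seconds, from the service definition

# Sample entries from the log above, with wrapped lines re-joined.
LOG = """\
[04.09. 13:17:16] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local swap;0;SWAP OK - 100% free (512 MB out of 512 MB) |swap=511MB;102;51;0;511
[04.09. 13:17:16] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Proc: ntpd;0;PROCS OK: 1 process with args '/usr/sbin/ntpd'
[04.09. 13:17:41] SERVICE ALERT: rana;Local swap;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
[04.09. 13:18:12] SERVICE ALERT: rana;Proc: ntpd;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
"""

STAMP = re.compile(r"^\[(\d\d\.\d\d\. \d\d:\d\d:\d\d)\] (.*)$")
RESULT_PREFIX = "EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;"
ALERT_PREFIX = "SERVICE ALERT: "


def stale_delays(log):
    """Return (service, seconds) pairs: the time between the last passive
    result for a service and its next UNKNOWN alert."""
    last_result = {}
    delays = []
    for line in log.splitlines():
        match = STAMP.match(line)
        if not match:
            continue
        # The year is absent from the log; only differences are used here.
        when = datetime.strptime(match.group(1), "%d.%m. %H:%M:%S")
        body = match.group(2)
        if body.startswith(RESULT_PREFIX):
            host, service = body[len(RESULT_PREFIX):].split(";", 2)[:2]
            last_result[(host, service)] = when
        elif body.startswith(ALERT_PREFIX):
            host, service, state = body[len(ALERT_PREFIX):].split(";", 3)[:3]
            if state == "UNKNOWN" and (host, service) in last_result:
                gap = when - last_result[(host, service)]
                delays.append((service, int(gap.total_seconds())))
    return delays


for service, seconds in stale_delays(LOG):
    verdict = "below" if seconds < FRESHNESS_THRESHOLD else "above"
    print("{0}: UNKNOWN after {1}s ({2} the threshold)".format(service, seconds, verdict))
```

For the two sample services the gaps are 25 and 56 seconds, far below the configured 2000 seconds, which is what makes the UNKNOWN alerts look wrong.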
What is going on here?
Thanks,
Christopher
--
======================================================
Dipl.-Ing. Christopher Odenbach
Zentrum fuer Informations- und Medientechnologien
Universitaet Paderborn
Raum N5.110
odenbach at uni-paderborn.de
Tel.: +49 5251 60 5315
======================================================