NSCA and Nagios 2.0b3 - freshness checks
Christian Kleinfeld
c_kleinfeld at hotmail.com
Tue May 17 15:05:18 CEST 2005
Hello guys,
sorry, I posted this before on nagios-users, but I think nagios-devel is a
better place for this issue.
I'm using Nagios 2.0b3 with NSCA from the CVS tree.
Every 5 minutes each of my systems sends a heartbeat (passive host check) to
our Nagios server. Results for services in Critical/Warning state are also
sent every 5 minutes, and every hour a complete report for all configured
services is sent to Nagios via send_nsca and the nsca daemon. (That is 2
connections per host: the first carries the passive host check and the second
carries all service results. If send_nsca times out, it tries again 3 times.)
NSCA runs in daemon mode on our Nagios server.
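As a rough illustration of the two payloads described above, the lines handed to send_nsca could be built like this. This is only a sketch, assuming send_nsca's default tab-delimited input format; the helper names are hypothetical and it is not the reporting script actually used on these hosts.

```python
# Sketch: build the text lines that would be piped into send_nsca.
# Assumes the default tab delimiter; helper names are hypothetical.

def host_check_line(host, return_code, output):
    """One passive host check: host, return code, plugin output."""
    return f"{host}\t{return_code}\t{output}\n"


def service_check_line(host, service, return_code, output):
    """One passive service check: host, service, return code, plugin output."""
    return f"{host}\t{service}\t{return_code}\t{output}\n"


# First connection: the heartbeat (passive host check).
heartbeat = host_check_line("cmseprx6", 0, "OK")

# Second connection: all service results in one batch.
report = "".join(
    service_check_line("cmseprx6", name, 0, f"OK - process {name} is running")
    for name in ("squid", "atd")
)
```

Each payload would then go to send_nsca on standard input, one connection per payload.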
Sometimes Nagios forces passively checked hosts and services into a stale
state for no apparent reason. I say 'for no reason' because Nagios received
the passive checks, but the Nagios core behaves as if it didn't.
I'll explain it in more detail using an example host.
Here is a snippet from my logfile:
>[Tue May 17 07:00:01 2005] EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;cmseprx6;0;OK
>[Tue May 17 07:00:03 2005] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;cmseprx6;squid;0;OK - process squid is running : PID= 2679 2681 ;
>[Tue May 17 07:00:05 2005] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;cmseprx6;atd;0;OK - process atd is running : PID= 27123 ;
>[Tue May 17 07:00:05 2005] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;cmseprx6;mgetty;0;OK - process mgetty is running : PID= 753 ;
>[Tue May 17 07:00:05 2005] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;cmseprx6;mingetty;0;OK - process mingetty is running : PID= 748 749 750 751 752 813 ;
>[Tue May 17 07:00:06 2005] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;cmseprx6;syslogd;0;OK - process syslogd is running : PID= 506 ;
OK, so we can see that NSCA has passed the results to the Nagios core.
>[Tue May 17 07:05:00 2005] EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;cmseprx6;0;OK
>[Tue May 17 07:10:01 2005] EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;cmseprx6;0;OK
>[Tue May 17 07:15:00 2005] EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;cmseprx6;0;OK
>[Tue May 17 07:20:00 2005] EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;cmseprx6;0;OK
Heartbeat received, everything is ok at this moment.
>[Tue May 17 07:23:32 2005] SERVICE ALERT: cmseprx6;atd;WARNING;HARD;1;WARNING: No Report received
>[Tue May 17 07:23:32 2005] SERVICE ALERT: cmseprx6;mgetty;WARNING;HARD;1;WARNING: No Report received
>[Tue May 17 07:25:24 2005] SERVICE ALERT: cmseprx6;mingetty;WARNING;HARD;1;WARNING: No Report received
>[Tue May 17 07:25:24 2005] SERVICE ALERT: cmseprx6;squid;WARNING;HARD;1;WARNING: No Report received
>[Tue May 17 07:25:24 2005] SERVICE ALERT: cmseprx6;syslogd;WARNING;HARD;1;WARNING: No Report received
And this is the strange thing.
I have a service freshness threshold of 90 minutes, yet the services alerted
only 23-25 minutes after the last check was received. Why is the freshness
check executed at this point?
I don't know what else I can do to solve this problem.
Does anyone have an idea what's going wrong here?
This happens on 7 of 290 hosts.
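To put a number on the gap, the two timestamps for the atd service in the log excerpt above can be subtracted directly. This is a minimal sketch: the log lines are copied from the excerpt with the plugin output shortened, and `parse_ts` is a hypothetical helper name.

```python
from datetime import datetime

# Timestamp layout used in the log excerpt, e.g. "Tue May 17 07:00:05 2005".
FMT = "%a %b %d %H:%M:%S %Y"


def parse_ts(line):
    """Extract and parse the bracketed timestamp at the start of a log line."""
    stamp = line[line.index("[") + 1 : line.index("]")]
    return datetime.strptime(stamp, FMT)


# Last passive result received for atd (plugin output shortened).
last_result = parse_ts(
    "[Tue May 17 07:00:05 2005] EXTERNAL COMMAND: "
    "PROCESS_SERVICE_CHECK_RESULT;cmseprx6;atd;0;OK"
)
# Stale alert raised for the same service.
stale_alert = parse_ts(
    "[Tue May 17 07:23:32 2005] SERVICE ALERT: "
    "cmseprx6;atd;WARNING;HARD;1;WARNING: No Report received"
)

FRESHNESS_THRESHOLD = 5400  # seconds, from the generic-passive template

elapsed = (stale_alert - last_result).total_seconds()
print(f"elapsed: {elapsed:.0f} s, threshold: {FRESHNESS_THRESHOLD} s")
print("alert raised before threshold:", elapsed < FRESHNESS_THRESHOLD)
```

For atd the gap is 1407 seconds, roughly a quarter of the configured threshold.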
-- Here are the relevant parts of my config:
# nagios.cfg
command_check_interval=-1
check_service_freshness=1
# Nagios default; it should not matter, since we use other freshness values
# in our templates
service_freshness_check_interval=60
check_host_freshness=1
host_freshness_check_interval=420
accept_passive_service_checks=1
accept_passive_host_checks=1
# hosts.cfg
define host {
    use                     generic-host-passive
    host_name               cmseprx6
    alias                   cmseprx6
    address                 10.248.0.23
    contact_groups          scpcms-admins,operations
}
# services.cfg
define service {
    use                     generic-passive
    host_name               cmseprx6
    service_description     atd
    contact_groups          scpcms-admins,operations
    notification_period     24x7
    notification_options    w,u,c,r
}
define service {
    use                     generic-passive
    host_name               cmseprx6
    service_description     squid
    contact_groups          scpcms-admins,operations
    notification_period     24x7
    notification_options    w,u,c,r
}
# template.cfg
define service {
    name                            generic-passive
    active_checks_enabled           0
    passive_checks_enabled          1
    parallelize_check               1
    obsess_over_service             1
    check_freshness                 1
    notifications_enabled           1
    event_handler_enabled           1
    flap_detection_enabled          1
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
    register                        0
    max_check_attempts              1
    normal_check_interval           90
    retry_check_interval            1
    notification_interval           1440
    freshness_threshold             5400
    check_period                    24x7
    check_command                   check_dummy!1!"No Report received"
}
define host {
    name                            generic-host-passive
    notifications_enabled           1
    event_handler_enabled           0
    flap_detection_enabled          1
    process_perf_data               0
    retain_status_information       1
    retain_nonstatus_information    1
    active_checks_enabled           0
    check_freshness                 1
    freshness_threshold             420
    check_period                    24x7
    check_command                   check_dummy!2!"No Report, host maybe down"
    max_check_attempts              10
    notification_interval           120
    notification_period             24x7
    notification_options            d,u,r
    register                        0
}
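As a quick sanity check of the thresholds in the templates above, the freshness_threshold values (in seconds) can be converted to minutes and compared with the reporting intervals described earlier. This is only a sketch: `parse_define` is a hypothetical helper that handles just the simple "directive value" lines shown here, not the full Nagios object syntax, and the blocks are trimmed to the relevant directives.

```python
def parse_define(block):
    """Parse the 'directive value' lines of one define block into a dict."""
    directives = {}
    for line in block.splitlines():
        line = line.strip()
        # Skip blanks, comments, and the opening/closing lines of the block.
        if not line or line.startswith(("#", "define", "}")):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            directives[parts[0]] = parts[1].strip()
    return directives


service_tpl = parse_define("""
define service {
    name generic-passive
    check_freshness 1
    freshness_threshold 5400
}
""")

host_tpl = parse_define("""
define host {
    name generic-host-passive
    check_freshness 1
    freshness_threshold 420
}
""")

service_minutes = int(service_tpl["freshness_threshold"]) / 60
host_minutes = int(host_tpl["freshness_threshold"]) / 60

print(f"service threshold: {service_minutes:.0f} min (full report every 60 min)")
print(f"host threshold: {host_minutes:.0f} min (heartbeat every 5 min)")
```

Both thresholds are longer than the corresponding reporting interval (90 minutes against an hourly report, 7 minutes against a 5-minute heartbeat), so on paper a regularly reporting host should not go stale.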