NSCA and Nagios 2.0b3 - wrong service freshness
Christian Kleinfeld
c_kleinfeld at hotmail.com
Tue May 17 08:27:11 CEST 2005
Hello guys,
I'm using Nagios 2.0b3 with NSCA from the tree.
Every 5 minutes my systems send a heartbeat (passive host check) to our
Nagios server. Reports for services in Critical/Warning state are also
sent every 5 minutes, and a complete report for all configured services
is sent every hour, all via send_nsca and the nsca daemon.
NSCA runs on our Nagios server in daemon mode.
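To make the reporting path concrete: send_nsca reads check results from standard input as delimited lines, tab-separated by default. The following is only a minimal Python sketch of how such lines could be assembled. The helper names are hypothetical and not part of NSCA, and the host and service names are simply those from the log excerpt in this message.

```python
# Hypothetical helpers (not part of NSCA): build the lines that send_nsca
# reads from standard input, assuming its default tab delimiter.

def host_check_line(host, return_code, output):
    # Passive host check: host name, return code, plugin output
    return f"{host}\t{return_code}\t{output}\n"

def service_check_line(host, service, return_code, output):
    # Passive service check: host name, service description,
    # return code, plugin output
    return f"{host}\t{service}\t{return_code}\t{output}\n"

# Example values taken from the log excerpt in this message
heartbeat = host_check_line("cmseprx6", 0, "OK")
report = service_check_line("cmseprx6", "atd", 0, "OK - process atd is running")
```

Each string would then be piped to send_nsca on the reporting host; how the real systems here build their input is not shown in this message.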
Sometimes Nagios forces passively checked hosts and services into a stale
state without any reason. I say 'without any reason' because Nagios received
the passive checks, but the Nagios core thinks it didn't.
I'll explain it in more detail using an example host.
Here is a snippet from my logfile:
>[Tue May 17 07:00:01 2005] EXTERNAL COMMAND:
>PROCESS_HOST_CHECK_RESULT;cmseprx6;0;OK
>[Tue May 17 07:00:03 2005] EXTERNAL COMMAND:
>PROCESS_SERVICE_CHECK_RESULT;cmseprx6;squid;0;OK - process squid is running
>: PID= 2679 2681 ;
>[Tue May 17 07:00:05 2005] EXTERNAL COMMAND:
>PROCESS_SERVICE_CHECK_RESULT;cmseprx6;atd;0;OK - process atd is running :
>PID= 27123 ;
>[Tue May 17 07:00:05 2005] EXTERNAL COMMAND:
>PROCESS_SERVICE_CHECK_RESULT;cmseprx6;mgetty;0;OK - process mgetty is
>running : PID= 753 ;
>[Tue May 17 07:00:05 2005] EXTERNAL COMMAND:
>PROCESS_SERVICE_CHECK_RESULT;cmseprx6;mingetty;0;OK - process mingetty is
>running : PID= 748 749 750 751 752 813 ;
>[Tue May 17 07:00:06 2005] EXTERNAL COMMAND:
>PROCESS_SERVICE_CHECK_RESULT;cmseprx6;syslogd;0;OK - process syslogd is
>running : PID= 506 ;
OK, so we can see that nsca has passed the results on to the Nagios core.
>[Tue May 17 07:05:00 2005] EXTERNAL COMMAND:
>PROCESS_HOST_CHECK_RESULT;cmseprx6;0;OK
>[Tue May 17 07:10:01 2005] EXTERNAL COMMAND:
>PROCESS_HOST_CHECK_RESULT;cmseprx6;0;OK
>[Tue May 17 07:15:00 2005] EXTERNAL COMMAND:
>PROCESS_HOST_CHECK_RESULT;cmseprx6;0;OK
>[Tue May 17 07:20:00 2005] EXTERNAL COMMAND:
>PROCESS_HOST_CHECK_RESULT;cmseprx6;0;OK
The heartbeats were received, so everything is OK at this point.
>[Tue May 17 07:23:32 2005] SERVICE ALERT:
>cmseprx6;atd;WARNING;HARD;1;WARNING: No Report received
>[Tue May 17 07:23:32 2005] SERVICE ALERT:
>cmseprx6;mgetty;WARNING;HARD;1;WARNING: No Report received
>[Tue May 17 07:25:24 2005] SERVICE ALERT:
>cmseprx6;mingetty;WARNING;HARD;1;WARNING: No Report received
>[Tue May 17 07:25:24 2005] SERVICE ALERT:
>cmseprx6;squid;WARNING;HARD;1;WARNING: No Report
>received
>[Tue May 17 07:25:24 2005] SERVICE ALERT:
>cmseprx6;syslogd;WARNING;HARD;1;WARNING: No Report received
And this is the strange thing.
I have a service freshness threshold of 90 minutes, yet the services
alerted 23-25 minutes after the last check result was received. Why is
the freshness check executed at this point?
I don't know what else I can do to solve this problem.
Does anyone have an idea what's going wrong here?
This happens on 7 of 290 hosts.
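The timing can be checked directly from the log timestamps. The sketch below is only an illustration in Python: it takes the last passive result and the alert time for two of the services from the excerpt above and compares the gap with the configured freshness_threshold of 5400 seconds. The variable names are made up for this example, and parsing the day and month names assumes an English locale.

```python
from datetime import datetime

# Timestamp layout used in the Nagios log lines quoted above
FMT = "%a %b %d %H:%M:%S %Y"
FRESHNESS_THRESHOLD = 5400  # seconds, from the generic-passive template

# Last passive result and stale alert per service, copied from the log
last_results = {
    "atd": "Tue May 17 07:00:05 2005",
    "squid": "Tue May 17 07:00:03 2005",
}
alerts = {
    "atd": "Tue May 17 07:23:32 2005",
    "squid": "Tue May 17 07:25:24 2005",
}

def seconds_between(start, end):
    # Elapsed whole seconds between two log timestamps
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())

gaps = {svc: seconds_between(last_results[svc], alerts[svc]) for svc in alerts}

for svc, gap in gaps.items():
    print(f"{svc}: alerted after {gap} s, threshold is {FRESHNESS_THRESHOLD} s")
```

Both gaps come out far below 5400 seconds, which is why the alerts look premature.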
-- Here are the relevant parts of my config:
# nagios.cfg
command_check_interval=-1
# hosts.cfg
define host {
use generic-host-passive
host_name cmseprx6
alias cmseprx6
address 10.248.0.23
contact_groups scpcms-admins,operations
}
# services.cfg
define service {
use generic-passive
host_name cmseprx6
service_description atd
contact_groups scpcms-admins,operations
notification_period 24x7
notification_options w,u,c,r
}
define service {
use generic-passive
host_name cmseprx6
service_description squid
contact_groups scpcms-admins,operations
notification_period 24x7
notification_options w,u,c,r
}
# template.cfg
define service {
name generic-passive
active_checks_enabled 0
passive_checks_enabled 1
parallelize_check 1
obsess_over_service 1
check_freshness 1
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
register 0
max_check_attempts 1
normal_check_interval 90
retry_check_interval 1
notification_interval 1440
freshness_threshold 5400
check_period 24x7
check_command check_dummy!1!"No Report received"
}
define host {
name generic-host-passive
notifications_enabled 1
event_handler_enabled 0
flap_detection_enabled 1
process_perf_data 0
retain_status_information 1
retain_nonstatus_information 1
active_checks_enabled 0
check_freshness 1
freshness_threshold 420
check_period 24x7
check_command check_dummy!2!"No Report, host maybe down"
max_check_attempts 10
notification_interval 120
notification_period 24x7
notification_options d,u,r
register 0
}
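As a sanity check on the thresholds above, the following Python sketch spells out the behaviour this configuration is expected to produce: a result should only count as stale once its age exceeds freshness_threshold. This is an assumption about the intended behaviour, not a description of how Nagios 2.0b3 computes freshness internally, and the function and variable names are hypothetical.

```python
# Values from the templates above (seconds)
SERVICE_THRESHOLD = 5400   # generic-passive
HOST_THRESHOLD = 420       # generic-host-passive

# Reporting intervals described in this message (seconds)
FULL_REPORT_INTERVAL = 3600   # complete service report every hour
HEARTBEAT_INTERVAL = 300      # host heartbeat every 5 minutes

def is_stale(age_seconds, freshness_threshold):
    # Expected rule (assumption): stale only when the last result is
    # older than the configured threshold
    return age_seconds > freshness_threshold

# Slack between the reporting interval and the threshold
service_margin = SERVICE_THRESHOLD - FULL_REPORT_INTERVAL
host_margin = HOST_THRESHOLD - HEARTBEAT_INTERVAL

# A service result about 23 minutes old should not be stale yet
premature = is_stale(23 * 60, SERVICE_THRESHOLD)
```

Under this rule the services have 30 minutes of slack and the host 2 minutes, so an alert roughly 23 minutes after a result does not match the configured values.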
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null