NSCA and Nagios 2.0b3 - wrong service freshness

Christian Kleinfeld c_kleinfeld at hotmail.com
Tue May 17 08:27:11 CEST 2005
Hello guys

I'm using Nagios 2.0b3 with NSCA from the tree.
Every 5 minutes my systems send a heartbeat (passive host check) to our
Nagios server. Service reports are sent to Nagios via send_nsca and the
nsca daemon: every 5 minutes for services in Critical/Warning state, and
once an hour as a complete report for all configured services.
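
To make the reporting format concrete, here is a minimal sketch of how the lines fed to send_nsca could be built. It assumes the usual tab-delimited send_nsca input (host, optional service description, return code, plugin output); the helper name nsca_line is hypothetical and not part of NSCA itself.

```python
def nsca_line(host, return_code, output, service=None, delimiter="\t"):
    """Build one send_nsca input line.

    Leave `service` as None for a passive host check (heartbeat);
    pass a service description for a passive service check.
    """
    fields = [host]
    if service is not None:
        fields.append(service)
    fields += [str(return_code), output]
    return delimiter.join(fields) + "\n"


# Heartbeat (passive host check) and one service report for the example host.
heartbeat = nsca_line("cmseprx6", 0, "OK")
squid = nsca_line("cmseprx6", 0, "OK - process squid is running", service="squid")
```

The resulting strings would be piped to send_nsca on stdin; the actual invocation and its options depend on the local setup and are left out here.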

NSCA runs on our Nagios server in daemon mode.

Sometimes Nagios forces passively checked hosts and services into a stale
state for no reason. I say 'for no reason' because it received the
passive checks, but the Nagios core thinks it didn't.

I'll explain it in more detail using an example host.

Here is a snippet from my logfile:
>[Tue May 17 07:00:01 2005] EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;cmseprx6;0;OK
>[Tue May 17 07:00:03 2005] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;cmseprx6;squid;0;OK - process squid is running : PID= 2679 2681 ;
>[Tue May 17 07:00:05 2005] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;cmseprx6;atd;0;OK - process atd is running : PID= 27123 ;
>[Tue May 17 07:00:05 2005] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;cmseprx6;mgetty;0;OK - process mgetty is running : PID= 753 ;
>[Tue May 17 07:00:05 2005] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;cmseprx6;mingetty;0;OK - process mingetty is running : PID= 748 749 750 751 752 813 ;
>[Tue May 17 07:00:06 2005] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;cmseprx6;syslogd;0;OK - process syslogd is running : PID= 506 ;

OK, so we see that nsca has passed the results on to the Nagios core.
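
For anyone who wants to check their own log the same way, here is a rough sketch of a parser for these EXTERNAL COMMAND entries. It assumes each entry sits on a single line (the excerpt in this mail may be wrapped by the mail client), that timestamps look like the ones shown here, and an English locale for the day and month names. The function name parse_external_command is hypothetical.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Tue May 17 07:00:05 2005"
MARKER = "] EXTERNAL COMMAND: "


def parse_external_command(line):
    """Return a dict for a passive check result line, or None for anything else."""
    line = line.lstrip(">").strip()  # drop the mail quote prefix, if present
    if MARKER not in line:
        return None
    stamp, command = line.split(MARKER, 1)
    when = datetime.strptime(stamp.lstrip("["), LOG_TIME_FORMAT)
    name, _, rest = command.partition(";")
    if name == "PROCESS_SERVICE_CHECK_RESULT":
        # host;service;return_code;plugin_output (output may contain ';')
        host, service, code, output = rest.split(";", 3)
    elif name == "PROCESS_HOST_CHECK_RESULT":
        # host;return_code;plugin_output
        host, code, output = rest.split(";", 2)
        service = None
    else:
        return None
    return {"time": when, "host": host, "service": service,
            "code": int(code), "output": output}


entry = parse_external_command(
    "[Tue May 17 07:00:05 2005] EXTERNAL COMMAND: "
    "PROCESS_SERVICE_CHECK_RESULT;cmseprx6;atd;0;OK - process atd is running : PID= 27123 ;"
)
```

Keeping the latest entry per (host, service) pair gives the time of the last passive result that Nagios logged for each service.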

>[Tue May 17 07:05:00 2005] EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;cmseprx6;0;OK
>[Tue May 17 07:10:01 2005] EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;cmseprx6;0;OK
>[Tue May 17 07:15:00 2005] EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;cmseprx6;0;OK
>[Tue May 17 07:20:00 2005] EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;cmseprx6;0;OK

Heartbeats received; everything is OK at this point.

>[Tue May 17 07:23:32 2005] SERVICE ALERT: cmseprx6;atd;WARNING;HARD;1;WARNING: No Report received
>[Tue May 17 07:23:32 2005] SERVICE ALERT: cmseprx6;mgetty;WARNING;HARD;1;WARNING: No Report received
>[Tue May 17 07:25:24 2005] SERVICE ALERT: cmseprx6;mingetty;WARNING;HARD;1;WARNING: No Report received
>[Tue May 17 07:25:24 2005] SERVICE ALERT: cmseprx6;squid;WARNING;HARD;1;WARNING: No Report received
>[Tue May 17 07:25:24 2005] SERVICE ALERT: cmseprx6;syslogd;WARNING;HARD;1;WARNING: No Report received

And this is the strange thing.
I have a service freshness threshold of 90 minutes, yet the services
alerted 23-25 minutes after the last check result was received. Why is
the freshness check executed at this point?
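
To put numbers on it, here is a small sketch that computes the age of the last passive result at the moment of the alert, using timestamps copied from the log excerpt and the freshness_threshold from my generic-passive template. The helper name seconds_between is hypothetical, and an English locale is assumed for parsing the timestamps.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"
FRESHNESS_THRESHOLD = 5400  # seconds, from the generic-passive template


def seconds_between(last_result, alert):
    """Seconds elapsed from the last passive result to the stale alert."""
    start = datetime.strptime(last_result, LOG_TIME_FORMAT)
    end = datetime.strptime(alert, LOG_TIME_FORMAT)
    return int((end - start).total_seconds())


# atd: last result 07:00:05, alert 07:23:32
atd_age = seconds_between("Tue May 17 07:00:05 2005", "Tue May 17 07:23:32 2005")
# squid: last result 07:00:03, alert 07:25:24
squid_age = seconds_between("Tue May 17 07:00:03 2005", "Tue May 17 07:25:24 2005")

print(atd_age, squid_age, FRESHNESS_THRESHOLD)  # 1407 1521 5400
```

Both ages are well under the 5400-second threshold, so going by the logged results alone these services should not have been considered stale yet.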

I don't know what else I can do to solve this problem.

Does anyone have an idea what's going wrong here?

This happens on 7 of 290 hosts.

-- Here are the relevant parts of my config:
# nagios.cfg
command_check_interval=-1

# hosts.cfg
define host {
        use                     generic-host-passive
        host_name               cmseprx6
        alias                   cmseprx6
        address                 10.248.0.23
        contact_groups          scpcms-admins,operations
}

# services.cfg
define service {
        use                     generic-passive
        host_name               cmseprx6
        service_description     atd
        contact_groups          scpcms-admins,operations
        notification_period     24x7
        notification_options    w,u,c,r
}
define service {
        use                     generic-passive
        host_name               cmseprx6
        service_description     squid
        contact_groups          scpcms-admins,operations
        notification_period     24x7
        notification_options    w,u,c,r
}


# template.cfg
define service {
        name                            generic-passive
        active_checks_enabled           0
        passive_checks_enabled          1
        parallelize_check               1
        obsess_over_service             1
        check_freshness                 1
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        register                        0
        max_check_attempts              1
        normal_check_interval           90
        retry_check_interval            1
        notification_interval           1440
        freshness_threshold             5400
        check_period                    24x7
        check_command                   check_dummy!1!"No Report received"
}

define host {
        name                            generic-host-passive
        notifications_enabled           1
        event_handler_enabled           0
        flap_detection_enabled          1
        process_perf_data               0
        retain_status_information       1
        retain_nonstatus_information    1
        active_checks_enabled           0
        check_freshness                 1
        freshness_threshold             420
        check_period                    24x7
        check_command                   check_dummy!2!"No Report, host maybe down"
        max_check_attempts              10
        notification_interval           120
        notification_period             24x7
        notification_options            d,u,r
        register                        0
}
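
As a sanity check on the thresholds above, here is a small sketch of how much slack each freshness_threshold leaves relative to the reporting interval (5-minute heartbeats for hosts, hourly full reports for services). The helper names are hypothetical, and network or processing delay is ignored.

```python
def slack_seconds(report_interval, freshness_threshold):
    """Spare time between an on-schedule report and the staleness deadline."""
    return freshness_threshold - report_interval


def tolerated_misses(report_interval, freshness_threshold):
    """How many consecutive reports can be lost before the result goes stale."""
    return max(freshness_threshold // report_interval - 1, 0)


# Hosts: heartbeat every 300 s, freshness_threshold 420 s.
host_slack = slack_seconds(300, 420)
host_misses = tolerated_misses(300, 420)

# Services: full report every 3600 s, freshness_threshold 5400 s.
service_slack = slack_seconds(3600, 5400)
service_misses = tolerated_misses(3600, 5400)
```

With these values a single lost report is enough to make a host or service stale, but that alone would not explain alerts only 23-25 minutes after a logged result.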
