NSCA and Nagios 2.0b3 - freshness checks

Christian Kleinfeld c_kleinfeld at hotmail.com
Tue May 17 15:05:18 CEST 2005


Hello guys

Sorry, I posted this before on nagios-users, but I think nagios-devel is a
better place for this issue.

I'm using Nagios 2.0b3 with NSCA from the CVS tree.
Our systems send a heartbeat (passive host check) to our Nagios server
every 5 minutes. Results for services in a CRITICAL/WARNING state are
sent every 5 minutes, and every hour a complete report for all configured
services is sent to Nagios via send_nsca and the nsca daemon. (That is 2
connections per host: the first sends the passive host check, the second
sends all service results. If send_nsca times out, it retries 3 times.)

NSCA runs on our Nagios server in daemon mode.

Sometimes Nagios marks passively checked hosts and services as stale for
no apparent reason. I say 'for no reason' because Nagios did receive the
passive checks, but the Nagios core behaves as if it didn't.

Let me explain it in more detail using an example host.

Here is a snippet from my log file:
>[Tue May 17 07:00:01 2005] EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;cmseprx6;0;OK
>[Tue May 17 07:00:03 2005] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;cmseprx6;squid;0;OK - process squid is running : PID= 2679 2681 ;
>[Tue May 17 07:00:05 2005] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;cmseprx6;atd;0;OK - process atd is running : PID= 27123 ;
>[Tue May 17 07:00:05 2005] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;cmseprx6;mgetty;0;OK - process mgetty is running : PID= 753 ;
>[Tue May 17 07:00:05 2005] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;cmseprx6;mingetty;0;OK - process mingetty is running : PID= 748 749 750 751 752 813 ;
>[Tue May 17 07:00:06 2005] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;cmseprx6;syslogd;0;OK - process syslogd is running : PID= 506 ;

OK, so we can see that nsca has passed the results on to the Nagios core.
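To check mechanically when the core last accepted a result for each host and service, the external-command entries can be parsed. The sketch below is a hedged helper, not part of Nagios or NSCA: it assumes unwrapped log lines without the `>` quote prefix, and assumes the bracketed timestamp is in the human-readable form shown in the snippet (a log that stores Unix timestamps would need different parsing). The function names are hypothetical.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple

LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] EXTERNAL COMMAND: (?P<cmd>.*)$")
TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Tue May 17 07:00:03 2005"

Parsed = Tuple[datetime, str, str, Optional[str], int]


def parse_external_command(line: str) -> Optional[Parsed]:
    """Return (time, command, host, service, return_code) for a passive
    check result line, or None for any other log line.
    service is None for PROCESS_HOST_CHECK_RESULT."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    ts = datetime.strptime(m.group("ts"), TS_FORMAT)
    cmd = m.group("cmd")
    name = cmd.split(";", 1)[0]
    if name == "PROCESS_SERVICE_CHECK_RESULT":
        # maxsplit keeps any ';' inside the plugin output (e.g. "PID= 506 ;") intact
        _, host, service, code, _output = cmd.split(";", 4)
        return ts, name, host, service, int(code)
    if name == "PROCESS_HOST_CHECK_RESULT":
        _, host, code, _output = cmd.split(";", 3)
        return ts, name, host, None, int(code)
    return None


def last_results(lines: Iterable[str]) -> Dict[Tuple[str, Optional[str]], datetime]:
    """Map (host, service) to the time of the most recent passive result;
    host results use service=None."""
    latest: Dict[Tuple[str, Optional[str]], datetime] = {}
    for line in lines:
        parsed = parse_external_command(line)
        if parsed:
            ts, _, host, service, _ = parsed
            latest[(host, service)] = ts
    return latest
```

Lines that are not passive results, such as the SERVICE ALERT entries further down, are simply skipped.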

>[Tue May 17 07:05:00 2005] EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;cmseprx6;0;OK
>[Tue May 17 07:10:01 2005] EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;cmseprx6;0;OK
>[Tue May 17 07:15:00 2005] EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;cmseprx6;0;OK
>[Tue May 17 07:20:00 2005] EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;cmseprx6;0;OK

The heartbeats are received; everything is OK at this point.

>[Tue May 17 07:23:32 2005] SERVICE ALERT: cmseprx6;atd;WARNING;HARD;1;WARNING: No Report received
>[Tue May 17 07:23:32 2005] SERVICE ALERT: cmseprx6;mgetty;WARNING;HARD;1;WARNING: No Report received
>[Tue May 17 07:25:24 2005] SERVICE ALERT: cmseprx6;mingetty;WARNING;HARD;1;WARNING: No Report received
>[Tue May 17 07:25:24 2005] SERVICE ALERT: cmseprx6;squid;WARNING;HARD;1;WARNING: No Report received
>[Tue May 17 07:25:24 2005] SERVICE ALERT: cmseprx6;syslogd;WARNING;HARD;1;WARNING: No Report received

And this is the strange thing.
I have a service freshness threshold of 90 minutes (5400 seconds), yet the
services alerted only 23-25 minutes after the last check result was
received. Why is the freshness check triggered at this point?
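The gap can be quantified with the timestamps from the log above. This sketch uses a simplified staleness rule, assumed rather than taken from the Nagios source: a result is stale once more than `freshness_threshold` seconds have passed since it was received. The helper names are hypothetical.

```python
from datetime import datetime

TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # log timestamp format, e.g. "Tue May 17 07:00:05 2005"
FRESHNESS_THRESHOLD = 5400          # seconds, from the generic-passive template


def seconds_between(earlier: str, later: str) -> int:
    """Whole seconds between two log timestamps."""
    delta = datetime.strptime(later, TS_FORMAT) - datetime.strptime(earlier, TS_FORMAT)
    return int(delta.total_seconds())


def is_stale(last_result: str, now: str, threshold: int = FRESHNESS_THRESHOLD) -> bool:
    """Simplified rule (assumption): stale once the result is older than threshold."""
    return seconds_between(last_result, now) > threshold


# (service, time of last passive result, time of the "No Report received" alert)
OBSERVED = [
    ("atd",      "Tue May 17 07:00:05 2005", "Tue May 17 07:23:32 2005"),
    ("mgetty",   "Tue May 17 07:00:05 2005", "Tue May 17 07:23:32 2005"),
    ("mingetty", "Tue May 17 07:00:05 2005", "Tue May 17 07:25:24 2005"),
    ("squid",    "Tue May 17 07:00:03 2005", "Tue May 17 07:25:24 2005"),
    ("syslogd",  "Tue May 17 07:00:06 2005", "Tue May 17 07:25:24 2005"),
]

if __name__ == "__main__":
    for service, last, alert in OBSERVED:
        age = seconds_between(last, alert)
        print(f"{service:9} age={age:5d}s stale={is_stale(last, alert)}")
```

The ages at alert time are 1407, 1407, 1519, 1521 and 1518 seconds, all far below 5400, so under this simple rule none of the five services should have been flagged as stale yet.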

I don't know what else I can do to solve this problem.

Does anyone have an idea what's going wrong here?

This happens on 7 of our 290 hosts.

-- Here are the relevant parts of my config:
# nagios.cfg
command_check_interval=-1
check_service_freshness=1
# Nagios default; it doesn't matter here, because we set our own
# freshness thresholds in the templates
service_freshness_check_interval=60
check_host_freshness=1
host_freshness_check_interval=420
accept_passive_service_checks=1
accept_passive_host_checks=1

# hosts.cfg
define host {
        use                     generic-host-passive
        host_name               cmseprx6
        alias                   cmseprx6
        address                 10.248.0.23
        contact_groups          scpcms-admins,operations
}

# services.cfg
define service {
        use                     generic-passive
        host_name               cmseprx6
        service_description     atd
        contact_groups          scpcms-admins,operations
        notification_period     24x7
        notification_options    w,u,c,r
}
define service {
        use                     generic-passive
        host_name               cmseprx6
        service_description     squid
        contact_groups          scpcms-admins,operations
        notification_period     24x7
        notification_options    w,u,c,r
}


# template.cfg
define service {
        name                            generic-passive
        active_checks_enabled           0
        passive_checks_enabled          1
        parallelize_check               1
        obsess_over_service             1
        check_freshness                 1
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        register                        0
        max_check_attempts              1
        normal_check_interval           90
        retry_check_interval            1
        notification_interval           1440
        freshness_threshold             5400
        check_period                    24x7
        check_command                   check_dummy!1!"No Report received"
}

define host {
        name                            generic-host-passive
        notifications_enabled           1
        event_handler_enabled           0
        flap_detection_enabled          1
        process_perf_data               0
        retain_status_information       1
        retain_nonstatus_information    1
        active_checks_enabled           0
        check_freshness                 1
        freshness_threshold             420
        check_period                    24x7
        check_command                   check_dummy!2!"No Report, host maybe down"
        max_check_attempts              10
        notification_interval           120
        notification_period             24x7
        notification_options            d,u,r
        register                        0
}
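The intervals in this config mix units, so here is a small worked check of the numbers. It assumes `interval_length` is left at the Nagios default of 60 seconds (it is not shown in the excerpt above), and the detection window uses a simplified model, not Nagios' exact algorithm: an object becomes stale once its result is older than the threshold, and is only noticed at the next periodic freshness sweep.

```python
INTERVAL_LENGTH = 60  # assumed: default interval_length, seconds per interval unit

HEARTBEAT = 300                         # hosts send a heartbeat every 5 minutes
HOST_FRESHNESS_THRESHOLD = 420          # generic-host-passive
HOST_FRESHNESS_CHECK_INTERVAL = 420     # nagios.cfg
SERVICE_FRESHNESS_THRESHOLD = 5400      # generic-passive
SERVICE_FRESHNESS_CHECK_INTERVAL = 60   # nagios.cfg
NORMAL_CHECK_INTERVAL = 90              # generic-passive, in interval units


def intervals_to_seconds(intervals: int, interval_length: int = INTERVAL_LENGTH) -> int:
    """Convert a *_check_interval value into seconds."""
    return intervals * interval_length


def detection_window(threshold: int, sweep_interval: int) -> tuple:
    """(earliest, latest) seconds after the last result at which a periodic
    sweep could flag the object, under the simplified model described above."""
    return threshold, threshold + sweep_interval


if __name__ == "__main__":
    print("normal_check_interval in seconds:", intervals_to_seconds(NORMAL_CHECK_INTERVAL))
    print("host heartbeat margin:", HOST_FRESHNESS_THRESHOLD - HEARTBEAT, "s")
    print("service window:", detection_window(SERVICE_FRESHNESS_THRESHOLD,
                                              SERVICE_FRESHNESS_CHECK_INTERVAL))
    print("host window:", detection_window(HOST_FRESHNESS_THRESHOLD,
                                           HOST_FRESHNESS_CHECK_INTERVAL))
```

With these values the 90-minute check interval equals the 5400-second threshold, the host threshold leaves a 120-second margin over the 5-minute heartbeat, and service staleness would be expected between 5400 and 5460 seconds after the last result.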
