Nagios Check Time Issue
Rus Hughes
russell.hughes at gmail.com
Thu Feb 24 14:15:55 CET 2011
max_check_attempts is set to 4 for all services. An example service
configuration is:
define service {
    retry_check_interval    1
    contact_groups          admins
    check_command           check_nrpe!check_swap
    check_period            24x7
    host_name               somehostrarrarrar
    max_check_attempts      4
    normal_check_interval   1
    notification_period     24x7
    notification_interval   960
    ## --PUPPET_NAME-- (called '_naginator_name' in the manifest) check_swap_vfantprov2
    use                     generic-service
    service_description     swap
}
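For reference, a rough sketch of how long a failing service should take to reach a hard state with the values above. It assumes the default interval_length of 60 seconds (our nagios.cfg is not shown here, so that is an assumption) and that every retry runs exactly on schedule, which is precisely what we are not seeing. The function name is made up for illustration.

```python
def seconds_to_hard_state(max_check_attempts, retry_check_interval,
                          interval_length=60):
    """Scheduled time from the first failed check to the hard state.

    Attempt 1 is the check that first fails (soft state). The remaining
    max_check_attempts - 1 attempts are retries, each spaced one retry
    interval apart. interval_length=60 is an assumed default.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    retries = max_check_attempts - 1
    return retries * retry_check_interval * interval_length


# Values from the service definition above.
print(seconds_to_hard_state(max_check_attempts=4, retry_check_interval=1))  # 180
```

So on paper the swap service should go hard about three minutes after the first failure, not tens of minutes later.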
define service{
    name                            generic-service ; The 'name' of this service template
    active_checks_enabled           1               ; Active service checks are enabled
    passive_checks_enabled          1               ; Passive service checks are enabled/accepted
    parallelize_check               1               ; Active service checks should be parallelized (disabling this can lead to major performance problems)
    obsess_over_service             1               ; We should obsess over this service (if necessary)
    check_freshness                 0               ; Default is to NOT check service 'freshness'
    notifications_enabled           1               ; Service notifications are enabled
    event_handler_enabled           1               ; Service event handler is enabled
    flap_detection_enabled          1               ; Flap detection is enabled
    failure_prediction_enabled      1               ; Failure prediction is enabled
    process_perf_data               1               ; Process performance data
    retain_status_information       1               ; Retain status information across program restarts
    retain_nonstatus_information    1               ; Retain non-status information across program restarts
    is_volatile                     0               ; The service is not volatile
    check_period                    24x7            ; The service can be checked at any time of the day
    max_check_attempts              3               ; Re-check the service up to 3 times in order to determine its final (hard) state
    normal_check_interval           1               ; Check the service every 10 minutes under normal conditions
    retry_check_interval            1               ; Re-check the service every two minutes until a hard state can be determined
    contact_groups                  admins          ; Notifications get sent out to everyone in the 'admins' group
    notification_options            w,u,c,r         ; Send notifications about warning, unknown, critical, and recovery events
    notification_interval           60              ; Re-notify about service problems every hour
    notification_period             24x7            ; Notifications can be sent out at any time
    register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
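Because the service says "use generic-service", the values it sets itself take precedence over the template's. A minimal sketch of that resolution for the numeric directives that appear in both definitions; the dictionaries are copied by hand from the two blocks above and the helper names are hypothetical. This only illustrates the override order, it is not how Nagios implements it.

```python
# Values copied by hand from the generic-service template above.
TEMPLATE = {
    "max_check_attempts": 3,
    "normal_check_interval": 1,
    "retry_check_interval": 1,
    "notification_interval": 60,
}

# Values copied by hand from the swap service definition above.
SERVICE = {
    "max_check_attempts": 4,
    "normal_check_interval": 1,
    "retry_check_interval": 1,
    "notification_interval": 960,
}


def effective_settings(service, template):
    """Start from the template, then let the service's own values win."""
    merged = dict(template)
    merged.update(service)
    return merged


def overridden(service, template):
    """Directives where the service replaces a different template value,
    as {directive: (template_value, service_value)}."""
    return {
        key: (template[key], value)
        for key, value in service.items()
        if key in template and template[key] != value
    }


print(effective_settings(SERVICE, TEMPLATE)["max_check_attempts"])  # 4
print(overridden(SERVICE, TEMPLATE))
```

The effective max_check_attempts is therefore 4, not the template's 3, and both check intervals are 1 regardless of what the template comments say.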
On Thu, Feb 24, 2011 at 12:38 PM, Yueh-Hung Liu <yuehung.liu at gmail.com> wrote:
> how many attempts do you configure before a non-OK state becomes hard?
>
>
> On Thu, Feb 24, 2011 at 7:24 PM, Rus Hughes <russell.hughes at gmail.com> wrote:
>> Hi,
>>
>> I've been investigating an issue we have with Nagios Core 3.2.0 that
>> we're running on Redhat 5.4. We're being a bit ruthless and have
>> configured retry_check_interval and normal_check_interval to both be 1
>> on all hosts and services (20 hosts and 293 services).
>>
>> We're seeing massive delays between checks getting run for services
>> flagged as DOWN, even though the box has little load (0.2)
>>
>> Looking at the extended information page for a service that was DOWN
>> we're seeing events like this occur :
>>
>> At 10:40 a service that was DOWN had a check that was scheduled to run
>> at 10:25 but still hadn't run
>> At 11:02 I refreshed the page for the Nagios check
>> Nagios had run the check and changed the service state to UP
>> The last check time was set to be 10:25 though
>> Even though the check actually ran between 10:40 and 11:02
>>
>> Does anyone know why
>>
>> 1) Nagios is being 'lazy' when rechecking services marked as DOWN?
>> We've configured retry_check_interval to 1 for all checks, there's
>> little load on the box, and at most only about 4 Nagios processes
>> are running at a time, so there are resources free to be used.
>>
>> 2) Nagios is setting the Last Check Time to the predicted Next
>> Scheduled Check time, even though the check actually ran well after
>> that? (Bug in Nagios?)
>>
>> Thanks,
>>
>> Rus
>>
>> ------------------------------------------------------------------------------
>> Free Software Download: Index, Search & Analyze Logs and other IT data in
>> Real-Time with Splunk. Collect, index and harness all the fast moving IT data
>> generated by your applications, servers and devices whether physical, virtual
>> or in the cloud. Deliver compliance at lower cost and gain new business
>> insights. http://p.sf.net/sfu/splunk-dev2dev
>> _______________________________________________
>> Nagios-users mailing list
>> Nagios-users at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
>> ::: Messages without supporting info will risk being sent to /dev/null
>>