Nagios is ignoring the retry_interval setting
FTL Nagios
ftlnagios at gmail.com
Fri Dec 7 11:56:16 CET 2012
Hi,
Apologies for the delay; I've been very busy with other things.
I put Nagios into debug mode this morning and reran the tests.
I let it get a couple of successful pings to the server, then pulled the
network cable from it.
The behaviour is completely different this morning!
The host check is behaving now and rechecking every 3 minutes, as it's told
to in the host template. I got my text and email alerts saying the host was
down when I expected them.
But now it's the service check that is running every 1 minute, which it's
not told to do when in a problem state.
My service template clearly sets a retry_interval of 3 minutes for the
problem state:
define service{
        name                            service-server  ; The name of this host template (used above in the checks)
        check_period                    server_24x7     ; Servers are monitored at all times
        check_interval                  1               ; Servers are checked every 1 minute when in OK state
        retry_interval                  3               ; Servers are checked every 3 minutes if in problem state
        max_check_attempts              3               ; Servers are checked 3 times to determine if it's an Up or Down state
        notification_period             server_24x7     ; Emails and texts are sent out any time of day
        notification_interval           3               ; Resend notifications every 3 minutes
        notification_options            c,r             ; Only send alerts for servers in CRITICAL or RECOVERY state
        notifications_enabled           0               ; Notifications are disabled
        contact_groups                  servers email, servers sms      ; Alerts sent to contacts in these groups
        event_handler_enabled           1               ; Host event handler is enabled
        process_perf_data               1               ; Performance data is processed
        retain_status_information       1               ; Status info is kept between server restarts
        retain_nonstatus_information    1               ; Non-status information is kept between server restarts
        passive_checks_enabled          0               ; Passive checks are disabled
        obsess_over_service             0               ; We do not obsess over the server if in problem state
        check_freshness                 0               ; We do not check this server for freshness
        flap_detection_enabled          0               ; Flap detection is disabled
        failure_prediction_enabled      0               ; We will wait for it to actually fail, thank you!!
        }
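To make the expectation concrete, here is a minimal Python sketch of the schedule these values should produce for a problem that persists. It is only an illustration, not Nagios code: the function name expected_checks is made up, interval_length is assumed to be the default 60 seconds, and the start time is arbitrary. It models only the plain soft/hard retry counting and nothing else the scheduler does.

```python
from datetime import datetime, timedelta


def expected_checks(first_failure, retry_interval, max_check_attempts,
                    interval_length=60):
    """Return (time, attempt, state_type) tuples for a persisting problem.

    Hypothetical helper: the first failed check is attempt 1, later
    attempts are spaced retry_interval * interval_length seconds apart,
    and the state is treated as HARD once max_check_attempts is reached.
    """
    schedule = []
    for attempt in range(1, max_check_attempts + 1):
        offset = timedelta(
            seconds=(attempt - 1) * retry_interval * interval_length)
        state_type = "HARD" if attempt == max_check_attempts else "SOFT"
        schedule.append((first_failure + offset, attempt, state_type))
    return schedule


if __name__ == "__main__":
    down_at = datetime(2012, 12, 7, 13, 0)  # arbitrary illustrative time
    attempts = 3                            # max_check_attempts from the template
    for when, attempt, state_type in expected_checks(down_at, 3, attempts):
        print(f"{when:%H:%M} check {attempt}/{attempts} {state_type}")
```

With retry_interval 3 and max_check_attempts 3, the sketch places the checks three minutes apart and only marks the third one as a hard state, which is the behaviour the template is meant to give.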
Even though it's checking every minute, it went straight to a hard state on
the first check that detected it down, and it has stayed on check 1/3 hard
state throughout.
I really don't understand what is happening here.
The only difference between this setup and my old Nagios box is the
version: the old box was 3.31, this new server is 3.4.1. I am using the same
config files that worked fine before.
Here are the debug log files from the above testing:
http://dl.dropbox.com/u/895609/nagios.debug1
http://dl.dropbox.com/u/895609/nagios.debug2
If you see anything, please let me know. I'm getting angry with all the
alerts!!! :-)
Thank you
-----Original Message-----
From: Giorgio Zarrelli [mailto:zarrelli at linux.it]
Sent: 29 November 2012 19:24
To: Nagios Users List
Subject: Re: [Nagios-users] Nagios is ignoring the retry_interval setting
Hi,
I do not see anything wrong. Could you set debug_level=-1,
reproduce the problem, and put the log online?
Giorgio
<quote who="Andrew Thompson">
> Hi Giorgio,
>
> The whole test cfg I am using to try to troubleshoot this can be found at:
>
> http://dl.dropbox.com/u/895609/test.cfg
>
> This is a direct copy of my main server's config, but with the rest of
> the servers and some templates for other server checks taken out.
>
>
>
> Kind Regards
> Andrew
>
> From: Andrew Thompson
> Sent: 29 November 2012 16:11
> To: nagios-users at lists.sourceforge.net
> Subject: Nagios is ignoring the retry_interval setting
>
> Hi,
>
> My nagios box has decided to stop listening to the retry_interval
> entry in my templates.
>
> My server template reads:
>
> define host{
> name host-server
> check_period server_24x7
> check_interval 1
> retry_interval 3
> max_check_attempts 3
> notification_period server_24x7
> notification_interval 3
> notification_options d,r
> notifications_enabled 1
> contact_groups servers email, servers sms
> event_handler_enabled 1
> process_perf_data 1
> retain_status_information 1
> retain_nonstatus_information 1
> passive_checks_enabled 0
> obsess_over_host 0
> check_freshness 0
> flap_detection_enabled 0
> failure_prediction_enabled 0
> }
>
> This is what happens:
>
> * The server goes down at 1pm.
> * I look at the next scheduled check, and it clearly states 1.03pm.
> * But at 1.01pm it checks again and then sends out an email and
>   text message saying the server is down.
>
> It completely ignores the retry_interval setting!
>
> I'd expect the following from the above:
>
> * 1pm: the server goes down.
> * 1.03pm: check 2 is done.
> * 1.06pm: check 3 is done and a hard state is determined.
> * 1.06pm: the notification is sent out.
>
> Why is this? Is something wrong in my config?
>
> Ubuntu 12.04 desktop and Nagios 3.4.1
>
> Thanks
>
>
> ------------------------------------------------------------------------------
> Keep yourself connected to Go Parallel:
> VERIFY Test and improve your parallel project with help from experts
> and peers. http://goparallel.sourceforge.net
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null