If anyone figures this out, I would greatly appreciate it. I posted about the same thing a few weeks back. I had services defined with "max_check_attempts" set to 3 and wasn't getting alerted because the check would get stuck on the first or second attempt. Digging through my logs, I saw that when this happened the service would go critical but the next check would never occur. There may be a bug in the code that schedules retries, but I'm not sure. As a workaround I added a cleanup script that shuts down Nagios, removes all temporary files, and restarts it, which seems to fix the problem for a short while.<br>
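<br>
One thing that may be worth ruling out for the config quoted below: Nagios expresses check and retry intervals in time units of "interval_length" from nagios.cfg, which defaults to 60 seconds. The sketch below is only an illustration of that arithmetic, assuming the default is in effect (I don't know what the actual nagios.cfg contains); the helper names effective_seconds and describe are made up for this example.<br>

```python
# Sketch: convert Nagios interval directives to wall-clock seconds.
# Assumption: interval_length is the nagios.cfg default of 60
# (seconds per time unit). The real value on this install is unknown.

def effective_seconds(units, interval_length=60):
    """Return the seconds represented by an interval directive."""
    return units * interval_length

# Values copied from the quoted PING service definition.
service = {"normal_check_interval": 300, "retry_check_interval": 120}

def describe(directives, interval_length=60):
    """Map each directive name to its effective duration in seconds."""
    return {name: effective_seconds(value, interval_length)
            for name, value in directives.items()}

if __name__ == "__main__":
    for name, seconds in describe(service).items():
        print(f"{name}: {seconds} s ({seconds / 3600:.1f} h)")
```

Under that assumption a retry would only be scheduled two hours after the first failure, which could look like a check that never comes back. If the old install had interval_length=1, the same numbers would have meant plain seconds, so it may be worth checking whether that setting survived the migration.<br>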
<br><br><br><div class="gmail_quote">On Mon, Nov 30, 2009 at 12:16 PM, john <span dir="ltr">&lt;<a href="mailto:lists@cloned.org.uk">lists@cloned.org.uk</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
I've moved my config from an old Nagios 2 installation to Nagios 3.0.6<br>
(Debian's version). When performing checks, it only seems to make one<br>
service check attempt, so it never alerts.<br>
<br>
I've got various things in an unknown or critical state, but they are all<br>
listed as Attempt 1 of 3, even if I force an active check. My<br>
check intervals haven't changed since Nagios 2.<br>
<br>
Here are the host and service that aren't alerting (IP removed):<br>
<br>
define host{<br>
host_name moocow<br>
alias moocow<br>
address x.x.x.x<br>
parents switch1<br>
hostgroups servers<br>
check_command check-host-alive<br>
max_check_attempts 3<br>
check_period 24x7<br>
check_interval 1800<br>
retain_nonstatus_information 1<br>
contact_groups notify.john<br>
notification_interval 1800<br>
notification_period 24x7<br>
notification_options d,u,r<br>
}<br>
<br>
<br>
# PING<br>
<br>
define service{<br>
host_name moocow<br>
service_description PING<br>
servicegroups servers<br>
max_check_attempts 3<br>
normal_check_interval 300<br>
retry_check_interval 120<br>
check_period 24x7<br>
contact_groups notify.john<br>
notification_interval 7200<br>
notification_period 24x7<br>
notification_options w,u,c,r<br>
check_command check_ping!75.0,20%!150.0,60%<br>
}<br>
<br>
Can anyone suggest why this wouldn't alert me?<br>
<br>
The config test only throws some warnings about "notification interval<br>
less than its check interval", but not for any of the services that are having<br>
this problem.<br>
<br>
Cheers,<br>
<br>
john<br>
<br>
</blockquote></div><br>