<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7654.12">
<TITLE>host check strangeness - odd behavior in Nagios scheduling queue</TITLE>
</HEAD>
<BODY>
<P><FONT SIZE=2 FACE="Arial">Greetings All, </FONT>
</P>
<P><FONT SIZE=2 FACE="Arial">I'm seeing a problem with our host check scheduling. There are two major issues, and I can't tell whether they are symptoms of the same problem or two separate problems. I've provided the configs and information that I know to be applicable; if other information would help, please let me know and I'll gladly provide it. </FONT></P>
<P><FONT SIZE=2 FACE="Arial">First, here's my Nagios setup:</FONT>
<BR><FONT SIZE=2 FACE="Arial">Single Nagios box (no distributed setup)</FONT>
<BR><FONT SIZE=2 FACE="Arial">64-bit RHEL 5.3</FONT>
<BR><FONT SIZE=2 FACE="Arial">Nagios 3.1.2 (I upgraded from 3.0.6 to see if that would fix the issues)</FONT>
</P>
<BR>
<P><FONT SIZE=2 FACE="Arial">Problem 1. Some host checks are getting *stuck* in the scheduling queue. When I look at the scheduling queue, these hosts are always listed with a 'last check' time that is the same as their 'next check' time. See the attached screenshot (problem 1). They typically stay at the top of the queue for an hour or two.</FONT></P>
<P><FONT SIZE=2 FACE="Arial">Host configuration for one of them:</FONT>
</P>
<BR>
<P><FONT SIZE=2 FACE="Arial">define host {</FONT>
<BR><FONT SIZE=2 FACE="Arial"> host_name hostxxx</FONT>
<BR><FONT SIZE=2 FACE="Arial"> alias Oracle</FONT>
<BR><FONT SIZE=2 FACE="Arial"> use srvhost-os-2000,srvhost-physical,srvhost-oracle,srvhost-non-production,srvhost-all</FONT>
<BR><FONT SIZE=2 FACE="Arial"> notification_period aperture</FONT>
<BR><FONT SIZE=2 FACE="Arial"> register 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> }</FONT>
</P>
<P><FONT SIZE=2 FACE="Arial">Applicable Templates:</FONT>
</P>
<P><FONT SIZE=2 FACE="Arial">define host {</FONT>
<BR><FONT SIZE=2 FACE="Arial"> name generic-host</FONT>
<BR><FONT SIZE=2 FACE="Arial"> check_period 24x7</FONT>
<BR><FONT SIZE=2 FACE="Arial"> event_handler_enabled 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> flap_detection_enabled 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> process_perf_data 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> retain_status_information 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> retain_nonstatus_information 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> notifications_enabled 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> register 0</FONT>
<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
<BR>
<P><FONT SIZE=2 FACE="Arial">define host {</FONT>
<BR><FONT SIZE=2 FACE="Arial"> name generic-pnp</FONT>
<BR><FONT SIZE=2 FACE="Arial"> action_url /pnp/index.php?host=$HOSTNAME$' onmouseover="get_g('$HOSTNAME$','_HOST_')" onmouseout="clear_g()"</FONT>
<BR><FONT SIZE=2 FACE="Arial"> register 0</FONT>
<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
<BR>
<P><FONT SIZE=2 FACE="Arial">define host {</FONT>
<BR><FONT SIZE=2 FACE="Arial"> name srvhost-all</FONT>
<BR><FONT SIZE=2 FACE="Arial"> alias All Servers</FONT>
<BR><FONT SIZE=2 FACE="Arial"> check_command check-nt-alive</FONT>
<BR><FONT SIZE=2 FACE="Arial"> use generic-pnp,generic-host</FONT>
<BR><FONT SIZE=2 FACE="Arial"> max_check_attempts 3</FONT>
<BR><FONT SIZE=2 FACE="Arial"> check_interval 60</FONT>
<BR><FONT SIZE=2 FACE="Arial"> retry_interval 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> active_checks_enabled 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> passive_checks_enabled 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> flap_detection_enabled 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> process_perf_data 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> retain_status_information 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> retain_nonstatus_information 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> contact_groups +servers</FONT>
<BR><FONT SIZE=2 FACE="Arial"> notification_interval 240</FONT>
<BR><FONT SIZE=2 FACE="Arial"> notification_period 24x7</FONT>
<BR><FONT SIZE=2 FACE="Arial"> notification_options d,u,r</FONT>
<BR><FONT SIZE=2 FACE="Arial"> notifications_enabled 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> register 0</FONT>
<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
<BR>
<P><FONT SIZE=2 FACE="Arial">define host {</FONT>
<BR><FONT SIZE=2 FACE="Arial"> name srvhost-non-production</FONT>
<BR><FONT SIZE=2 FACE="Arial"> alias Non production servers</FONT>
<BR><FONT SIZE=2 FACE="Arial"> hostgroups +SRV_Cls-non-production</FONT>
<BR><FONT SIZE=2 FACE="Arial"> check_interval 120</FONT>
<BR><FONT SIZE=2 FACE="Arial"> retry_interval 20</FONT>
<BR><FONT SIZE=2 FACE="Arial"> passive_checks_enabled 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> contact_groups +servers</FONT>
<BR><FONT SIZE=2 FACE="Arial"> notification_interval 480</FONT>
<BR><FONT SIZE=2 FACE="Arial"> notification_period workhours</FONT>
<BR><FONT SIZE=2 FACE="Arial"> notification_options d,u,r</FONT>
<BR><FONT SIZE=2 FACE="Arial"> notifications_enabled 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> register 0</FONT>
<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
<BR>
<P><FONT SIZE=2 FACE="Arial">define host {</FONT>
<BR><FONT SIZE=2 FACE="Arial"> name srvhost-oracle</FONT>
<BR><FONT SIZE=2 FACE="Arial"> alias Oracle servers</FONT>
<BR><FONT SIZE=2 FACE="Arial"> hostgroups +SRV_app-oracle</FONT>
<BR><FONT SIZE=2 FACE="Arial"> contact_groups +oracle</FONT>
<BR><FONT SIZE=2 FACE="Arial"> register 0</FONT>
<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
<BR>
<P><FONT SIZE=2 FACE="Arial">define host {</FONT>
<BR><FONT SIZE=2 FACE="Arial"> name srvhost-physical</FONT>
<BR><FONT SIZE=2 FACE="Arial"> alias Servers that are running on physical hardware</FONT>
<BR><FONT SIZE=2 FACE="Arial"> hostgroups +SRV_platform-physical</FONT>
<BR><FONT SIZE=2 FACE="Arial"> register 0</FONT>
<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
<BR>
<P><FONT SIZE=2 FACE="Arial">define host {</FONT>
<BR><FONT SIZE=2 FACE="Arial"> name srvhost-os-2000</FONT>
<BR><FONT SIZE=2 FACE="Arial"> alias Servers running Windows 2000 Server</FONT>
<BR><FONT SIZE=2 FACE="Arial"> hostgroups +SRV_os-win2000</FONT>
<BR><FONT SIZE=2 FACE="Arial"> check_command check-nt-alive</FONT>
<BR><FONT SIZE=2 FACE="Arial"> register 0</FONT>
<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
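<P><FONT SIZE=2 FACE="Arial">To make clear which values hostxxx should end up with, here is a minimal Python sketch of how the templates above would resolve. It is not Nagios code. It assumes the documented multiple-inheritance rule that a value set on the host itself wins and that, among templates, the one listed first in 'use' wins. The dictionaries are hand-trimmed copies of the definitions above (scalar directives only; additive '+' values such as hostgroups are ignored), and the function name resolve is made up for this example.</FONT></P>
<PRE>
```python
# Hand-trimmed copies of the templates above (scalar directives only).
TEMPLATES = {
    "generic-host": {"check_period": "24x7"},
    "generic-pnp": {},
    "srvhost-all": {
        "use": ["generic-pnp", "generic-host"],
        "check_command": "check-nt-alive",
        "max_check_attempts": "3",
        "check_interval": "60",
        "retry_interval": "1",
        "notification_period": "24x7",
    },
    "srvhost-non-production": {
        "check_interval": "120",
        "retry_interval": "20",
        "notification_period": "workhours",
    },
    "srvhost-oracle": {},
    "srvhost-physical": {},
    "srvhost-os-2000": {"check_command": "check-nt-alive"},
}

HOST = {
    "host_name": "hostxxx",
    "use": ["srvhost-os-2000", "srvhost-physical", "srvhost-oracle",
            "srvhost-non-production", "srvhost-all"],
    "notification_period": "aperture",
}


def resolve(obj, templates):
    """Return obj's directives merged with inherited ones.

    Assumed rule: a local value always wins, and among templates the one
    listed first in 'use' wins (depth-first, left to right).
    """
    merged = {key: value for key, value in obj.items() if key != "use"}
    for name in obj.get("use", []):
        inherited = resolve(templates[name], templates)
        for key, value in inherited.items():
            # setdefault keeps whatever an earlier source already supplied.
            merged.setdefault(key, value)
    return merged


effective = resolve(HOST, TEMPLATES)
for key in ("check_interval", "retry_interval", "check_period", "notification_period"):
    print(key, "=", effective[key])
```
</PRE>
<P><FONT SIZE=2 FACE="Arial">If that reading of the inheritance rules is right, hostxxx takes check_interval 120 and retry_interval 20 from srvhost-non-production rather than 60 and 1 from srvhost-all, because srvhost-non-production is listed first.</FONT></P>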
<BR>
<BR>
<P><FONT SIZE=2 FACE="Arial">Problem 2. Many of our hosts are not running host checks; they are in the scheduling queue but never execute. In the scheduling queue I can see many hosts whose host 'last check' times are from several weeks ago. They show up in the queue but never run their host checks (or don't seem to). These same hosts run service checks on time without issue. Screenshot attached (problem 2).</FONT></P>
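<P><FONT SIZE=2 FACE="Arial">Both symptoms can also be listed without the web interface. The following is a minimal Python sketch, assuming the status file uses 'hoststatus' blocks with host_name, last_check and next_check fields holding Unix timestamps. The sample text, the timestamps, the one-week threshold and the function names are all made up for illustration.</FONT></P>
<PRE>
```python
import re

# Made-up sample in the style of a Nagios status file; real files carry many more fields.
SAMPLE = """
hoststatus {
    host_name=hostxxx
    last_check=1246000000
    next_check=1246000000
    }

hoststatus {
    host_name=hostxxxx
    last_check=1243000000
    next_check=1246003600
    }
"""

# Non-greedy match up to the first closing brace; good enough for a sketch,
# but it would cut a block short if a field value itself contained "}".
BLOCK = re.compile(r"hoststatus \{(.*?)\}", re.S)


def parse_hosts(text):
    """Return one dict of key=value pairs per hoststatus block."""
    hosts = []
    for body in BLOCK.findall(text):
        fields = {}
        for line in body.splitlines():
            key, sep, value = line.strip().partition("=")
            if sep:
                fields[key] = value
        hosts.append(fields)
    return hosts


def suspicious(text, now, max_age):
    """Return (host_name, reason) pairs for the two symptoms described above."""
    found = []
    for host in parse_hosts(text):
        last = int(host["last_check"])
        upcoming = int(host["next_check"])
        if last == upcoming:
            found.append((host["host_name"], "last check equals next check"))
        elif now - last > max_age:
            found.append((host["host_name"], "last check is stale"))
    return found


# 1246004000 is an arbitrary "now"; max_age is one week in seconds.
for name, reason in suspicious(SAMPLE, now=1246004000, max_age=7 * 24 * 3600):
    print(name, "-", reason)
```
</PRE>
<P><FONT SIZE=2 FACE="Arial">On a real system the text would come from the file named by status_file in nagios.cfg, and 'now' from the current time.</FONT></P>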
<P><FONT SIZE=2 FACE="Arial">Host config for one of the hosts not running host checks:</FONT>
<BR><FONT SIZE=2 FACE="Arial">define host {</FONT>
<BR><FONT SIZE=2 FACE="Arial"> host_name hostxxxx</FONT>
<BR><FONT SIZE=2 FACE="Arial"> alias media server</FONT>
<BR><FONT SIZE=2 FACE="Arial"> use srvhost-production,srvhost-physical,srvhost-os-2003,srvhost-all</FONT>
<BR><FONT SIZE=2 FACE="Arial"> register 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> }</FONT>
</P>
<BR>
<P><FONT SIZE=2 FACE="Arial">define host {</FONT>
<BR><FONT SIZE=2 FACE="Arial"> name generic-host</FONT>
<BR><FONT SIZE=2 FACE="Arial"> check_period 24x7</FONT>
<BR><FONT SIZE=2 FACE="Arial"> event_handler_enabled 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> flap_detection_enabled 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> process_perf_data 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> retain_status_information 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> retain_nonstatus_information 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> notifications_enabled 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> register 0</FONT>
<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
<BR>
<P><FONT SIZE=2 FACE="Arial">define host {</FONT>
<BR><FONT SIZE=2 FACE="Arial"> name generic-pnp</FONT>
<BR><FONT SIZE=2 FACE="Arial"> action_url /pnp/index.php?host=$HOSTNAME$' onmouseover="get_g('$HOSTNAME$','_HOST_')" onmouseout="clear_g()"</FONT>
<BR><FONT SIZE=2 FACE="Arial"> register 0</FONT>
<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
<BR>
<P><FONT SIZE=2 FACE="Arial">define host {</FONT>
<BR><FONT SIZE=2 FACE="Arial"> name srvhost-all</FONT>
<BR><FONT SIZE=2 FACE="Arial"> alias All Servers</FONT>
<BR><FONT SIZE=2 FACE="Arial"> check_command check-nt-alive</FONT>
<BR><FONT SIZE=2 FACE="Arial"> use generic-pnp,generic-host</FONT>
<BR><FONT SIZE=2 FACE="Arial"> max_check_attempts 3</FONT>
<BR><FONT SIZE=2 FACE="Arial"> check_interval 60</FONT>
<BR><FONT SIZE=2 FACE="Arial"> retry_interval 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> active_checks_enabled 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> passive_checks_enabled 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> flap_detection_enabled 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> process_perf_data 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> retain_status_information 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> retain_nonstatus_information 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> contact_groups +servers</FONT>
<BR><FONT SIZE=2 FACE="Arial"> notification_interval 240</FONT>
<BR><FONT SIZE=2 FACE="Arial"> notification_period 24x7</FONT>
<BR><FONT SIZE=2 FACE="Arial"> notification_options d,u,r</FONT>
<BR><FONT SIZE=2 FACE="Arial"> notifications_enabled 1</FONT>
<BR><FONT SIZE=2 FACE="Arial"> register 0</FONT>
<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
<P><FONT SIZE=2 FACE="Arial">define host {</FONT>
<BR><FONT SIZE=2 FACE="Arial"> name srvhost-os-2003</FONT>
<BR><FONT SIZE=2 FACE="Arial"> alias Servers running Windows 2003</FONT>
<BR><FONT SIZE=2 FACE="Arial"> hostgroups +SRV_os-win2003</FONT>
<BR><FONT SIZE=2 FACE="Arial"> check_command check-nt-alive</FONT>
<BR><FONT SIZE=2 FACE="Arial"> register 0</FONT>
<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
<P><FONT SIZE=2 FACE="Arial">define host {</FONT>
<BR><FONT SIZE=2 FACE="Arial"> name srvhost-physical</FONT>
<BR><FONT SIZE=2 FACE="Arial"> alias Servers that are running on physical hardware</FONT>
<BR><FONT SIZE=2 FACE="Arial"> hostgroups +SRV_platform-physical</FONT>
<BR><FONT SIZE=2 FACE="Arial"> register 0</FONT>
<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
<P><FONT SIZE=2 FACE="Arial">define host {</FONT>
<BR><FONT SIZE=2 FACE="Arial"> name srvhost-production</FONT>
<BR><FONT SIZE=2 FACE="Arial"> alias All servers in production mode</FONT>
<BR><FONT SIZE=2 FACE="Arial"> hostgroups +SRV_Cls-production</FONT>
<BR><FONT SIZE=2 FACE="Arial"> contact_groups +helpdesk,servers,servers-off-hours,thesolver</FONT>
<BR><FONT SIZE=2 FACE="Arial"> register 0</FONT>
<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
<P><FONT SIZE=2 FACE="Arial">define command {</FONT>
<BR><FONT SIZE=2 FACE="Arial"> command_name check-nt-alive</FONT>
<BR><FONT SIZE=2 FACE="Arial"> command_line $USER1$/check_tcp -H $HOSTADDRESS$ -p 135 -t 30</FONT>
<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
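<P><FONT SIZE=2 FACE="Arial">The check-nt-alive command above amounts to a TCP connect to port 135 with a 30-second timeout. A rough stand-alone equivalent in Python can help rule out the target host when probing by hand. This is only a sketch: it is not check_tcp, it ignores that plugin's other options and output format, and the function name tcp_alive is made up for this example.</FONT></P>
<PRE>
```python
import socket


def tcp_alive(address, port=135, timeout=30.0):
    """Return True if a TCP connection to address:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the address and applies the timeout to the connect.
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```
</PRE>
<P><FONT SIZE=2 FACE="Arial">For example, tcp_alive("192.0.2.10") would probe port 135 on that (documentation-only) address. If a probe like this answers quickly while the host check still never runs, that would point at scheduling rather than at the plugin.</FONT></P>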
<BR>
<P><FONT SIZE=2 FACE="Arial">Any ideas or help in tracking this down would be appreciated. I'm pretty sure it's a bug in the code, but I suppose it's possible my configuration is off somehow. :-) </FONT></P>
<P><FONT SIZE=2 FACE="Arial">Thanks Again, </FONT>
</P>
<P><FONT SIZE=2 FACE="Arial">-greg</FONT>
</P>
</BODY>
</HTML>