Using Nagios to monitor "service-less" hosts

Andy Shellam (Mailing Lists) andy.shellam-lists at mailnetwork.co.uk
Wed Nov 8 23:45:02 CET 2006


Ted,

I've stopped Nagios, removed all ".dat" files from var, and restarted it 
- all checks are now pending.
However, I did look through retention.dat (I presume this is what you 
meant - status.sav didn't exist) before I killed it, and the 
check_interval parameter was not defined for any host.
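
A minimal sketch of the retention.dat inspection described above, for anyone who wants to script it. It assumes the file is laid out as `host {` ... `}` blocks of `key=value` lines; the sample text, its values, and the `active_checks_enabled` key are illustrative assumptions, not a copy of the real file. The helper name `parse_blocks` is hypothetical.

```python
def parse_blocks(text, block_type="host"):
    """Return one dict per `block_type { ... }` block found in text."""
    blocks, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line == f"{block_type} {{":
            current = {}                    # start of a block we care about
        elif line == "}":
            if current is not None:
                blocks.append(current)      # end of a block we care about
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key] = value
    return blocks


# Hypothetical sample; a real retention.dat has many more keys per block.
SAMPLE = """\
info {
created=1163025902
}
host {
host_name=SC-Gateway
active_checks_enabled=1
}
host {
host_name=FH-Gateway
active_checks_enabled=0
}
"""

hosts = parse_blocks(SAMPLE)
for h in hosts:
    # Report whether check_interval was retained at all, plus the enabled flag.
    print(h["host_name"],
          h.get("check_interval", "<not present>"),
          h.get("active_checks_enabled", "<not present>"))
```

If a retained "checks enabled" flag disagrees with the .cfg files, that would fit Ted's theory even though check_interval itself is never written to the file.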

I would think, surely, "state retention" only retains the service/host 
check states so if, for example, the Nagios machine reboots, when it 
comes back up it knows where it left off.  Otherwise if you change the 
config, you'd have to remember to remove all the .dat files (or at least 
retention.dat) in var before the config change takes effect, and I 
certainly haven't had to do that before.

And as far as Nagios was concerned, "scheduled active host checks" were 
OFF - or so it said in the config viewer.

I'll wait a couple of minutes and see where it goes from here...

OK, 5 minutes have passed - no difference.
Service SC-Gateway - Ping = checked and confirmed OK at 22:38:15
The host SC-Gateway = checked and confirmed OK at 22:38:15, then checked 
again at 22:39:40 and again at 22:41:40.
And note "Next active scheduled check" reads N/A.
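
As a quick check on how regular those host checks are, here is a small sketch that computes the gaps between the three times listed above. The helper name `gaps_in_seconds` is hypothetical, and it assumes all times fall on the same day.

```python
from datetime import datetime


def gaps_in_seconds(times, fmt="%H:%M:%S"):
    """Return the gap in seconds between each consecutive pair of times."""
    parsed = [datetime.strptime(t, fmt) for t in times]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]


# Host check times for SC-Gateway as reported above.
host_checks = ["22:38:15", "22:39:40", "22:41:40"]
gaps = gaps_in_seconds(host_checks)
print(gaps)  # [85, 120]
```

The gaps are 85 and 120 seconds, so the checks recur every minute or two but not on a fixed interval.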

Andy.

Tedman Eng wrote:
> If you have state retention enabled, then Nagios remembers lots of settings
> and does not "reset" them when reloading a config (otherwise it wouldn't be
> retaining).  "Host Active Checks Enabled" likely did not disable themselves
> after changing the .cfg file, because the state was "remembered" from
> previous runs.  Try stopping Nagios, clearing the status.sav and restarting
> Nagios.
>
>   
>> -----Original Message-----
>> From: Andy Shellam (Mailing Lists)
>> [mailto:andy.shellam-lists at mailnetwork.co.uk]
>> Sent: Wednesday, November 08, 2006 12:58 PM
>> To: Tedman Eng
>> Cc: nagios-users at lists.sourceforge.net
>> Subject: Re: [Nagios-users] Using Nagios to monitor 
>> "service-less" hosts
>>
>>
>> Hi Ted,
>>
>> I understand the distinction - I *did* have host checks actively 
>> scheduled (ie. the host parameter 'check_interval' set to 1 - this is 
>> now 0 so host checks shouldn't be scheduled, right?)  Yet Nagios IS 
>> checking the hosts every few minutes roughly, regardless of child 
>> service status.
>>
>> Here's a dead simple example - the FH-Gateway - it has a single service,
>> which is a Ping.  The host also has a Ping set as its
>> active_check_command parameter.
>> Now, if I show you the service breakdown for the Ping _service_ on 
>> FH-Gateway:
>>
>> Current Status:               OK
>> Status Information:           PING OK - Packet loss = 0%, RTA = 3.02 ms
>> Performance Data:
>> Current Attempt:              1/2
>> State Type:                   HARD
>> Last Check Type:              ACTIVE
>> Last Check Time:              08-11-2006 20:49:37
>> Status Data Age:              0d 0h 0m 51s
>> Next Scheduled Active Check:  08-11-2006 20:50:37
>> Latency:                      0.607 seconds
>> Check Duration:               9.013 seconds
>> Last State Change:            08-11-2006 10:46:46
>> Current State Duration:       0d 10h 3m 42s
>>
>>
>> Nagios reports it's been in the same state (ie. OK) for 10 hours, 3 
>> minutes, and 42 seconds right?
>> So why was the host checked only a few seconds ago?
>>
>> Host Status:                  UP
>> Status Information:           PING OK - Packet loss = 0%, RTA = 0.27 ms
>> Performance Data:
>> Current Attempt:              1/2
>> State Type:                   HARD
>> Last Check Type:              ACTIVE
>> Last Check Time:              08-11-2006 20:50:49
>> Status Data Age:              0d 0h 0m 39s
>> Next Scheduled Active Check:  N/A
>> Latency:                      9.113 seconds
>> Check Duration:               9.011 seconds
>> Last State Change:            07-11-2006 06:20:35
>> Current State Duration:       1d 14h 30m 53s
>> Last Host Notification:       N/A
>> Current Notification Number:  0
>> Is This Host Flapping?        NO
>> Percent State Change:         0.00%
>> In Scheduled Downtime?        NO
>> Last Update:                  08-11-2006 20:51:16
>>
>>
>> If the general line of thinking is correct, Nagios should have last
>> checked the host back at (or around) 10:46 this morning when there was a
>> blip in the service check.  But it didn't.  It does check them every 1-2
>> minutes.
>> My check_interval parameter is 0 - the config viewer in the web CGIs 
>> shows "enabled active checks" as NO for each host.
>>
>> Since I've been writing this - the above host has been checked again at
>> 20:54:49 - exactly 4 minutes since the last check.  No change in the
>> service status - 10 hours, 9 minutes now.
>>
>> Any ideas?
>>
>> Andy.
>>
>>
>>
>> Tedman Eng wrote:
>>> Host checks are not actively scheduled in normal operation.
>>>
>>> You could go months without requiring a host check, and the status age of
>>> the host check will show something like 81 days for example.
>>>
>>> If you see recent host checks, then that means there was a service problem
>>> and Nagios wanted to be sure it wasn't the host.
>>>
>>> Perhaps if you thought of "host check" as "network link status", it would
>>> make the distinction more clear.
>>>
>>>> -----Original Message-----
>>>> From: Andy Shellam (Mailing Lists)
>>>> [mailto:andy.shellam-lists at mailnetwork.co.uk]
>>>> Sent: Wednesday, November 08, 2006 11:56 AM
>>>> To: Sloane, Robert Raymond
>>>> Cc: nagios-users at lists.sourceforge.net
>>>> Subject: Re: [Nagios-users] Using Nagios to monitor 
>>>> "service-less" hosts
>>>>
>>>>
>>>> Sloane, Robert Raymond wrote:
>>>>>> Last Check Time:              08-11-2006 19:34:40
>>>>>> Next Scheduled Active Check:  N/A
>>>>>
>>>>> Interesting.  Nagios thinks the last check was run over a month ago.
>>>>
>>>> No, thankfully!  That date is the 8th November (British format.)
>>>>
>>>>> You wouldn't see anything about hosts in the scheduling queue.  Host
>>>>> checks are run immediately, not through the queue.  That is why it is
>>>>> best to not use them.
>>>>
>>>> I did when the check_interval was set to 1 in the hosts - it showed the
>>>> host name and a blank service column.
>>>> I'd mentioned this only to prove the point that the checks do not seem
>>>> to be scheduled any more, so I cannot figure out why it's still running
>>>> the host checks at (seemingly) regular intervals.
>>>>
>>>> There are no hosts under that machine (or indeed above it), and all
>>>> service checks are up and have been for a good 6-8 hours.
>>>>
>>>> I'm stumped!
>>>>
>>>> Andy.
>>>>


-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




