Nagios-devel digest, Vol 1 #1058 - 2 msgs

Andreas Ericsson ae at op5.se
Fri Apr 21 12:20:35 CEST 2006


Please refrain from top-posting. It makes following the discussion a lot 
harder.

Vegard Hanssen wrote:
> There is a problem with defining check-host-alive to check_nrpe.
> 
> A normal problem in my setup:
> 
> One host gets too high a load. This will time out the nrpe checks, giving 
> me 5-10 sms for the timeout, and then (2 minutes later) the same sms for 
> OK. The host isn't down, it's just in a very busy state, and I know this 
> since I don't get a host down message. If I change the check-host-alive 
> to nrpe I will then get a host down message, which can mean anything 
> from high load to the host being unreachable or actually down. Host down 
> = I have to drop everything and get to work, High Load = let's give it a 
> minute to cool down first.
> 
> I could do the same as Øysten Bleie suggests, but I'm not sure that's 
> good either. Actually I'm not sure what's best to do, so for the 
> moment I've stuck with all the sms.
> 

You could bite the bullet and set up the service dependencies. You want 
Nagios to automagically understand that there's a relationship between 
the nrpe-based services and the nrpe-service itself, but since Nagios 
has no understanding of what checks you're running, that is quite 
complex to implement. Instead, Nagios lets you do the part requiring 
accuracy (namely the "what needs what" part) and then handles the rest 
for you.

With some clever scripting and a generic naming convention I'm sure 
you'll be able to set everything up in half an hour, saving you quite a 
few notifications on failures.
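
A minimal sketch of that scripting idea, under stated assumptions: every
host has one service that checks the NRPE daemon itself (called "NRPE"
here), and every service run through check_nrpe has a description
starting with "nrpe_". Both names, the sample inventory and the chosen
failure criteria are hypothetical; adapt them to your own setup. The
script only prints servicedependency definitions for you to review.

```python
# Sketch only: NRPE_SERVICE, NRPE_PREFIX and the sample inventory are
# illustrative assumptions, not taken from anyone's real configuration.

NRPE_SERVICE = "NRPE"   # assumed description of the service checking the NRPE daemon
NRPE_PREFIX = "nrpe_"   # assumed naming convention for check_nrpe-based services

# Standard Nagios servicedependency object. With these criteria, notifications
# for the dependent service are suppressed while the master service is
# UNKNOWN or CRITICAL; the dependent checks themselves keep running.
TEMPLATE = """define servicedependency {{
    host_name                       {host}
    service_description             {master}
    dependent_host_name             {host}
    dependent_service_description   {dependent}
    notification_failure_criteria   u,c
    execution_failure_criteria      n
}}
"""


def build_dependencies(services_by_host):
    """Return one servicedependency block per nrpe-based service.

    services_by_host maps a host name to its list of service descriptions.
    Hosts without the master NRPE service are skipped.
    """
    blocks = []
    for host in sorted(services_by_host):
        services = services_by_host[host]
        if NRPE_SERVICE not in services:
            continue  # nothing to depend on for this host
        for service in sorted(services):
            if service.startswith(NRPE_PREFIX):
                blocks.append(
                    TEMPLATE.format(
                        host=host, master=NRPE_SERVICE, dependent=service
                    )
                )
    return blocks


if __name__ == "__main__":
    # Hypothetical inventory; in practice this would come from your own
    # host and service lists.
    inventory = {"web1": ["NRPE", "nrpe_load", "nrpe_disk", "HTTP"]}
    print("\n".join(build_dependencies(inventory)))
```

Redirect the output to a file included from your main configuration and
run the usual pre-flight check before reloading Nagios.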

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

