timeouts and performance info

Tobias Klausmann klausman at schwarzvogel.de
Wed Aug 30 15:43:43 CEST 2006


Hi! 

On Wed, 30 Aug 2006, Marc Powell wrote:
> > Active Service Checks:
> > <= 1 minute:	81 (4.6%)
> > <= 5 minutes:	1719 (97.4%)
> > <= 15 minutes:	1727 (97.9%)
> > <= 1 hour:	1727 (97.9%)
> > Since program start:  	1727 (97.9%)
> 
> This seems mostly normal for a 5 minute check_interval. The small
> difference between the 5 and 15 minute counts is normal as checks may be
> just starting to execute or still in progress at the 5 minute mark. It
> does appear that you have some number of services that are not scheduled
> for execution or are executing at really long intervals. Look at Service
> Detail and sort by last check. Re-examine your configuration for those
> services that do not appear to be scheduled properly.

I have a few services that are disabled entirely (don't check
actively, don't accept passive checks). Would they count in the
above statistic? They seem to fit in with the missing 2.1%
(100-97.9). Also, I saw a few checks that were last run about 20
minutes ago. Those are log checks via NRPE that complete in under
1s (no noticeable delay) if run directly on the machine (as user
nagios, of course). That seems acceptable (and I know neither why
it would take 20 minutes nor how to find out), so I'm willing to
let it slide ;).

> Looks pretty good to me. The high max check latency number may have been
> a one-off event. If that number regularly changes and is always very
> high then you might want to verify that you're not starving nagios for
> check by running /path/to/nagios/bin/nagios -s
> /path/to/nagios/etc/nagios and make sure you meet or exceed its
> recommended values.

I guessed as much for the one-off event. It doesn't change, so I
feel somewhat safe. As for the recommended values (-s), Nagios
says it's okay the way it is.

> > Active Hosts Checks:
> > <= 1 minute:	0 (0.0%)
> > <= 5 minutes:	3 (1.2%)
> > <= 15 minutes:	3 (1.2%)
> > <= 1 hour:	4 (1.6%)
> > Since program start:  	27 (10.8%)
> > 
> > and
> > 
> > Check Execution Time:  	0.02 sec	10.05 sec	0.208 sec
> > Check Latency:		0.00 sec	17.48 sec	0.204 sec
> > Percent State Change:	0.00%	0.00%	0.00%
> 
> These look normal and expected. You've had 27 service failures since
> program start necessitating host checks.

That is in line with what I'd expect.

> > Am I the only one seeing a discrepancy here?
> 
> The only discrepancy I see is likely due to configuration. You probably
> have check intervals or timeperiods misconfigured for ~30 services.

About that number of services are disabled entirely right now, so
if they count toward the statistic, that explains the figures.
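
As a rough cross-check of the "missing 2.1%", the quoted service
figures can be turned back into absolute numbers. This is only a
back-of-the-envelope sketch: it assumes the percentage is taken over
all configured services (including disabled ones) and that 97.9% is a
rounded value, so the result is an estimate. The variable names are
made up for illustration.

```python
# Figures copied from the "Active Service Checks" table quoted above.
checked_since_start = 1727
fraction_checked = 0.979  # 97.9%, as displayed (rounded)

# Assumption: the percentage is relative to all configured services.
estimated_total = round(checked_since_start / fraction_checked)
never_checked = estimated_total - checked_since_start

print(estimated_total, never_checked)  # 1764 37
```

That gives an estimate of about 37 services never checked since
program start, which is in the same range as the "~30" mentioned
above.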

> > The only way I can make sense of this is that the "<= 15 minutes"
> > means "time from being scheduled to actually starting the
> > plugin". In that case I wonder what makes it take so long, the
> 
> Check Latency is that number. On average nagios is able to run your
> checks within 3.043 seconds of when they are scheduled to run. The
> number you are referring to is just a simple count of the number of
> plugins that have been run in that time interval.

So it means "in the last N minutes, this many services completed"
and *not* "this many services needed N minutes to complete (from
being started to delivering the retval)"? That would be an eye
opener for me :)
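
To make the two readings concrete, here is a small sketch of the
"count of checks completed in the last N minutes" interpretation. It
is not taken from the Nagios source; the function name, the window
list and the sample timestamps are all hypothetical. The point is only
that the buckets are cumulative counts over "time since last check",
which has nothing to do with how long a plugin took to run.

```python
from datetime import datetime, timedelta

def count_recent_checks(last_check_times, now, windows_minutes=(1, 5, 15, 60)):
    """Return {window: number of services last checked within that window}."""
    counts = {}
    for minutes in windows_minutes:
        cutoff = now - timedelta(minutes=minutes)
        # A service counts if its most recent check is not older than the cutoff.
        counts[minutes] = sum(1 for t in last_check_times if t >= cutoff)
    return counts

now = datetime(2006, 8, 30, 15, 43, 43)

# Hypothetical last-check timestamps for four services.
last_checks = [
    now - timedelta(seconds=30),
    now - timedelta(minutes=3),
    now - timedelta(minutes=4),
    now - timedelta(minutes=20),
]

counts = count_recent_checks(last_checks, now)
print(counts)  # {1: 1, 5: 3, 15: 3, 60: 4}
```

With a 5 minute check interval, almost everything lands in the
"<= 5 minutes" bucket and only stragglers (like the service checked 20
minutes ago in this made-up data) show up in the larger windows.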

Regards & Thanks,
Tobias
-- 
You don't need eyes to see, you need vision.
