timeouts and performance info
Tobias Klausmann
klausman at schwarzvogel.de
Wed Aug 30 15:43:43 CEST 2006
Hi!
On Wed, 30 Aug 2006, Marc Powell wrote:
> > Active Service Checks:
> > <= 1 minute: 81 (4.6%)
> > <= 5 minutes: 1719 (97.4%)
> > <= 15 minutes: 1727 (97.9%)
> > <= 1 hour: 1727 (97.9%)
> > Since program start: 1727 (97.9%)
>
> This seems mostly normal for a 5 minute check_interval. The small
> difference between the 5 and 15 minute counts is normal as checks may be
> just starting to execute or still in progress at the 5 minute mark. It
> does appear that you have some number of services that are not scheduled
> for execution or are executing at really long intervals. Look at Service
> Detail and sort by last check. Re-examine your configuration for those
> services that do not appear to be scheduled properly.
I have a few services that are disabled entirely (no active
checks, no passive checks accepted). Would they count in the
above statistic? They would account for the missing 2.1%
(100 - 97.9). Also, I saw a few checks that were last run about 20
minutes ago. Those are log checks via NRPE that complete in under
1s (no noticeable delay) when run directly on the machine (as user
nagios, of course). That seems acceptable (and I know neither why it
would take 20 minutes nor how to find out), so I'm willing to
let it slide ;).
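To put a number on that 2.1%, here is a rough back-of-the-envelope
sketch in Python. It assumes the percentages in the status output are
simply count/total rounded to one decimal place; the function name is
made up for illustration and is not part of Nagios.

```python
def estimate_total(count, percent):
    """Estimate the total number of services from one 'count (percent%)' pair.

    Assumption: percent == 100 * count / total, rounded to one decimal.
    """
    return round(count / (percent / 100.0))


# Figures taken from the "Since program start" line quoted above.
checked = 1727
total = estimate_total(checked, 97.9)

# Services that have not been actively checked since program start.
unchecked = total - checked

print(f"estimated total services: {total}")
print(f"never actively checked:   {unchecked}")
```

If that assumption holds, the gap is a few dozen services, which is
in the same ballpark as the number I have disabled.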
> Looks pretty good to me. The high max check latency number may have been
> a one-off event. If that number regularly changes and is always very
> high then you might want to verify that you're not starving nagios for
> check by running /path/to/nagios/bin/nagios -s
> /path/to/nagios/etc/nagios and make sure you meet or exceed its
> recommended values.
I guessed as much for the one-off event. It doesn't change, so I
feel somewhat safe. As for the recommended values (-s), Nagios
says it's okay the way it is.
> > Active Hosts Checks:
> > <= 1 minute: 0 (0.0%)
> > <= 5 minutes: 3 (1.2%)
> > <= 15 minutes: 3 (1.2%)
> > <= 1 hour: 4 (1.6%)
> > Since program start: 27 (10.8%)
> >
> > and
> >
> > Check Execution Time: 0.02 sec 10.05 sec 0.208 sec
> > Check Latency: 0.00 sec 17.48 sec 0.204 sec
> > Percent State Change: 0.00% 0.00% 0.00%
>
> These look normal and expected. You've had 27 service failures since
> program start necessitating host checks.
That is in line with what I'd expect.
> > Am I the only one seeing a discrepancy here?
>
> The only discrepancy I see is likely due to configuration. You probably
> have check intervals or timeperiods misconfigured for ~30 services.
About that number of services are disabled entirely right now, so
if they count into the statistic, it explains the figures.
> > The only way I can make sense of this is that the "<= 15 minutes"
> > means "time from being scheduled to actually starting the
> > plugin". In that case I wonder what makes it take so long, the
>
> Check Latency is that number. On average nagios is able to run your
> checks within 3.043 seconds of when they are scheduled to run. The
> number you are referring to is just a simple count of the number of
> plugins that have been run in that time interval.
So it means "in the last N minutes, this many services completed"
and *not* "this many services needed N minutes to complete (from
being started to delivering the retval)"? That would be an eye
opener for me :)
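To check my own understanding, here is a minimal Python sketch of that
reading: each "<= N minutes" line counts the services whose last check
finished within the last N minutes, so the buckets are cumulative. This
is only my interpretation of the explanation above, not Nagios' actual
code; the function name and the sample data are invented.

```python
from datetime import datetime, timedelta

# Time windows as they appear in the status output
# ("Since program start" is left out of this sketch).
WINDOWS = [
    ("<= 1 minute", timedelta(minutes=1)),
    ("<= 5 minutes", timedelta(minutes=5)),
    ("<= 15 minutes", timedelta(minutes=15)),
    ("<= 1 hour", timedelta(hours=1)),
]


def bucket_counts(last_checks, now):
    """Count services whose last check lies within each window.

    last_checks: list of datetimes, or None for a service never checked.
    Returns {label: (count, percent_of_all_services)}.
    """
    total = len(last_checks)
    result = {}
    for label, window in WINDOWS:
        n = sum(1 for t in last_checks if t is not None and now - t <= window)
        result[label] = (n, 100.0 * n / total)
    return result


# Invented sample: five services, one of them disabled (never checked).
now = datetime(2006, 8, 30, 15, 0, 0)
sample = [
    now - timedelta(seconds=30),
    now - timedelta(minutes=3),
    now - timedelta(minutes=4),
    now - timedelta(minutes=20),
    None,
]
counts = bucket_counts(sample, now)
for label, (n, pct) in counts.items():
    print(f"{label}: {n} ({pct:.1f}%)")
```

Under this reading a disabled service never shows up in any bucket,
and a check that takes 20 minutes to come around only appears in the
"<= 1 hour" line.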
Regards & Thanks,
Tobias
--
You don't need eyes to see, you need vision.