Performance issues, too
Tobias Klausmann
klausman at schwarzvogel.de
Thu Dec 21 11:33:46 CET 2006
Hi!
On Tue, 19 Dec 2006, Andreas Ericsson wrote:
> >>> SERVICE SCHEDULING INFORMATION
> >>> -------------------------------
> >>> Total services: 2836
> >>> Total scheduled services: 2836
> >>> Service inter-check delay method: SMART
> >>> Average service check interval: 2225.56 sec
> >> This is, as you point out below, quite odd. What's your _longest_
> >> normal_check_interval for services?
> >
> > The longest check_interval is 86400 seconds. It's an SSL cert
> > freshness check. I figured it wasn't necessary to check that
> > more often than once a day. I also have check_intervals of 3, 5,
> > 15, 20, 30 and 1440 seconds. The latter is also a cert freshness
> > check, which is shorter because the customer wanted it to be that
> > short.
> >
>
> Try changing the really long intervals to something shorter or
> commenting them out completely and see what happens. Checking a
> certificate is not a particularly heavy operation, so it doesn't matter
> much if you run it every 5 minutes. On the server side it just gets
> handed out from cache, so it's not heavy there either.
Actually, I was horribly wrong with that statement up there.
As it turned out, the check_interval was set to 86400. From that
I jumped to the conclusion "ah, one day"; familiar numbers do
that to you. But the base unit of check_interval isn't 1 second; it's 1
minute. So the check_interval was 60 days. Fortunately, there was
only one such check, which we quickly eliminated before producing
the second set of graphs I mentioned elsewhere in the thread.
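A unit mix-up like this is easy to catch mechanically. Below is a minimal Python sketch, not part of Nagios itself. It assumes the default interval_length=60 in nagios.cfg (so one check_interval unit is 60 seconds); the helper names, the one-day ceiling and the service names are all hypothetical.

```python
# Hypothetical helper: convert check_interval values to seconds and flag
# anything longer than a chosen ceiling.
# ASSUMPTION: interval_length = 60 (the nagios.cfg default), so one unit is
# 60 seconds. Adjust if your nagios.cfg sets a different value.

INTERVAL_LENGTH_S = 60
MAX_SANE_S = 86_400  # hypothetical ceiling: one day


def interval_seconds(units: int, interval_length: int = INTERVAL_LENGTH_S) -> int:
    """Return the real interval in seconds for a check_interval value."""
    return units * interval_length


def suspicious(intervals: dict[str, int], ceiling_s: int = MAX_SANE_S) -> list[str]:
    """Names of services whose real interval exceeds ceiling_s seconds."""
    return sorted(
        name for name, units in intervals.items()
        if interval_seconds(units) > ceiling_s
    )


# Interval values from this thread; the service names are made up.
intervals = {
    "ssl-cert-a": 86_400,  # meant as one day, actually 60 days
    "ssl-cert-b": 1_440,   # one day, counted in minutes
    "web-check": 5,        # five minutes
}

print(suspicious(intervals))             # ['ssl-cert-a']
print(interval_seconds(86_400) / 86_400)  # 60.0 (days)
```

The ceiling is a judgment call; one day simply matches the longest interval anyone in this thread intended.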
Now, the longest check_interval truly is one day, 1440 minutes.
The average service check interval reported by nagios -s is now 419
seconds. Still not terribly short, but it proves that the
86400-minute monster was to blame for the 2200+ seconds.
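To see why a single check could drag the average up that far, here is a rough Python calculation using the figures from the scheduling report quoted at the top. It assumes that the reported "Average service check interval" is a plain arithmetic mean over all 2836 scheduled services and that interval_length is 60; neither assumption is confirmed in this thread, and Nagios's actual formula may differ.

```python
# Rough model of how one mis-scaled check inflates an average interval.
# ASSUMPTION: interval_length = 60 (nagios.cfg default).
# ASSUMPTION: the reported average is a plain arithmetic mean over all
# scheduled services; Nagios's real formula may differ.

INTERVAL_LENGTH_S = 60
total_services = 2836      # "Total scheduled services" from the report
reported_mean_s = 2225.56  # "Average service check interval" before the fix

# check_interval 86400 read in minutes: 86400 * 60 s = 5,184,000 s (60 days)
outlier_s = 86_400 * INTERVAL_LENGTH_S

# Share of the mean contributed by that one service alone
contribution_s = outlier_s / total_services
print(f"outlier adds {contribution_s:.1f} s to the mean")  # outlier adds 1827.9 s to the mean

# Mean over the remaining services once the outlier is removed
remaining_mean_s = (reported_mean_s * total_services - outlier_s) / (total_services - 1)
print(f"mean without it: {remaining_mean_s:.1f} s")  # mean without it: 397.8 s
```

The simple model lands in the same ballpark as the 419 seconds actually reported after the fix, but not exactly on it, so the real calculation (or other changes made at the same time) presumably differs somewhat.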
Changing those once-a-day checks to 5 minutes is an option, but
I'd rather wait a little to give everybody on the list some time
to look at the graphs and come up with nifty ideas.
I suspect that our check latency might converge on 419
seconds, but I'd rather not test it: we'd be well beyond the
300-second interval most of our checks are designed for.
> > Oops, forgot to mention that. Yes, a server farm is currently being
> > rebuilt. As I didn't want all the host check timeouts to make
> > matters much, much worse, I disabled them entirely.
> >
>
> Ah, that explains it then. It shouldn't matter, but unless the
> experiment I suggested above turns up anything useful, would you mind
> commenting them out and testing that?
I'll do that if removing the day-spaced checks doesn't help.
Regards & Thanks,
Tobias
--
Never touch a burning system.
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users