Performance issues, too

Tobias Klausmann klausman at schwarzvogel.de
Tue Dec 19 11:40:10 CET 2006


Hi! 

Recently I have run into the very same performance issues 
as Daniel Meyer (or so it seems). However, I'm not quite sure
about it. Here's the gist of it.

Currently, service check latency slowly creeps up. As it is now,
it starts out at a little over 1s and after about 12 hours it's
in the area of about 90s. It keeps climbing after that. 

Here's the output of nagios -s:
Nagios 2.6
Copyright (c) 1999-2006 Ethan Galstad (http://www.nagios.org)
Last Modified: 11-27-2006
License: GPL

Warning: Contact group 'Singles-Truppe' is not used in any
host/service definitions or host/service escalations!
Projected scheduling information for host and service
checks is listed below.  This information assumes that
you are going to start running Nagios with your current
config files.

HOST SCHEDULING INFORMATION
---------------------------
Total hosts:                     330
Total scheduled hosts:           0
Host inter-check delay method:   SMART
Average host check interval:     0.00 sec
Host inter-check delay:          0.00 sec
Max host check spread:           10 min
First scheduled check:           N/A
Last scheduled check:            N/A


SERVICE SCHEDULING INFORMATION
-------------------------------
Total services:                     2836
Total scheduled services:           2836
Service inter-check delay method:   SMART
Average service check interval:     2225.56 sec
Inter-check delay:                  0.21 sec
Interleave factor method:           SMART
Average services per host:          8.59
Service interleave factor:          9
Max service check spread:           10 min
First scheduled check:              Tue Dec 19 11:21:45 2006
Last scheduled check:               Tue Dec 19 11:31:47 2006


CHECK PROCESSING INFORMATION
----------------------------
Service check reaper interval:      5 sec
Max concurrent service checks:      Unlimited


PERFORMANCE SUGGESTIONS
-----------------------
I have no suggestions - things look okay.

This all looks peachy - I think. What I don't get is this line:

Average service check interval:     2225.56 sec

It seems to me that this is either a skewed value, stemming from
my history of looong latencies (at one point we were beyond
9000 seconds), *or* it is indicative of a misconfiguration on my
part. If the latter is the case, I'd be eager, nay ecstatic, to
hear what I did wrong. Here are a few of the config vars that
might influence this:

sleep_time=0.25
service_reaper_frequency=5
max_concurrent_checks=0
max_host_check_spread=10
host_inter_check_delay_method=s
service_interleave_factor=s
command_check_interval=1
obsess_over_services=0
aggregate_status_updates=1
status_update_interval=20
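
As a cross-check on the scheduling figures from nagios -s above, here
is a small Python sketch of how the reported inter-check delay could
come about. It assumes (this is not confirmed anywhere in this mail)
that the SMART method divides the average check interval by the number
of scheduled services and then caps the result so that all checks fit
into the max check spread. The variable names are made up for the
illustration.

```python
# Figures copied from the `nagios -s` output above.
total_scheduled_services = 2836
avg_check_interval_sec = 2225.56
max_check_spread_min = 10

# Assumption: SMART spreads the checks evenly over the average interval ...
delay_from_interval = avg_check_interval_sec / total_scheduled_services

# ... but never lets the initial spread exceed the configured maximum.
delay_from_spread = max_check_spread_min * 60 / total_scheduled_services

inter_check_delay = min(delay_from_interval, delay_from_spread)

print(f"delay from average interval: {delay_from_interval:.2f} sec")  # 0.78
print(f"delay from max spread:       {delay_from_spread:.2f} sec")    # 0.21
print(f"resulting inter-check delay: {inter_check_delay:.2f} sec")    # 0.21
```

Under that assumption the capped value agrees with the 0.21 sec that
nagios -s reports, and with the roughly ten minutes between the first
and last scheduled check.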

Also, here's the output from nagiostats:
Nagios Stats 2.6
Copyright (c) 2003-2005 Ethan Galstad (www.nagios.org)
Last Modified: 11-27-2006
License: GPL

CURRENT STATUS DATA
----------------------------------------------------
Status File:                          /var/nagios/status.dat
Status File Age:                      0d 0h 0m 3s
Status File Version:                  2.6

Program Running Time:                 0d 1h 59m 5s

Total Services:                       2836
Services Checked:                     2836
Services Scheduled:                   2758
Active Service Checks:                2836
Passive Service Checks:               0
Total Service State Change:           0.000 / 12.370 / 0.007 %
Active Service Latency:               0.006 / 10.237 / 0.906 sec
Active Service Execution Time:        0.047 / 10.159 / 0.180 sec
Active Service State Change:          0.000 / 12.370 / 0.007 %
Active Services Last 1/5/15/60 min:   477 / 2678 / 2745 / 2754
Passive Service State Change:         0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min:  0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit:            2814 / 6 / 0 / 16
Services Flapping:                    0
Services In Downtime:                 0

Total Hosts:                          330
Hosts Checked:                        330
Hosts Scheduled:                      0
Active Host Checks:                   330
Passive Host Checks:                  0
Total Host State Change:              0.000 / 0.000 / 0.000 %
Active Host Latency:                  0.000 / 1.000 / 0.888 sec
Active Host Execution Time:           0.030 / 4.059 / 0.112 sec
Active Host State Change:             0.000 / 0.000 / 0.000 %
Active Hosts Last 1/5/15/60 min:      0 / 12 / 12 / 12
Passive Host State Change:            0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min:     0 / 0 / 0 / 0
Hosts Up/Down/Unreach:                329 / 1 / 0
Hosts Flapping:                       0
Hosts In Downtime:                    0
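
To track how the latency develops over time, the three-value lines
above can be pulled apart with a few lines of Python. This is only a
sketch: it assumes the three values are minimum / maximum / average, in
that order, and parse_triple is a hypothetical helper, not part of
Nagios.

```python
import re

# Matches exactly three numbers separated by " / ", plus an optional unit.
TRIPLE = re.compile(r"\s*([\d.]+) / ([\d.]+) / ([\d.]+)\s*(\S*)\s*$")

def parse_triple(line):
    """Return (label, min, max, avg, unit) or None if the line has another shape."""
    label, _, rest = line.partition(":")
    m = TRIPLE.match(rest)
    if m is None:
        return None
    lo, hi, avg = (float(g) for g in m.group(1, 2, 3))
    return label.strip(), lo, hi, avg, m.group(4)

line = "Active Service Latency:               0.006 / 10.237 / 0.906 sec"
print(parse_triple(line))
```

Lines with four values, such as the "Last 1/5/15/60 min" counters, do
not match the pattern and yield None.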

Hardware is a dual 2.8GHz Xeon with 2GB of RAM and a 100Mbit
full-duplex interface. LoadAvg is around 1.6 and sometimes gets to
1.9. Both CPUs are around 40% idle most of the time. I see about
300 context switches and 500 interrupts per second. The network
load is negligible, ditto the packet rate.

Going by these figures, I don't see a performance problem per
se, but maybe I have overlooked a metric that describes the
"usual" bottleneck of installations.
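
One rough way to put numbers on this is a steady-state estimate from
the figures already quoted. The sketch below assumes that the average
service check interval reflects the configured intervals and that
checks arrive evenly; neither is established here, so treat the result
as a ballpark only.

```python
# Figures copied from the nagios -s and nagiostats output above.
total_services = 2836
avg_check_interval_sec = 2225.56
avg_execution_time_sec = 0.180

# Checks per second needed to keep every service on schedule.
required_rate = total_services / avg_check_interval_sec

# Average number of checks in flight at once (rate x duration).
avg_concurrency = required_rate * avg_execution_time_sec

print(f"required check rate: {required_rate:.2f} checks/sec")  # 1.27
print(f"average concurrency: {avg_concurrency:.2f} checks")    # 0.23
```

If these assumptions hold, the scheduler needs to start only a little
more than one check per second on average, which would fit the
impression that the machine itself is not overloaded.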

Any help is appreciated.

Regards,
Tobias 

PS: I'll send another mail with my questions regarding scheduling
as they're more general in nature.
