How to reduce a very high latency number
Trask
trasko at gmail.com
Wed May 17 20:09:16 CEST 2006
I am still butting up against very high latency issues with my Nagios
setup. I feel like I must be missing something obvious because it
doesn't seem like I have so many services that the servers cannot keep
up.
As can be seen from the data below, the server with the most service
checks has the highest latency (usually in the neighborhood of 700
seconds! -- this is pre-production). Is my problem really this
simple? I have a feeling that it isn't just the number of checks, but
I cannot figure out why my latency values are so terrible.
Overview of my setup:
There are 4 servers. 3 distributed servers (nag1, nag2, nag3) at 3
distinct geographic locations send all their check information via
NSCA to a 4th, central server (nag4). The connections between all of
these servers are very high-bandwidth and are nowhere near saturated.
The only unclear spot to me is the effect that our hardware
VPN/tunnels might have; however, the worst-performing server (nag2) is
on the same LAN as the central server (nag4).
Nagios v2.2, latest plugins and NRPE/NSCA as of today. I am running
embedded perl with perlcache enabled.
Number of hosts/services:
nag1: 43/130
nag2: 193/1743
nag3: 78/780
nag4: (central server - active host checks, passive service checks)
Performance Info:
nag1:
Metric                  Min        Max           Average
Check Execution Time:   0.00 sec   20.04 sec     0.024 sec
Check Latency:          0.00 sec   1.01 sec      0.011 sec
Percent State Change:   0.00%      17.17%        0.01%
nag2:
Metric                  Min        Max           Average
Check Execution Time:   0.00 sec   929.13 sec    1.246 sec
Check Latency:          0.00 sec   1180.67 sec   560.462 sec
Percent State Change:   0.00%      55.59%        0.07%
nag3:
Metric                  Min        Max           Average
Check Execution Time:   0.00 sec   101.70 sec    0.310 sec
Check Latency:          0.00 sec   602.57 sec    46.023 sec
Percent State Change:   0.00%      0.00%         0.00%
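One rough way to test whether the number of checks alone could explain
the latency is to multiply each server's service count by its average
execution time and compare that to the check interval. The sketch below
does that arithmetic in Python. The service counts and averages are the
figures above; the check interval is not stated in this post, so the
300 seconds used here is purely an assumption, and the function and
variable names are made up for illustration.

```python
# Service counts and average execution times are taken from the figures
# in this post. The check interval is NOT given in the post; 300 s is a
# placeholder assumption and should be replaced with the real value.
ASSUMED_CHECK_INTERVAL_S = 300.0

SERVERS = {
    "nag1": {"services": 130, "avg_exec_s": 0.024},
    "nag2": {"services": 1743, "avg_exec_s": 1.246},
    "nag3": {"services": 780, "avg_exec_s": 0.310},
}


def serial_seconds(services, avg_exec_s):
    """Time to run every service check once, one after another."""
    return services * avg_exec_s


def required_concurrency(services, avg_exec_s, interval_s):
    """Average number of checks that must be running at once so that
    every service is checked once per interval."""
    return serial_seconds(services, avg_exec_s) / interval_s


for name, s in SERVERS.items():
    total = serial_seconds(s["services"], s["avg_exec_s"])
    needed = required_concurrency(
        s["services"], s["avg_exec_s"], ASSUMED_CHECK_INTERVAL_S
    )
    print(f"{name}: {total:8.1f} s of check time per cycle, "
          f"about {needed:.2f} checks in flight on average")
```

Under that assumed interval, nag2 works out to roughly 2172 seconds of
check time per cycle, or about 7 checks in flight on average, while nag1
and nag3 need fewer than one. This only shows that nag2 could not keep up
if its checks ran mostly one at a time; it does not say whether or why
that is happening.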
Machine load numbers:
nag1: load average: 0.05, 0.08, 0.02 / mem: 470 / 512MB physical ; not swapping
nag2: load average: 0.50, 0.61, 0.59 / mem: 330 / 512MB physical ; not swapping
nag3: load average: 0.25, 0.52, 0.56 / mem: 330 / 512MB physical ; not swapping
Machine hardware:
1Us running Fedora Core 4 / P4 2.4GHz / 512MB RAM / 40GB ATA 8MB cache
7200rpm drives
Ok, that is all I can think of off the top of my head. I have
reviewed the performance tuning doc (from here:
http://nagios.sourceforge.net/docs/2_0/tuning.html), but I am open to
trying things again / in a different way. I can list off what I've
done in response to that doc on a point-by-point basis if anyone is
interested.
Thanks for any help -- this latency issue is the last big hurdle
before getting this thing going.
~trask