How to reduce a very high latency number
Marc Powell
marc at ena.com
Wed May 17 22:27:13 CEST 2006
> -----Original Message-----
> From: nagios-users-admin at lists.sourceforge.net
> [mailto:nagios-users-admin at lists.sourceforge.net] On Behalf Of Trask
> Sent: Wednesday, May 17, 2006 1:09 PM
> To: nagios-users at lists.sourceforge.net
> Subject: [Nagios-users] How to reduce a very high latency number
>
> I am still butting up against very high latency issues with my Nagios
> setup. I feel like I must be missing something obvious because it
> doesn't seem like I have so many services that the servers cannot keep
> up.
>
> As can be seen from the data below, the server with the most service
> checks has the highest latency (usually in the neighborhood of 700
> seconds! -- this is pre-production). Is my problem really this
> simple? I have a feeling that it isn't just the number of checks, but
> I cannot figure out why my latency values are so terrible.
>
> Overview of my setup:
>
> There are 4 servers. 3 distributed servers (nag1, nag2, nag3) at 3
> distinct geographical locations send all their check information via
> NSCA to a 4th, central server (nag4). The connections between all of
> these servers are very high-bandwidth and are nowhere near saturated.
> The only unclear spot to me is the effect that our hardware
> VPN/tunnels might have; however, the worst performing server (nag2) is
> on the same LAN as the central server (nag4).
>
> Nagios v2.2, latest plugins and NRPE/NSCA as of today. I am running
> embedded perl with perlcache enabled.
>
>
> Number of hosts/services:
> nag1: 43/130
> nag2: 193/1743
> nag3: 78/780
> nag4: (central server - active host checks, passive service checks)
>
> Performance Info:
>
> nag1:
> Metric                 Min       Max          Average
> Check Execution Time:  0.00 sec  20.04 sec    0.024 sec
> Check Latency:         0.00 sec  1.01 sec     0.011 sec
> Percent State Change:  0.00%     17.17%       0.01%
>
> nag2:
> Metric                 Min       Max          Average
> Check Execution Time:  0.00 sec  929.13 sec   1.246 sec
> Check Latency:         0.00 sec  1180.67 sec  560.462 sec
> Percent State Change:  0.00%     55.59%       0.07%
>
> nag3:
> Metric                 Min       Max          Average
> Check Execution Time:  0.00 sec  101.70 sec   0.310 sec
> Check Latency:         0.00 sec  602.57 sec   46.023 sec
> Percent State Change:  0.00%     0.00%        0.00%

My first reaction is to question why some checks are taking more than 15
minutes to complete (nag2's maximum check execution time is 929.13 sec)
and why you are allowing them to run that long. I only allow a maximum of
60 seconds for any service check to execute --
(from nagios.cfg)
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
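
To compare another server's settings against the values above, a small script can pull the *_timeout directives out of a nagios.cfg-style file and flag the generous ones. This is only a minimal sketch: the function names, the sample excerpt and the 1200-second value are hypothetical, the limits are simply the two values quoted above rather than official recommendations, and the parser assumes plain key=value lines with # comments.

```python
import io

# Illustrative limits only (assumption): the values quoted in this message.
RECOMMENDED_MAX = {
    "service_check_timeout": 60,
    "host_check_timeout": 30,
}


def read_timeouts(lines):
    """Collect every *_timeout directive from key=value lines."""
    timeouts = {}
    for raw in lines:
        line = raw.strip()
        # Skip blanks, comments and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.endswith("_timeout"):
            timeouts[key] = int(value.strip())
    return timeouts


def too_generous(timeouts, limits=RECOMMENDED_MAX):
    """Return the directives whose value exceeds the chosen limit."""
    return {k: v for k, v in timeouts.items() if k in limits and v > limits[k]}


# Hypothetical excerpt standing in for a real nagios.cfg.
sample = io.StringIO(
    "# hypothetical excerpt\n"
    "service_check_timeout=1200\n"
    "host_check_timeout=30\n"
    "ocsp_timeout=5\n"
)
found = read_timeouts(sample)
flagged = too_generous(found)
print(flagged)
```

To run it against a real file, pass an open file handle to read_timeouts() in place of the sample.
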

Some comparable stats from my servers --

PIII 800/512MB, 828 service checks --
Check Execution Time:  0.13 sec  11.59 sec  7.984 sec
Check Latency:         0.76 sec  15.54 sec  6.583 sec
Percent State Change:  0.00%     6.25%      0.03%
All active checks, load hangs out around 2.

Another box, newer hardware, running nagios + cricket --
2x Dual Core AMD Opteron Processor 275, 2GB RAM, 1260 service checks --
Check Execution Time:  0.04 sec  35.02 sec  6.675 sec
Check Latency:         0.01 sec  38.16 sec  6.692 sec
Percent State Change:  0.00%     9.47%      0.04%
All active checks, load hangs out between 1 and 2.
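
A rough way to put the figures in this thread side by side is to estimate how many checks must be in flight, on average, to finish one full round of checks within the check interval (service count times average execution time, divided by the interval). The sketch below does that arithmetic. The 300-second interval is purely an assumption, since neither message states the actual check intervals, and the estimate ignores scheduler overhead, so treat the results as a back-of-the-envelope comparison only.

```python
def min_concurrency(n_checks, avg_exec_sec, interval_sec):
    """Average number of checks that must run at once to complete
    n_checks within interval_sec (Little's law)."""
    return n_checks * avg_exec_sec / interval_sec


# Assumption: every service is checked once every 5 minutes.
ASSUMED_INTERVAL_SEC = 300

# (service checks, average execution time in seconds), taken from this thread.
servers = {
    "nag2": (1743, 1.246),
    "PIII 800": (828, 7.984),
    "Opteron 275": (1260, 6.675),
}

estimates = {
    name: min_concurrency(n, avg, ASSUMED_INTERVAL_SEC)
    for name, (n, avg) in servers.items()
}

for name, value in estimates.items():
    print(f"{name}: about {value:.1f} checks in flight on average")

# A single check running for nag2's 929.13 sec maximum spans this many
# assumed intervals before it finishes.
long_check_intervals = 929.13 / ASSUMED_INTERVAL_SEC
print(f"longest nag2 check spans {long_check_intervals:.1f} intervals")
```

Under that assumed interval, nag2 would need fewer checks in flight than either of the two servers above, which suggests that raw check volume alone may not explain its latency.
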
--
Marc