big performance issue with Nagios 2.5
Daniel Meyer
eagle at cyberdelia.de
Mon Dec 18 08:36:04 CET 2006
On Thu, 14 Dec 2006, Marcel Mitsuto Fucatu Sugano wrote:
> What kind of checks do you run? Is there any custom plugins that take
700 * "check_nt"
280 * "check_icmp"
170 * checks against Netapp Filer (Perl and C)
160 * self-made checks against RSA Cards
120 * check_snmp_int.pl
80 * checks against VMWare (check_esx2.pl)
20 * checks against Brocade SAN Switches (C)
> too long to finish? Paste here the performance table. Output from
> nagiostats may also be helpful.
I collected the following data with the output of nagiostats:
http://www.cyberdelia.de/nagios-latency.png
Resolution: one week, at 30-minute intervals. You can see that it ran just
fine for several days (while I was working on the config and restarting Nagios
several times per day), and then on Dec. 9th (a Saturday) the latency
skyrocketed until I came back into the office on Monday and restarted the
service. The latency dropped instantly.
http://www.cyberdelia.de/nagios-timeframe.png
Checks in the given timeframe (ignore the percentage label on the vertical
axis). The more the latency increases, the fewer checks are performed in the
5-minute timeframe, which is the logical result of the high service check
latency.
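To put a rough number on that relationship, here is a small Python sketch. It is only a simplified model, not how Nagios actually computes anything: it assumes every service has a 5-minute check interval and that latency simply adds to the time between two runs of the same check. The function name and parameters are made up for illustration.

```python
def checks_in_window(num_checks, interval_s, latency_s, window_s):
    """Rough estimate of how many distinct checks run inside one window.

    ASSUMPTION: each check effectively repeats every
    (interval + latency) seconds, so latency stretches the period.
    """
    effective_period = interval_s + latency_s
    # A check cannot count more than once per window in this model.
    fraction = min(1.0, window_s / effective_period)
    return num_checks * fraction


# 1612 services and the ~360 s latency are taken from this mail;
# the 300 s interval is an assumption.
healthy = checks_in_window(1612, 300, 0, 300)
degraded = checks_in_window(1612, 300, 360, 300)
print(f"healthy: {healthy:.0f} checks, degraded: {degraded:.0f} checks per 5 min")
```

Under these assumptions, a latency of 360 seconds would leave fewer than half of the services checked in any 5-minute window, which is in line with the shape of the graph.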
http://www.cyberdelia.de/nagios-executiontime.png
Service check execution time. Rock steady. Some 10 or 15 checks need
10 seconds, but the average execution time is about 1 to 1.5 seconds.
This holds regardless of the state of Nagios (i.e. normal latency or the
extremely high latency).
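As a sanity check on why the execution time alone should not be the bottleneck, the average number of checks that have to be running at the same time can be estimated with Little's law (arrival rate times time in service). This is a back-of-the-envelope sketch: the 1612 services come from the nagiostats output further down, the 1.5 s is the pessimistic average from above, and the 5-minute check interval is an assumption.

```python
def required_concurrency(num_checks, check_interval_s, avg_exec_s):
    """Average number of checks in flight at once (Little's law)."""
    checks_per_second = num_checks / check_interval_s
    return checks_per_second * avg_exec_s


# ASSUMPTION: every service is checked once per 300 seconds.
needed = required_concurrency(1612, 300, 1.5)
print(f"{needed:.1f} checks in flight on average")
```

Roughly eight parallel checks on average is a small load, so the growing latency more likely comes from scheduling or result processing than from slow plugins.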
This is the current nagiostats output:
Nagios Stats 2.6
Copyright (c) 2003-2005 Ethan Galstad (www.nagios.org)
Last Modified: 11-27-2006
License: GPL
CURRENT STATUS DATA
----------------------------------------------------
Status File: /var/log/nagios/status.dat
Status File Age: 0d 0h 0m 4s
Status File Version: 2.6
Program Running Time: 0d 0h 13m 49s
Total Services: 1612
Services Checked: 1602
Services Scheduled: 1602
Active Service Checks: 1612
Passive Service Checks: 0
Total Service State Change: 0.000 / 5.990 / 0.007 %
Active Service Latency: 0.000 / 1.404 / 0.315 sec
Active Service Execution Time: 0.000 / 10.031 / 0.407 sec
Active Service State Change: 0.000 / 5.990 / 0.007 %
Active Services Last 1/5/15/60 min: 315 / 1551 / 1602 / 1602
Passive Service State Change: 0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit: 1562 / 5 / 36 / 9
Services Flapping: 0
Services In Downtime: 0
Total Hosts: 289
Hosts Checked: 279
Hosts Scheduled: 0
Active Host Checks: 289
Passive Host Checks: 0
Total Host State Change: 0.000 / 0.000 / 0.000 %
Active Host Latency: 0.000 / 0.000 / 0.000 sec
Active Host Execution Time: 0.000 / 0.327 / 0.033 sec
Active Host State Change: 0.000 / 0.000 / 0.000 %
Active Hosts Last 1/5/15/60 min: 7 / 25 / 25 / 26
Passive Host State Change: 0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0
Hosts Up/Down/Unreach: 289 / 0 / 0
Hosts Flapping: 0
Hosts In Downtime: 0
(taken about 15 minutes after a restart of Nagios; the latency was again up
at around 360 seconds when I came into the office this morning)
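For anyone who wants to graph the same values, the three-value lines of this output can be pulled apart with a short Python script. This is a minimal sketch, assuming the three numbers are min / max / avg (which matches the usual nagiostats layout) and that the lines look exactly like the ones pasted above; the function name is hypothetical.

```python
import re

# Matches lines such as "Active Service Latency: 0.000 / 1.404 / 0.315 sec".
# Lines with four values or without a "sec" / "%" unit are skipped.
TRIPLE = re.compile(
    r"^(?P<label>[^:]+):\s*"
    r"(?P<min>[\d.]+)\s*/\s*(?P<max>[\d.]+)\s*/\s*(?P<avg>[\d.]+)\s*"
    r"(?P<unit>sec|%)\s*$"
)


def parse_triples(text):
    """Return {label: {"min", "max", "avg", "unit"}} for min/max/avg lines."""
    stats = {}
    for line in text.splitlines():
        match = TRIPLE.match(line.strip())
        if match:
            stats[match.group("label").strip()] = {
                "min": float(match.group("min")),
                "max": float(match.group("max")),
                "avg": float(match.group("avg")),
                "unit": match.group("unit"),
            }
    return stats


SAMPLE = """\
Active Service Latency: 0.000 / 1.404 / 0.315 sec
Active Service Execution Time: 0.000 / 10.031 / 0.407 sec
Active Services Last 1/5/15/60 min: 315 / 1551 / 1602 / 1602
"""

stats = parse_triples(SAMPLE)
print(stats["Active Service Latency"]["avg"])
```

Feeding the average latency from each run into a round-robin database every few minutes gives graphs like the ones linked above.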
Danny