<font size=2 face="sans-serif">Thank you for the advice, but due to some
problems in the past, I already have the MySQL database on another machine
with 2 CPUs and 2 GB of RAM. </font>
<br>
<br><font size=2 face="sans-serif">Also, because of the problems I suffered,
I have a script that optimizes and repairs the ndoutils database every night.
My goal now is to change the storage engine from MyISAM to InnoDB and apply some
tuning to the database. The reason for the engine change is that when problems
start with MyISAM, I have to truncate the database because OPTIMIZE hangs,
whereas with InnoDB, in the tests I've run, it works fine.</font>
<br>
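<br><font size=2 face="sans-serif">A minimal sketch of the engine change, assuming the conversion is done table by table with ALTER TABLE ... ENGINE=InnoDB. The helper name build_engine_statements and the two table names are only illustrative assumptions; the real list should come from SHOW TABLES on the ndoutils database. The script only prints the statements so they can be reviewed before being run against MySQL.</font>

```python
# Hypothetical helper: builds the ALTER TABLE statements for an engine change.
# It does not connect to MySQL; it only produces SQL text for review.

def build_engine_statements(tables, engine="InnoDB"):
    """Return one ALTER TABLE statement per table name."""
    statements = []
    for name in tables:
        # Backtick-quote the identifier, doubling any embedded backtick.
        quoted = "`" + name.replace("`", "``") + "`"
        statements.append(f"ALTER TABLE {quoted} ENGINE={engine};")
    return statements


if __name__ == "__main__":
    # Assumed table names, for illustration only.
    example_tables = ["nagios_servicechecks", "nagios_hostchecks"]
    for statement in build_engine_statements(example_tables):
        print(statement)
```

<br><font size=2 face="sans-serif">Converting one table at a time makes it easier to stop and check if a large table takes too long.</font>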
<br><font size=2 face="sans-serif">Javi</font>
<br>
<br>
<br>
<br><font size=1 color=#5f5f5f face="sans-serif">From:
</font><font size=1 face="sans-serif">Mike Guthrie &lt;mguthrie@nagios.com&gt;</font>
<br><font size=1 color=#5f5f5f face="sans-serif">To:
</font><font size=1 face="sans-serif">Nagios Users List &lt;nagios-users@lists.sourceforge.net&gt;</font>
<br><font size=1 color=#5f5f5f face="sans-serif">Date:
</font><font size=1 face="sans-serif">11/10/2011 16:39</font>
<br><font size=1 color=#5f5f5f face="sans-serif">Subject:
</font><font size=1 face="sans-serif">Re: [Nagios-users]
High check latency in a machine with low load</font>
<br>
<hr noshade>
<br>
<br>
<br><tt><font size=2>If ndoutils starts to create a heavy burden on the
system you can also <br>
offload ndoutils/mysql to a second machine. We wrote the below document
<br>
for Nagios XI, but the doc has the info you'd need to make it work for
<br>
Nagios Core as well. <br>
<br>
</font></tt><a href="http://library.nagios.com/library/products/nagiosxi/documentation/462-offloading-mysql-to-remote-server"><tt><font size=2>http://library.nagios.com/library/products/nagiosxi/documentation/462-offloading-mysql-to-remote-server</font></tt></a><tt><font size=2><br>
<br>
<br>
<br>
Javier Vela Diago wrote:<br>
> I have a lot of custom checks, written mostly in Perl and Bash, and some
<br>
> in Python. Some of them take a lot of time.<br>
><br>
> Never mind, I think I found the solution, or at least part of it. I <br>
> set use_large_installation_tweaks to 1. Six months ago this option
<br>
> almost crashed my system, so I discarded it. Now, with
<br>
> bigger problems, it was the last thing I wanted to test, but
<br>
> this afternoon I finally tested it.<br>
><br>
> When I restarted Nagios, the load started to grow until it reached 6-8, and
<br>
> the latency problems disappeared. I was sceptical about the usefulness
of <br>
> this option, but when the load changes from 2.5 to 6, it means that
<br>
> the machine is doing a lot of work that it wasn't doing before.<br>
><br>
> Now the problem is that NDOUtils is causing some latency because
of <br>
> MySQL, but well, at least I know what to optimize. Any tips will
be <br>
> appreciated :)<br>
><br>
> Thank you and sorry for your time.<br>
><br>
><br>
> From: Daniel Wittenberg &lt;daniel.wittenberg.r0ko@statefarm.com&gt;<br>
> To: Nagios Users List &lt;nagios-users@lists.sourceforge.net&gt;<br>
> Date: 11/10/2011 16:02<br>
> Subject: Re: [Nagios-users] High check latency in a machine with low load<br>
> ------------------------------------------------------------------------<br>
><br>
><br>
><br>
> I think you have the enable_high_latency option enabled :) j/k<br>
> <br>
> Do you have any particular checks that are taking a long time? i.e.
<br>
> can you watch top and see checks taking a while?<br>
> <br>
> Dan<br>
> <br>
> <br>
> *From:* Javier Vela Diago [</font></tt><a href=mailto:jvela@s2grupo.es><tt><font size=2>mailto:jvela@s2grupo.es</font></tt></a><tt><font size=2>]<br>
> *Sent:* Tuesday, October 11, 2011 6:23 AM<br>
> *To:* nagios-users@lists.sourceforge.net<br>
> *Subject:* [Nagios-users] High check latency in a machine with low load<br>
> <br>
> Hi,<br>
><br>
> I have a Nagios 3.2.3 deployment with 1000+ hosts and 3000+ services.
<br>
> This Nagios runs together with NDO and PNP (in bulk mode) on a server
<br>
> with 4 GB of RAM and 4 CPUs.<br>
><br>
> One day I realized that the check delay in the performance CGI was
<br>
> very high (300-400 seconds). It was very strange, so I took the tuning
<br>
> guide from Nagios <br>
> (_http://nagios.sourceforge.net/docs/3_0/tuning.html_) and applied
all <br>
> the points I could. In particular, I set max_concurrent_checks
<br>
> to zero (no limit):<br>
><br>
> max_concurrent_checks=0<br>
><br>
> The reaper event:<br>
><br>
> service_reaper_frequency=5<br>
> max_check_result_reaper_time=15<br>
><br>
> and checked that the host checks were not forced. In addition, I <br>
> configured 15 seconds of host check cache.<br>
><br>
> cached_host_check_horizon=15<br>
><br>
> But the problem remains, and the load on the server is not very high:
<br>
> a load of 2.5, 2 GB of free memory and an average disk utilization
of <br>
> 7%. I disabled NDO and PNP, but it made no difference. After the first round
<br>
> of checks, the delay returns, while the load on the server doesn't
grow.<br>
><br>
> I have searched on Google, but all the problems reported there are caused by
<br>
> high load on the server, which is not the main problem here. So my <br>
> question is: what can I do now? Is there some variable that shows
me <br>
> where to look? I'm a bit lost right now and I don't know how to find
<br>
> the problem.<br>
><br>
> Or maybe the only way is to configure a master-slave Nagios setup in order
<br>
> to maximize the server utilization?<br>
><br>
> In addition, I have pretty big timeouts (60 seconds) because of the
<br>
> high latency on the network. All your help is appreciated. Thank you
<br>
> in advance.<br>
> *nagiostats*<br>
> Nagios Stats 3.2.3<br>
> Copyright (c) 2003-2008 Ethan Galstad (_www.nagios.org_)<br>
> Last Modified: 10-03-2010<br>
> License: GPL<br>
><br>
> CURRENT STATUS DATA<br>
> ------------------------------------------------------<br>
> Status File: /usr/local/argos/aplicaciones/nagios/var/status.dat<br>
> Status File Age: 0d 0h 0m 11s<br>
> Status File Version: 3.2.3<br>
><br>
> Program Running Time: 0d 20h 56m 7s<br>
> Nagios PID: 21834<br>
> Used/High/Total Command Buffers: 0 / 0 / 4096<br>
><br>
> Total Services: 4032<br>
> Services Checked: 4032<br>
> Services Scheduled: 4030<br>
> Services Actively Checked: 4032<br>
> Services Passively Checked: 0<br>
> Total Service State Change: 0.000 / 37.300 / 0.163 %<br>
> Active Service Latency: 32.876 / 442.138 / 415.816 sec<br>
> Active Service Execution Time: 0.051 / 60.097 / 1.545 sec<br>
> Active Service State Change: 0.000 / 37.300 / 0.163 %<br>
> Active Services Last 1/5/15/60 min: 237 / 1530 / 4020 / 4020<br>
> Passive Service Latency: 0.000 / 0.000 / 0.000 sec<br>
> Passive Service State Change: 0.000 / 0.000 / 0.000 %<br>
> Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0<br>
> Services Ok/Warn/Unk/Crit: 3766 / 38 / 44 / 184<br>
> Services Flapping: 0<br>
> Services In Downtime: 0<br>
><br>
> Total Hosts: 931<br>
> Hosts Checked: 931<br>
> Hosts Scheduled: 931<br>
> Hosts Actively Checked: 931<br>
> Host Passively Checked: 0<br>
> Total Host State Change: 0.000 / 12.370 / 0.077 %<br>
> Active Host Latency: 0.000 / 441.308 / 416.063 sec<br>
> Active Host Execution Time: 0.062 / 10.113 / 0.395 sec<br>
> Active Host State Change: 0.000 / 12.370 / 0.077 %<br>
> Active Hosts Last 1/5/15/60 min: 74 / 423 / 931 / 931<br>
> Passive Host Latency: 0.000 / 0.000 / 0.000 sec<br>
> Passive Host State Change: 0.000 / 0.000 / 0.000 %<br>
> Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0<br>
> Hosts Up/Down/Unreach: 897 / 24 / 10<br>
> Hosts Flapping: 0<br>
> Hosts In Downtime: 1<br>
><br>
> Active Host Checks Last 1/5/15 min: 109 / 535 / 1583<br>
> Scheduled: 87 / 433 / 1300<br>
> On-demand: 22 / 102 / 283<br>
> Parallel: 87 / 438 / 1323<br>
> Serial: 0 / 0 / 0<br>
> Cached: 22 / 97 / 260<br>
> Passive Host Checks Last 1/5/15 min: 0 / 0 / 0<br>
> Active Service Checks Last 1/5/15 min: 304 / 1605 / 4924<br>
> Scheduled: 304 / 1605 / 4923<br>
> On-demand: 0 / 0 / 1<br>
> Cached: 0 / 0 / 0<br>
> Passive Service Checks Last 1/5/15 min: 0 / 0 / 0<br>
><br>
> External Commands Last 1/5/15 min: 0 / 0 / 0<br>
> *nagios -s*<br>
><br>
> Nagios Core 3.2.3<br>
> Copyright (c) 2009-2010 Nagios Core Development Team and Community
<br>
> Contributors<br>
> Copyright (c) 1999-2009 Ethan Galstad<br>
> Last Modified: 10-03-2010<br>
> License: GPL<br>
><br>
> Website: _http://www.nagios.org_ <</font></tt><a href=http://www.nagios.org/><tt><font size=2>http://www.nagios.org/</font></tt></a><tt><font size=2>><br>
> Warning: aggregate_status_updates directive ignored. All status
file <br>
> updates are now aggregated.<br>
> Warning: downtime_file variable ignored. Downtime entries are
now <br>
> stored in the status and retention files.<br>
> Warning: comment_file variable ignored. Comments are now stored
in <br>
> the status and retention files.<br>
> Timing information on object configuration processing is listed<br>
> below. You can use this information to see if precaching your<br>
> object configuration would be useful.<br>
><br>
> Object Config Source: Config files (uncached)<br>
><br>
> OBJECT CONFIG PROCESSING TIMES (* = Potential for precache savings with -u option)<br>
> ----------------------------------<br>
> Read: 0.080036 sec<br>
> Resolve: 0.010660 sec *<br>
> Recomb Contactgroups: 0.002666 sec *<br>
> Recomb Hostgroups: 0.004086 sec *<br>
> Dup Services: 0.034632 sec *<br>
> Recomb Servicegroups: 0.001277 sec *<br>
> Duplicate: 0.010939 sec *<br>
> Inherit: 0.005594 sec *<br>
> Recomb Contacts: 0.000001 sec *<br>
> Sort: 0.000000 sec *<br>
> Register: 0.074413 sec<br>
> Free: 0.008730 sec<br>
> ============<br>
> TOTAL: 0.234920 sec * = 0.071741 sec (30.54%) estimated savings<br>
><br>
><br>
> RETENTION DATA TIMES<br>
> ----------------------------------<br>
> Read and Process: 0.495480 sec<br>
> ============<br>
> TOTAL: 0.495480 sec<br>
><br>
><br>
> Timing information on configuration verification is listed below.<br>
><br>
> CONFIG VERIFICATION TIMES (* = Potential for speedup with -x option)<br>
> ----------------------------------<br>
> Object Relationships: 0.060039 sec<br>
> Circular Paths: 0.026557 sec *<br>
> Misc: 0.005999 sec<br>
> ============<br>
> TOTAL: 0.092595 sec * = 0.026557 sec (28.7%) estimated savings<br>
><br>
><br>
> EVENT SCHEDULING TIMES<br>
> -------------------------------------<br>
> Get service info: 0.014509 sec<br>
> Get host info info: 0.002853 sec<br>
> Get service params: 0.000078 sec<br>
> Schedule service times: 0.039947 sec<br>
> Schedule service events: 0.034656 sec<br>
> Get host params: 0.000001 sec<br>
> Schedule host times: 0.007519 sec<br>
> Schedule host events: 0.029519 sec<br>
> ============<br>
> TOTAL: 0.129082 sec<br>
><br>
><br>
> Projected scheduling information for host and service checks<br>
> is listed below. This information assumes that you are going<br>
> to start running Nagios with your current config files.<br>
><br>
> HOST SCHEDULING INFORMATION<br>
> ---------------------------<br>
> Total hosts: 931<br>
> Total scheduled hosts: 931<br>
> Host inter-check delay method: SMART<br>
> Average host check interval: 259.01 sec<br>
> Host inter-check delay: 0.28 sec<br>
> Max host check spread: 30 min<br>
> First scheduled check: Tue Oct 11 13:14:08 2011<br>
> Last scheduled check: Tue Oct 11 13:18:26 2011<br>
><br>
><br>
> SERVICE SCHEDULING INFORMATION<br>
> -------------------------------<br>
> Total services: 4032<br>
> Total scheduled services: 4030<br>
> Service inter-check delay method: SMART<br>
> Average service check interval: 299.55 sec<br>
> Inter-check delay: 0.07 sec<br>
> Interleave factor method: SMART<br>
> Average services per host: 4.33<br>
> Service interleave factor: 5<br>
> Max service check spread: 30 min<br>
> First scheduled check: Tue Oct 11 13:15:07 2011<br>
> Last scheduled check: Tue Oct 11 13:20:07 2011<br>
><br>
><br>
> CHECK PROCESSING INFORMATION<br>
> ----------------------------<br>
> Check result reaper interval: 5 sec<br>
> Max concurrent service checks: Unlimited<br>
><br>
><br>
> PERFORMANCE SUGGESTIONS<br>
> -----------------------<br>
> I have no suggestions - things look okay.<br>
> -- <br>
> Javier Vela Diago<br>
> S2 GRUPO<br>
> Ramiro de Maeztu, 7 bajo. 46022 Valencia<br>
> Tel: 963.110.300 Fax: 963.106.086<br>
> e-mail: jvela at s2grupo dot es<br>
> _http://www.s2grupo.es_ <br>
> <</font></tt><a href=http://www.s2grupo.es/><tt><font size=2>http://www.s2grupo.es/</font></tt></a><tt><font size=2>>------------------------------------------------------------------------------<br>
> All the data continuously generated in your IT infrastructure contains
a<br>
> definitive record of customers, application performance, security<br>
> threats, fraudulent activity and more. Splunk takes this data and
makes<br>
> sense of it. Business sense. IT sense. Common sense.<br>
> </font></tt><a href="http://p.sf.net/sfu/splunk-d2d-oct_______________________________________________"><tt><font size=2>http://p.sf.net/sfu/splunk-d2d-oct_______________________________________________</font></tt></a><tt><font size=2><br>
> Nagios-users mailing list<br>
> Nagios-users@lists.sourceforge.net<br>
> </font></tt><a href="https://lists.sourceforge.net/lists/listinfo/nagios-users"><tt><font size=2>https://lists.sourceforge.net/lists/listinfo/nagios-users</font></tt></a><tt><font size=2><br>
> ::: Please include Nagios version, plugin version (-v) and OS when
<br>
> reporting any issue.<br>
> ::: Messages without supporting info will risk being sent to /dev/null<br>
> ------------------------------------------------------------------------<br>
><br>
<br>
<br>
-- <br>
<br>
<br>
Mike Guthrie<br>
Technical Team<br>
___<br>
Nagios Enterprises, LLC<br>
Email: mguthrie@nagios.com<br>
Web: </font></tt><a href=www.nagios.com><tt><font size=2>www.nagios.com</font></tt></a><tt><font size=2><br>
<br>
<br>
</font></tt>
<br>