huge performance problems
Hendrik Baecker
b00mer at gmx.net
Mon Jun 27 15:31:03 CEST 2005
Mieden, Rick van der wrote:
> Thanks for the responses. I tweaked it a bit, but I still have bad
> latency with 174 hosts and 2360 services. (I tuned it down from 540
> sec to 224 seconds.) My plugins are fine; they are really fast on the
> command line. I have also noticed that the latency drops to 4 secs if I
> have around 1700 services running. So it looks like Nagios has some
> problems when the number of services goes over 2000 or so.
>
> I've read something about USE_MEMORY_PERFORMANCE_TWEAKS, but even
> that option does not improve the latency. I have also read that
> there are many people who have far more host and service checks than
> I have without any performance problems. So I'd love to see their
> nagios.cfg, or would like to know what the trick is.
>
> Regards,
>
> Rick
>
Hi,
nearly the same on our side: Nagios with 1900 services runs with at most
2-4 seconds of latency. But beware if you want more...
I have heard of those people with more than 2000 services too, but I think
most of them are doing some kind of distributed monitoring.
Regards,
Hendrik
> -----Original Message-----
> *From:* Hendrik Baecker [mailto:b00mer at gmx.net]
> *Sent:* Thursday, June 23, 2005 15:50
> *To:* Mieden, Rick van der
> *Cc:* nagios-users at lists.sourceforge.net
> *Subject:* Re: [Nagios-users] huge performance problems
>
> Hi,
>
> one year ago we had nearly the same performance problems too.
>
> It seems that the Nagios scheduler trips over itself if the number
> of services is too big. Therefore we decided to install another Nagios
> process with different configs in a different directory, splitting our
> Nagios the way our networks are split: one Nagios (nagios-1) for
> Network A and another one (nagios-2) for Network B.
>
> As a result, the number of services per Nagios instance went down, and
> it has run well so far.
>
> All this was under version 1.2.
>
> In the past I posted some questions about our problem, but there was
> no good answer, so today I only know that it works for us.
>
> So much for that.
> I hope nobody minds if I use your post to describe some problems we
> now have while testing the setup above (different instances on the
> same host) with Nagios 2.02b.
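Splitting one installation into per-network instances, as described above, amounts to assigning each host to the instance that owns its network. A minimal Python sketch of that assignment using the standard ipaddress module; the instance names come from the thread, but the subnets and host names are hypothetical placeholders, not Hendrik's actual setup:

```python
import ipaddress

# Hypothetical prefixes: the thread only says "Network A" and "Network B".
INSTANCES = {
    "nagios-1": ipaddress.ip_network("10.1.0.0/16"),  # Network A (made up)
    "nagios-2": ipaddress.ip_network("10.2.0.0/16"),  # Network B (made up)
}


def assign_instance(address: str) -> str:
    """Return the name of the instance whose network contains this address."""
    ip = ipaddress.ip_address(address)
    for name, network in INSTANCES.items():
        if ip in network:
            return name
    raise ValueError(f"{address} is not in any monitored network")


def split_hosts(hosts: dict[str, str]) -> dict[str, list[str]]:
    """Group host names (mapped to their addresses) by owning instance."""
    groups: dict[str, list[str]] = {name: [] for name in INSTANCES}
    for host, address in sorted(hosts.items()):
        groups[assign_instance(address)].append(host)
    return groups


# Hypothetical hosts for illustration only.
hosts = {"web01": "10.1.0.10", "db01": "10.1.0.20", "sw-core": "10.2.3.1"}
groups = split_hosts(hosts)
print(groups)
```

Each resulting host list would then go into the configuration directory of its own instance; how those config files are generated is left open here.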
>
> When I fire up my instance "nagios-1" with around 1600 service checks,
> it runs very well with nearly no latency.
> But when I fire up "nagios-2" with around 1850 services, this
> instance quickly climbs to latencies of around 100 seconds.
> When I then stop the first instance, the latencies of the second one
> drop to < 5 seconds.
>
> Perhaps one of the developers can tell me whether I am right in theory
> that (one of) the worker thread(s) handling one scheduling queue can see
> the other instance's scheduling queue? Are they possibly the same?
>
> I am not a programmer, but I imagine the following: starting
> nagios-1 creates the scheduling queue in RAM. So far so good: there it
> is, and the worker runs through it and executes the checks.
> I am now afraid that when I start my second Nagios process, it will
> also create its scheduling queue in system RAM, but that the two
> processes don't have their own queues... I hope somebody understands
> what I mean.
>
> Best regards
> Hendrik
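On Hendrik's question whether the two instances might share one scheduling queue: separate operating-system processes normally get private copies of their memory, so two Nagios processes would each hold their own queue unless Nagios explicitly put it in shared memory (an assumption here; nothing in this thread says it does). A small, Unix-only Python sketch of this generic behaviour with forked processes; the list is a hypothetical stand-in, not Nagios code:

```python
import multiprocessing as mp

# Stand-in for a scheduling queue held in ordinary process memory.
# This is NOT Nagios code; it only illustrates generic OS behaviour.
scheduling_queue: list[str] = []


def run_instance(name: str, n_checks: int, conn) -> None:
    # A forked child starts from a private copy of the parent's memory,
    # so these appends are invisible to the parent and to sibling processes.
    for i in range(n_checks):
        scheduling_queue.append(f"{name}:check-{i}")
    conn.send(len(scheduling_queue))  # report this child's queue size
    conn.close()


def start_two_instances() -> dict[str, int]:
    ctx = mp.get_context("fork")  # explicit fork start method (Unix only)
    procs, pipes = {}, {}
    for name, n in (("nagios-1", 1600), ("nagios-2", 1850)):
        recv_end, send_end = ctx.Pipe(duplex=False)
        proc = ctx.Process(target=run_instance, args=(name, n, send_end))
        proc.start()
        send_end.close()  # the parent keeps only the receiving end
        procs[name], pipes[name] = proc, recv_end
    sizes = {name: pipes[name].recv() for name in procs}
    for proc in procs.values():
        proc.join()
    return sizes


sizes = start_two_instances()
print(sizes)                  # each child saw only its own entries
print(len(scheduling_queue))  # the parent's copy was never touched
```

If the queues are indeed separate, one possible explanation for nagios-2 recovering when nagios-1 stops is competition for shared resources such as CPU time or disk I/O rather than a shared queue; this sketch does not show which.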
>
> Mieden, Rick van der wrote:
>
> We have heavy performance problems with Nagios. We monitor 174 hosts
> with 2255 services and an average latency of 400 seconds!!!! Of
> course that's not acceptable.
>
> I use Perl plugins (with ssh) and SNMP plugins. I've compiled Nagios
> with perlcache and embedded Perl enabled. The server is a SPARC server
> with 2 x 1.1 GHz CPUs and 1024 MB RAM (Solaris 8, latest patch level).
>
> I played around with all kinds of parameters and read the tuning docs
> for Nagios.
>
> Below is the output of "nagios -s nagios.cfg":
>
> Nagios 2.0b3
> Copyright (c) 1999-2005 Ethan Galstad (www.nagios.org)
> Last Modified: 04-03-2005
> License: GPL
>
> Projected scheduling information for host and service
> checks is listed below. This information assumes that
> you are going to start running Nagios with your current
> config files.
>
> HOST SCHEDULING INFORMATION
> ---------------------------
> Total hosts: 174
> Total scheduled hosts: 0
> Host inter-check delay method: SMART
> Average host check interval: 0.00 sec
> Host inter-check delay: 0.00 sec
> Max host check spread: 30 min
> First scheduled check: N/A
> Last scheduled check: N/A
>
> SERVICE SCHEDULING INFORMATION
> -------------------------------
> Total services: 2255
> Total scheduled services: 2255
> Service inter-check delay method: SMART
> Average service check interval: 222.47 sec
> Inter-check delay: 0.10 sec
> Interleave factor method: SMART
> Average services per host: 12.96
> Service interleave factor: 13
> Max service check spread: 30 min
> First scheduled check: Wed Jun 22 15:05:08 2005
> Last scheduled check: Wed Jun 22 15:08:50 2005
>
> CHECK PROCESSING INFORMATION
> ----------------------------
> Service check reaper interval: 5 sec
> Max concurrent service checks: 200
>
> PERFORMANCE SUGGESTIONS
> -----------------------
> I have no suggestions - things look okay.
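The SMART values in this output can be reproduced by hand. A Python sketch, assuming the formulas from the Nagios check-scheduling documentation (inter-check delay = average check interval / number of services; interleave factor = services per host, rounded up) and that the first round of checks is spaced one inter-check delay apart:

```python
import math


def smart_inter_check_delay(avg_check_interval_s: float, total_services: int) -> float:
    """Spacing between initial checks: average interval / number of services."""
    return avg_check_interval_s / total_services


def smart_interleave_factor(total_services: int, total_hosts: int) -> int:
    """Average services per host, rounded up."""
    return math.ceil(total_services / total_hosts)


def initial_spread_s(delay_s: float, total_services: int) -> float:
    """Time from the first to the last initially scheduled check."""
    return delay_s * (total_services - 1)


# Figures from the "nagios -s" output above.
delay = smart_inter_check_delay(222.47, 2255)
factor = smart_interleave_factor(2255, 174)
spread = initial_spread_s(delay, 2255)

print(f"inter-check delay: {delay:.2f} sec")
print(f"interleave factor: {factor}")
print(f"first-to-last check: {spread:.0f} sec")
```

The resulting spread of roughly 222 seconds matches the gap between the first (15:05:08) and last (15:08:50) scheduled checks, so the initial schedule itself looks consistent with the configuration.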
>
> And the nagiostats output:
>
> CURRENT STATUS DATA
> ----------------------------------------------------
> Status File: /usr/local/nagios/var/status.dat
> Status File Age: 0d 0h 0m 13s
> Status File Version: 2.0b3
> Program Running Time: 0d 32h 0m 13s
>
> Total Services: 2255
> Services Checked: 2255
> Services Scheduled: 2255
> Active Service Checks: 2255
> Passive Service Checks: 0
> Total Service State Change: 0.000 / 5.860 / 0.003 %
> *Active Service Latency: 386.526 / 414.446 / 394.100 %*
> Active Service Execution Time: 0.062 / 60.349 / 1.428 sec
> Active Service State Change: 0.000 / 5.860 / 0.003 %
> *Active Services Last 1/5/15/60 min: 155 / 1044 / 2255 / 2255*
> Passive Service State Change: 0.000 / 0.000 / 0.000 %
> Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
> Services Ok/Warn/Unk/Crit: 2242 / 0 / 0 / 13
> Services Flapping: 0
> Services In Downtime: 0
>
> Total Hosts: 174
> Hosts Checked: 174
> Hosts Scheduled: 0
> Active Host Checks: 174
> Passive Host Checks: 0
> Total Host State Change: 0.000 / 0.000 / 0.000 %
> Active Host Latency: 0.000 / 0.000 / 0.000 %
> Active Host Execution Time: 0.137 / 1.109 / 0.582 sec
> Active Host State Change: 0.000 / 0.000 / 0.000 %
> Active Hosts Last 1/5/15/60 min: 1 / 2 / 2 / 9
> Passive Host State Change: 0.000 / 0.000 / 0.000 %
> Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0
> Hosts Up/Down/Unreach: 174 / 0 / 0
> Hosts Flapping: 0
> Hosts In Downtime: 0
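A back-of-the-envelope check on these figures, written as a Python sketch. The average latency of 394.100 is treated as seconds, as Rick describes it. The sketch also assumes (1) that "Active Services Last 5 min" approximates the number of checks completed in that window (true if no service ran twice in it) and (2) that a late check's next run is scheduled relative to when it actually ran, so latency stretches the effective interval. Both are assumptions, not documented Nagios behaviour:

```python
# Figures copied from Rick's "nagios -s" and nagiostats output.
TOTAL_SERVICES = 2255
AVG_CHECK_INTERVAL_S = 222.47   # "Average service check interval"
AVG_EXEC_TIME_S = 1.428         # average "Active Service Execution Time"
MAX_CONCURRENT = 200            # "Max concurrent service checks"
CHECKED_LAST_5_MIN = 1044       # "Active Services Last 5 min"
AVG_LATENCY_S = 394.100         # average "Active Service Latency"


def required_rate(services: int, interval_s: float) -> float:
    """Checks per second needed to keep every service on schedule."""
    return services / interval_s


def checks_in_flight(rate_per_s: float, exec_time_s: float) -> float:
    """Little's law: average concurrency L = arrival rate * time in system."""
    return rate_per_s * exec_time_s


def rate_with_latency(services: int, interval_s: float, latency_s: float) -> float:
    # Assumption: each check is rescheduled from when it actually ran,
    # so latency stretches the effective interval to interval + latency.
    return services / (interval_s + latency_s)


need = required_rate(TOTAL_SERVICES, AVG_CHECK_INTERVAL_S)
in_flight = checks_in_flight(need, AVG_EXEC_TIME_S)
# Assumption: the 5-minute count approximates completed checks.
observed = CHECKED_LAST_5_MIN / 300.0
predicted = rate_with_latency(TOTAL_SERVICES, AVG_CHECK_INTERVAL_S, AVG_LATENCY_S)

print(f"required rate: {need:.1f} checks/s")
print(f"checks in flight needed: {in_flight:.1f} of {MAX_CONCURRENT}")
print(f"observed rate: {observed:.2f} checks/s")
print(f"rate implied by latency: {predicted:.2f} checks/s")
```

Under these assumptions, keeping 2255 services on a 222-second average interval needs about 10 checks per second, which by Little's law means only about 15 checks in flight at the 1.428-second average execution time, far below the limit of 200. The observed rate of about 3.5 checks per second is close to what the 394-second average latency predicts, which suggests the bottleneck lies in how fast checks are started or reaped rather than in the concurrency limit.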
>
> Does anybody have an idea what went wrong here? There must be something...
>
> Regards,
>
> Rick
>
> ===========================================================
>
> The information contained in this message may be confidential and is
> intended to be only for the addressee. Should you receive this message
> unintentionally, please do not use the contents herein and notify the
> sender immediately by return e-mail. Although Orange has taken steps
> to ensure that this email and attachments are free from any virus, you
> do need to verify the possibility of their existence as Orange can
> take no responsibility for any computer virus which might be
> transferred by way of this email.
>
> ===========================================================
>
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null