huge performance problems
Mieden, Rick van der
rick.vandermieden at orangemail.nl
Mon Jun 27 13:35:40 CEST 2005
Thanks for the responses. I tweaked it a bit, but I still have bad
latency with 174 hosts and 2360 services. (I tuned it down from 540
seconds to 224 seconds.) My plugins are fine; they are really fast on
the command line. I have also noticed that the latency drops to 4
seconds if I have around 1700 services running. So it looks like Nagios
has some problems when the number of services goes over 2000 or
something like that.
I've read something about USE_MEMORY_PERFORMANCE_TWEAKS, but even that
option does not improve the latency. I have also read that there are
many people who have far more host and service checks than I have
without any performance problems. So I'd love to see their nagios.cfg,
or would like to know what the trick is.
Regards,
Rick
-----Original Message-----
From: Hendrik Baecker [mailto:b00mer at gmx.net]
Sent: Thursday, June 23, 2005 15:50
To: Mieden, Rick van der
Cc: nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] huge performance problems
Hi,
One year ago we had nearly the same performance problems.
It seems that the Nagios scheduler trips over itself if the number of
services is too big. Therefore we decided to install another Nagios
process with different configs in a different directory. So we split
our Nagios along the lines of our networks: one Nagios (nagios-1) for
Network A and another one (nagios-2) for Network B.
That reduced the number of services per Nagios instance, and it has run
well so far.
All this was under version 1.2.
In the past I posted some questions about our problem, but there was no
good answer to them, so today I only know that it works for us.
So much for that.
I hope nobody will mind if I use your post to describe some problems we
now have while testing the above setup, with different instances on the
same host, under Nagios 2.02b.
When I fire up my instance "nagios-1" with around 1600 service checks, it
runs very well with nearly no latency.
But when I fire up "nagios-2" with around 1850 services, this instance
very quickly reaches latencies of around 100 seconds.
When I then stop the first instance, the latencies on the second one
drop to < 5 seconds.
Perhaps one of the developers can tell me whether I am right in theory
that (one of) the worker thread(s) with the scheduling queue can see the
other scheduling queue. Are they possibly the same?
I am not a programmer, but I can imagine the following: starting nagios-1
creates the scheduling queue and puts it into RAM. So far so good.
There it is, and the worker runs through it and executes the checks.
I am now afraid that when I start my second Nagios process, it also
creates its scheduling queue in system RAM, but that the two processes
don't have their own queues... I hope somebody understands what I mean.
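As a side note on the theory above, here is a minimal sketch of how ordinary process memory behaves on a Unix-like system. It says nothing about Nagios internals: the list named `queue` and the function `fork_demo` are made up for illustration, and the sketch assumes a platform where `os.fork` is available. It only shows that a list built in one process is not changed by a second process, unless shared memory is set up explicitly.

```python
import os

def fork_demo():
    """Fork a child that appends to its copy of a list; report both lengths."""
    queue = ["check-a", "check-b"]  # hypothetical stand-in for a scheduling queue
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: modify its own copy and report the length to the parent.
        try:
            os.close(read_fd)
            queue.append("check-from-child")
            os.write(write_fd, str(len(queue)).encode())
            os.close(write_fd)
        finally:
            os._exit(0)
    # Parent process: read the child's report, then look at its own list.
    os.close(write_fd)
    child_len = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return child_len, len(queue)

child_len, parent_len = fork_demo()
print(f"child sees {child_len} entries, parent still sees {parent_len}")
```

If two instances on one host slow each other down, competition for the same CPUs, disk, or process table is therefore a candidate explanation worth checking before assuming a shared queue.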
Best regards
Hendrik
Mieden, Rick van der wrote:
We have heavy performance problems with Nagios. We monitor 174 hosts
with 2255 services and have an average latency of 400 seconds! Of course
that's not acceptable.
I use Perl plugins with ssh and SNMP plugins. I've compiled Nagios with
perlcache and embedded-perl enabled. The server is a SPARC server with 2
x 1.1 GHz CPUs and 1024 MB RAM (Solaris 8, latest patch level).
I played around with all kinds of parameters and read the tuning docs
for Nagios.
Below is the output of "nagios -s nagios.cfg":
Nagios 2.0b3
Copyright (c) 1999-2005 Ethan Galstad (www.nagios.org)
Last Modified: 04-03-2005
License: GPL
Projected scheduling information for host and service
checks is listed below. This information assumes that
you are going to start running Nagios with your current
config files.
HOST SCHEDULING INFORMATION
---------------------------
Total hosts: 174
Total scheduled hosts: 0
Host inter-check delay method: SMART
Average host check interval: 0.00 sec
Host inter-check delay: 0.00 sec
Max host check spread: 30 min
First scheduled check: N/A
Last scheduled check: N/A
SERVICE SCHEDULING INFORMATION
-------------------------------
Total services: 2255
Total scheduled services: 2255
Service inter-check delay method: SMART
Average service check interval: 222.47 sec
Inter-check delay: 0.10 sec
Interleave factor method: SMART
Average services per host: 12.96
Service interleave factor: 13
Max service check spread: 30 min
First scheduled check: Wed Jun 22 15:05:08 2005
Last scheduled check: Wed Jun 22 15:08:50 2005
CHECK PROCESSING INFORMATION
----------------------------
Service check reaper interval: 5 sec
Max concurrent service checks: 200
PERFORMANCE SUGGESTIONS
-----------------------
I have no suggestions - things look okay.
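The SMART values in this output can be cross-checked by hand. The sketch below assumes the formulas described in the Nagios scheduling documentation (inter-check delay = average check interval / total services; interleave factor = ceiling of services per host); the variable names are only illustrative, and the input figures are copied from the output above.

```python
import math

# Figures copied from the "nagios -s" output above.
total_hosts = 174
total_services = 2255
avg_check_interval = 222.47  # seconds

# Assumed SMART formulas, as described in the Nagios scheduling docs.
inter_check_delay = avg_check_interval / total_services
services_per_host = total_services / total_hosts
interleave_factor = math.ceil(services_per_host)

print(f"inter-check delay: {inter_check_delay:.2f} sec")
print(f"services per host: {services_per_host:.2f}")
print(f"interleave factor: {interleave_factor}")
```

The results agree with the reported 0.10 sec, 12.96 and 13, so the projected schedule itself looks internally consistent.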
And the nagiostat output:
CURRENT STATUS DATA
----------------------------------------------------
Status File: /usr/local/nagios/var/status.dat
Status File Age: 0d 0h 0m 13s
Status File Version: 2.0b3
Program Running Time: 0d 32h 0m 13s
Total Services: 2255
Services Checked: 2255
Services Scheduled: 2255
Active Service Checks: 2255
Passive Service Checks: 0
Total Service State Change: 0.000 / 5.860 / 0.003 %
Active Service Latency: 386.526 / 414.446 / 394.100 %
Active Service Execution Time: 0.062 / 60.349 / 1.428 sec
Active Service State Change: 0.000 / 5.860 / 0.003 %
Active Services Last 1/5/15/60 min: 155 / 1044 / 2255 / 2255
Passive Service State Change: 0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit: 2242 / 0 / 0 / 13
Services Flapping: 0
Services In Downtime: 0
Total Hosts: 174
Hosts Checked: 174
Hosts Scheduled: 0
Active Host Checks: 174
Passive Host Checks: 0
Total Host State Change: 0.000 / 0.000 / 0.000 %
Active Host Latency: 0.000 / 0.000 / 0.000 %
Active Host Execution Time: 0.137 / 1.109 / 0.582 sec
Active Host State Change: 0.000 / 0.000 / 0.000 %
Active Hosts Last 1/5/15/60 min: 1 / 2 / 2 / 9
Passive Host State Change: 0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0
Hosts Up/Down/Unreach: 174 / 0 / 0
Hosts Flapping: 0
Hosts In Downtime: 0
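A rough way to read these numbers is to compare the check rate the schedule asks for with the rate actually achieved. The sketch below is only an estimate under stated assumptions: the helper `parse_values` is made up for illustration, the 222.47 sec interval is taken from the "nagios -s" output, and the "Last 5 min" counter is treated as the number of checks run in that window (reasonable here only because each service is checked less than once per five minutes).

```python
import re

def parse_values(line):
    """Return the slash-separated numbers after the first colon of a nagiostat line."""
    _, _, rest = line.partition(":")
    return [float(v) for v in re.findall(r"\d+(?:\.\d+)?", rest)]

# Lines copied from the nagiostat output above.
latency = parse_values("Active Service Latency: 386.526 / 414.446 / 394.100 %")
recent = parse_values("Active Services Last 1/5/15/60 min: 155 / 1044 / 2255 / 2255")

total_services = 2255
avg_check_interval = 222.47  # seconds, from the "nagios -s" output

required_rate = total_services / avg_check_interval  # checks/sec the schedule needs
observed_rate = recent[1] / (5 * 60)                 # checks/sec over the last 5 min

print(f"average latency: {latency[2]:.1f}")
print(f"required rate: {required_rate:.1f} checks/sec")
print(f"observed rate: {observed_rate:.1f} checks/sec")
```

On these figures the schedule needs roughly 10 checks per second while about 3.5 per second are being completed, which fits a latency that stays in the hundreds of seconds.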
Does anybody have an idea what is going wrong here? There must be something...
Regards,
Rick
===========================================================
The information contained in this message may be confidential and is
intended to be only for the addressee. Should you receive this message
unintentionally, please do not use the contents herein and notify the
sender immediately by return e-mail. Although Orange has taken steps to
ensure that this email and attachments are free from any virus, you do
need to verify the possibility of their existence as Orange can take no
responsibility for any computer virus which might be transferred by way
of this email.
===========================================================