[Nagios-users] Re: problems with performance of cgi's
Marcus Hildenbrand
Marcus.Hildenbrand at sap.com
Tue Apr 27 15:35:50 CEST 2004
Hi,
after running various performance tests, it seems that Nagios 2.0
performs better on a P4 box than Nagios 1.2. But the web interface is
still very slow with our large configuration. So I tried to find out
where most of the time is spent and found the following two sections:
common/objects.c:
line 2803:
/* add a new service to the list in memory */
service *add_service(....
...
...
...
line 3347:
/* add new service to service list, sorted by host name then
   service description */
last_service=service_list;
for(temp_service=service_list;temp_service!=NULL;temp_service=temp_service->next){
    if(strcmp(new_service->host_name,temp_service->host_name)<0){
        new_service->next=temp_service;
        if(temp_service==service_list)
            service_list=new_service;
        else
            last_service->next=new_service;
        break;
    }
    else if(strcmp(new_service->host_name,temp_service->host_name)==0 &&
            strcmp(new_service->description,temp_service->description)<0){
        new_service->next=temp_service;
        if(temp_service==service_list)
            service_list=new_service;
        else
            last_service->next=new_service;
        break;
    }
    else
        last_service=temp_service;
}
and in common/statusdata.c
line 376
/* adds a service status entry to the list in memory */
int add_service_status(servicestatus *new_svcstatus){
..
..
..
line 430
/* add new service status to list, sorted by host name then
   description */
last_svcstatus=servicestatus_list;
for(temp_svcstatus=servicestatus_list;temp_svcstatus!=NULL;temp_svcstatus=temp_svcstatus->next){
    if(strcmp(new_svcstatus->host_name,temp_svcstatus->host_name)<0){
        new_svcstatus->next=temp_svcstatus;
        if(temp_svcstatus==servicestatus_list)
            servicestatus_list=new_svcstatus;
        else
            last_svcstatus->next=new_svcstatus;
        break;
    }
    else if(strcmp(new_svcstatus->host_name,temp_svcstatus->host_name)==0 &&
            strcmp(new_svcstatus->description,temp_svcstatus->description)<0){
        new_svcstatus->next=temp_svcstatus;
        if(temp_svcstatus==servicestatus_list)
            servicestatus_list=new_svcstatus;
        else
            last_svcstatus->next=new_svcstatus;
        break;
    }
    else
        last_svcstatus=temp_svcstatus;
}
When I comment out these loops, the execution time of the CGIs drops
from 40 seconds to 4 seconds. OK, the output then shows wrong data :-).
The first loop, where the services are read and sorted, takes about 20
seconds. The second loop, where the service states are read and sorted,
takes about 16 seconds.
As I understand the code, each of these two loops adds a service or
service state to a list, keeping the list sorted by host name and
description. If the new service or state belongs at the end of the
list, the whole list has to be walked first. This seems to be the case
most of the time, so the total work grows with the square of the number
of entries and is very time consuming on large configurations. One
workaround seems to be to change the order of the service entries in
the config file. In our configuration all service entries are normally
in sorted order, so every new service belongs at the end of the list.
After changing to reverse sorted order, the part where the services are
read takes only 2 instead of 20 seconds. Maybe something similar could
be done for the second part, where the service states are read.
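
The append-at-the-end case described above could also be handled
without touching the config file, by remembering the last list node.
The following is only a minimal sketch of that idea and not the Nagios
code: the struct svc, the svc_list wrapper and both function names are
hypothetical stand-ins, and only the two sort keys (host name, then
description) are taken from the loops quoted above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, stripped-down stand-in for the Nagios service struct. */
typedef struct svc {
    const char *host_name;
    const char *description;
    struct svc *next;
} svc;

/* Hypothetical list wrapper: head plus a pointer to the last node. */
typedef struct {
    svc *head;
    svc *tail;
} svc_list;

/* Order by host name, then by service description. */
static int svc_cmp(const svc *a, const svc *b)
{
    int c = strcmp(a->host_name, b->host_name);
    return c != 0 ? c : strcmp(a->description, b->description);
}

/* Sorted insert; returns how many nodes had to be walked. */
static size_t svc_insert_sorted(svc_list *list, svc *node)
{
    size_t walked = 0;
    svc *prev = NULL, *cur;

    node->next = NULL;

    /* Fast path 1: empty list. */
    if (list->head == NULL) {
        list->head = list->tail = node;
        return 0;
    }
    /* Fast path 2: node sorts at or after the tail, so just append. */
    if (svc_cmp(node, list->tail) >= 0) {
        list->tail->next = node;
        list->tail = node;
        return 0;
    }

    /* Slow path: linear search, as in the original loop. The node sorts
       before the tail here, so the search always stops inside the list. */
    for (cur = list->head; cur != NULL; prev = cur, cur = cur->next) {
        walked++;
        if (svc_cmp(node, cur) < 0)
            break;
    }
    node->next = cur;
    if (prev == NULL)
        list->head = node;
    else
        prev->next = node;
    return walked;
}
```

With input that is already sorted, every call takes one of the two fast
paths and no node is walked; unsorted input still falls back to the
linear search.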
Every CGI seems to call these two functions repeatedly until all the
services and their states have been read. I think it would speed up the
CGIs considerably if the lists were sorted only once, after all
services and their states have been read in.
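
Sorting only once after reading everything might look roughly like the
sketch below. It is an illustration under assumptions, not a patch
against Nagios: struct svc and svc_list_sort are hypothetical names,
and the list is assumed to have been built by plain appending. The
nodes are copied into a temporary pointer array, sorted with the
standard qsort(), and relinked.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, stripped-down stand-in for the Nagios service struct. */
typedef struct svc {
    const char *host_name;
    const char *description;
    struct svc *next;
} svc;

/* qsort() comparator on an array of svc pointers:
   host name first, then service description. */
static int svc_ptr_cmp(const void *pa, const void *pb)
{
    const svc *a = *(const svc *const *)pa;
    const svc *b = *(const svc *const *)pb;
    int c = strcmp(a->host_name, b->host_name);
    return c != 0 ? c : strcmp(a->description, b->description);
}

/* Sorts an unsorted list in one pass and returns the new head.
   If the temporary array cannot be allocated, the list is returned
   unchanged (still unsorted). */
static svc *svc_list_sort(svc *head)
{
    size_t n = 0, i;
    svc *cur, **nodes;

    for (cur = head; cur != NULL; cur = cur->next)
        n++;
    if (n < 2)
        return head;

    nodes = malloc(n * sizeof *nodes);
    if (nodes == NULL)
        return head;

    for (i = 0, cur = head; cur != NULL; cur = cur->next)
        nodes[i++] = cur;

    qsort(nodes, n, sizeof *nodes, svc_ptr_cmp);

    /* Relink the nodes in sorted order. */
    for (i = 0; i + 1 < n; i++)
        nodes[i]->next = nodes[i + 1];
    nodes[n - 1]->next = NULL;

    head = nodes[0];
    free(nodes);
    return head;
}
```

Note that qsort() is not guaranteed to be stable, so entries with the
same host name and description may end up in a different relative order
than with the insertion loops.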
One thing I don't understand is why the services are read and sorted
again in the CGIs. One of the new features in Nagios 2.0 is the cached
object definition file, which should hold all the object configuration
data. If the data in that file is already sorted, then there is no need
to sort it again. I checked this by putting an #ifdef NSCORE around the
loop in add_service where the services are sorted. The output of the
CGIs seems to be OK, and this speeds them up by about 20 seconds.
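
The shape of that experiment is roughly the following. This is a
simplified sketch, not the real add_service: only the NSCORE macro is
taken from the description above, while struct svc, svc_head, svc_tail
and svc_add are hypothetical names. It also rests on the unverified
assumption that the cached object file is written in sorted order; if
it is not, the CGIs would see an unsorted list.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, stripped-down stand-in for the Nagios service struct. */
typedef struct svc {
    const char *host_name;
    const char *description;
    struct svc *next;
} svc;

static svc *svc_head = NULL;
static svc *svc_tail = NULL;

#ifdef NSCORE
/* Order by host name, then by service description. */
static int svc_cmp(const svc *a, const svc *b)
{
    int c = strcmp(a->host_name, b->host_name);
    return c != 0 ? c : strcmp(a->description, b->description);
}
#endif

static void svc_add(svc *node)
{
#ifdef NSCORE
    /* Daemon build: keep the list sorted while inserting. */
    svc **link = &svc_head;
    while (*link != NULL && svc_cmp(node, *link) >= 0)
        link = &(*link)->next;
    node->next = *link;
    *link = node;
    if (node->next == NULL)
        svc_tail = node;
#else
    /* CGI build: trust the order of the input and append in O(1). */
    node->next = NULL;
    if (svc_tail == NULL)
        svc_head = node;
    else
        svc_tail->next = node;
    svc_tail = node;
#endif
}
```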
Unfortunately I'm not a C programmer and don't know how to modify the
code accordingly. Hopefully this is not a big change and it speeds up
Nagios as much as I expect.
Another useful setting for large installations is to define
USE_MEMORY_PERFORMANCE_TWEAKS in include/config.h.in before running
configure. Without that switch the scheduler gets overloaded and the
check latency grows dramatically. We have been running Nagios with that
switch enabled for a long time without problems. Maybe this could be
added to the documentation or offered as a configure option.
Thanks and Best Regards
Marcus Hildenbrand
Stanley Hopcroft wrote:
> Dear Sir,
>
> I am writing to thank you for your letter and say,
>
> On Fri, Feb 13, 2004 at 10:26:02AM +0100, Marcus Hildenbrand wrote:
> > Hi,
> >
> > we are currently monitoring 2100 Hosts with 9900 active service checks
> > with Nagios 1.2. The main problem of that large number of monitored
> > hosts are the cgi's. Most of the cgi's need more than 30 seconds to
> > load. The current installation is running on a server with 4x700 MHz
> > Pentium 3 CPU's with 4 GB RAM running SuSE SLES 7. The check latency is
> > normally under 2 seconds and the cpu idle time is about 33%. So the
> > scheduling of the active service checks and the overall CPU performance
> > seems to be no problem.
>
> One stupid suggestion is that if you have hacking/coding resources you
> might want to have the CGIs deliver gzipped output; this may be doable
> by Apache or an Apache module.
>
> The ntop project does this (not with Apache) and the performance is very
> crisp, even on very underpowered ntop hosts.
>
> > Will the cgi's be faster in Nagios 2.0 for large configurations?
> >
>
> This change in 2.0 is aimed (IIRC) at boosting performance
>
> '3. Daniel Drown's chained hash patch for object search functions'
>
> - replacing linked list searches for objects with hash lookups.
>
> If I understand correctly, this is already in the 2.0 alpha so you might
> give it a pop on your P4 box.
>
> .. snip ..
>
> > Any hints how to solve these problems,
>
> Apart from the tuning info in the docs
> (http://you/nagios/docs/tuning.html).
>
> There have been a few letters about the performance of enormous Nag
> installations (mainly about check latency IIRC); you may find that the
> archives have something to offer.
>
> >
> > Many thanks
> > Marcus
> >
>
> --
> ------------------------------------------------------------------------
> Stanley Hopcroft
> ------------------------------------------------------------------------
>
> '...No man is an island, entire of itself; every man is a piece of the
> continent, a part of the main. If a clod be washed away by the sea,
> Europe is the less, as well as if a promontory were, as well as if a
> manor of thy friend's or of thine own were. Any man's death diminishes
> me, because I am involved in mankind; and therefore never send to know
> for whom the bell tolls; it tolls for thee...'
>
> from Meditation 17, J Donne.
>
>