WEB-Interface performance
Marcel Mitsuto Fucatu Sugano
msugano at uolinc.com
Thu Oct 6 01:32:37 CEST 2005
Hi nagios-user list,
I'm not sure how to begin this question, because I can't tell how
heavily the people on this list use the Nagios web interface. Here we
use Nagios to actively check around 10k services on up to 2300 hosts.
We recently upgraded our pool of monitoring machines and set up a
distributed framework that aggregates all warnings on a single web
server. So far the new framework is doing its job, but at times we have
around 15 people connected to the Nagios web interface, and status.cgi
takes too long to load. So here is my question:
"Are there any ./configure options, or any set of CFLAGS, that improve
the performance of the CGIs?" Here's a snippet from top:
Tasks: 135 total, 18 running, 117 sleeping, 0 stopped, 0 zombie
Cpu(s): 86.8% us, 12.7% sy, 0.0% ni, 0.2% id, 0.0% wa, 0.2% hi, 0.2% si
Mem: 2074356k total, 1450956k used, 623400k free, 170980k buffers
Swap: 2104472k total, 0k used, 2104472k free, 1041400k cached

  PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 8509 nagios   19   0  7244 6096  424 R 34.0  0.3   0:01.17 status.cgi
 8508 nagios   19   0 14868  12m 8508 R 24.5  0.6   0:01.09 status.cgi
 8687 nagios   18   0 12756 7104 4600 R 17.5  0.3   0:00.53 status.cgi
 8690 nagios   18   0 12756 7016 4544 R 17.2  0.3   0:00.52 status.cgi
 8506 nagios   19   0 14472  11m 7772 R 16.2  0.6   0:01.04 status.cgi
 8027 nagios   24   0 22952  20m  11m R 12.2  1.0   0:02.93 status.cgi
 8115 nagios   21   0 22956  15m 6816 R 10.6  0.8   0:02.21 status.cgi
 8078 nagios   22   0 10412 9348  540 R 10.2  0.5   0:03.30 status.cgi
 8103 nagios   22   0 10412 9336  528 R 10.2  0.5   0:03.27 status.cgi
 8046 nagios   21   0 10416 9340  524 R  7.6  0.5   0:03.06 status.cgi
 7995 nagios   22   0 22956  17m 9420 R  1.3  0.9   0:02.52 status.cgi
15374 nagios   15   0 39780  21m  908 S  1.0  1.0   1:48.06 nagios
15382 nagios   16   0  1672  648  540 S  1.0  0.0   0:10.55 nsca
 8072 nagios   20   0 22948  13m 4844 R  1.0  0.7   0:01.91 status.cgi
23767 nagios   20   0  223m 8516 2172 S  0.7  0.4   0:00.52 httpd
23769 nagios   20   0  224m 8272 2172 S  0.3  0.4   0:00.52 httpd
 8151 msugano  16   0  2040 1136  828 R  0.3  0.1   0:00.05 top
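
To put a number on a listing like this, a short script can add up the %CPU column per command. This is only a rough sketch: it assumes each process sits on a single line in top's default column order (%CPU is the ninth field, COMMAND the twelfth), and the function name `cpu_by_command` is made up for illustration. The sample holds just the first two status.cgi rows and the nagios row from the listing.

```python
from collections import defaultdict

def cpu_by_command(top_lines):
    """Sum the %CPU column per command name from top process lines."""
    totals = defaultdict(float)
    for line in top_lines:
        fields = line.split()
        # Expected order: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # skip the column header and the summary lines
        totals[fields[11]] += float(fields[8])
    return dict(totals)

# Three rows copied from the listing, one process per line (assumed layout).
sample = [
    "PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND",
    "8509 nagios 19 0 7244 6096 424 R 34.0 0.3 0:01.17 status.cgi",
    "8508 nagios 19 0 14868 12m 8508 R 24.5 0.6 0:01.09 status.cgi",
    "15374 nagios 15 0 39780 21m 908 S 1.0 1.0 1:48.06 nagios",
]

for command, cpu in sorted(cpu_by_command(sample).items()):
    print(f"{command}: {cpu:.1f}% CPU")
```

Note that top may report %CPU per process relative to a single CPU, so on a machine with more than one logical CPU the per-command total can exceed 100%.
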
As you can see, there are many instances of the CGIs running, consuming
about 90% of CPU time. The problem is that we monitor the Nagios service
itself by checking a regexp against tac.cgi, and the thresholds are
tight: 6 seconds for warning, 8 seconds for critical, and 10 seconds for
timeout. We had never seen this check reach critical levels, but since
we made this interface aggregate all alarms, and with 15~20 people
staying connected to the Nagios interface to see what's happening with
the services they operate, we are seeing high response times from the
CGIs.
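
For readers unfamiliar with this kind of check, the logic is roughly as follows. This is a minimal sketch and not the check we actually run: the function names `classify` and `check_page` are hypothetical, authentication is not handled, and only the 6/8/10 second thresholds come from the description above.

```python
import re
import time
import urllib.request

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes

def classify(elapsed, body, pattern, warn=6.0, crit=8.0):
    """Map response time and page content to a Nagios state."""
    if re.search(pattern, body) is None:
        return CRITICAL  # page loaded but the expected text is missing
    if elapsed >= crit:
        return CRITICAL
    if elapsed >= warn:
        return WARNING
    return OK

def check_page(url, pattern, timeout=10.0):
    """Fetch url, time the request, and classify the result."""
    start = time.monotonic()
    try:
        # Note: this timeout applies to each blocking socket operation,
        # not to the total duration of the request.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        return CRITICAL  # covers timeouts, connection and HTTP errors
    return classify(time.monotonic() - start, body, pattern)
```

A wrapper script would call something like `check_page(url_of_tac_cgi, some_regexp)` and exit with the returned code, where both arguments are placeholders for the site's own values.
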
Finally, the machine serving the interface receives passive messages
from the active monitoring agents. It has a Pentium 4 HT (SMP)
processor, 2 GB of memory, and a SATA HDD, and runs SuSE 9.3 with kernel
2.6.11-8-SMP.
--
Marcel Mitsuto