<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">
<META NAME="GENERATOR" CONTENT="GtkHTML/3.6.2">
</HEAD>
<BODY>
Hi nagios-user list,<BR>
<BR>
I don't quite know how to begin this question, because I can't tell how much the people who read this list use the Nagios web interface. Here we use Nagios to actively check around 10k services on up to 2300 hosts. Recently we upgraded our pool of monitoring machines, setting up a distributed framework that aggregates all warnings on a single web server. So far this new framework is doing its job, but at times around 15 people are connected to the Nagios web interface at once, and status.cgi takes far too long to load. So here is my question: <BR>
"Are there any ./configure options, or any set of CFLAGS, that improve the performance of the CGIs?" Here's a snippet from top:<BR>
<BR>
<PRE>
Tasks: 135 total, 18 running, 117 sleeping, 0 stopped, 0 zombie
Cpu(s): 86.8% us, 12.7% sy, 0.0% ni, 0.2% id, 0.0% wa, 0.2% hi, 0.2% si
Mem: 2074356k total, 1450956k used, 623400k free, 170980k buffers
Swap: 2104472k total, 0k used, 2104472k free, 1041400k cached

  PID USER      PR  NI   VIRT   RES   SHR S %CPU %MEM     TIME+ COMMAND
 8509 nagios    19   0   7244  6096   424 R 34.0  0.3   0:01.17 status.cgi
 8508 nagios    19   0  14868   12m  8508 R 24.5  0.6   0:01.09 status.cgi
 8687 nagios    18   0  12756  7104  4600 R 17.5  0.3   0:00.53 status.cgi
 8690 nagios    18   0  12756  7016  4544 R 17.2  0.3   0:00.52 status.cgi
 8506 nagios    19   0  14472   11m  7772 R 16.2  0.6   0:01.04 status.cgi
 8027 nagios    24   0  22952   20m   11m R 12.2  1.0   0:02.93 status.cgi
 8115 nagios    21   0  22956   15m  6816 R 10.6  0.8   0:02.21 status.cgi
 8078 nagios    22   0  10412  9348   540 R 10.2  0.5   0:03.30 status.cgi
 8103 nagios    22   0  10412  9336   528 R 10.2  0.5   0:03.27 status.cgi
 8046 nagios    21   0  10416  9340   524 R  7.6  0.5   0:03.06 status.cgi
 7995 nagios    22   0  22956   17m  9420 R  1.3  0.9   0:02.52 status.cgi
15374 nagios    15   0  39780   21m   908 S  1.0  1.0   1:48.06 nagios
15382 nagios    16   0   1672   648   540 S  1.0  0.0   0:10.55 nsca
 8072 nagios    20   0  22948   13m  4844 R  1.0  0.7   0:01.91 status.cgi
23767 nagios    20   0   223m  8516  2172 S  0.7  0.4   0:00.52 httpd
23769 nagios    20   0   224m  8272  2172 S  0.3  0.4   0:00.52 httpd
 8151 msugano   16   0   2040  1136   828 R  0.3  0.1   0:00.05 top
</PRE>
<BR>
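One quick way to see how much of the machine the CGIs are taking is to total the %CPU column of the snapshot by command. The minimal Python sketch below does that for the rows copied from the top output. It assumes top is running in its default Irix mode, where 100% means one fully busy logical CPU, and that the hyper-threaded Pentium 4 shows up as two logical CPUs; LOGICAL_CPUS and cpu_by_command are illustrative names, not part of any Nagios tool.<BR>

```python
# %CPU per process, copied from the top snapshot: (PID, %CPU, COMMAND).
SNAPSHOT = [
    (8509, 34.0, "status.cgi"),
    (8508, 24.5, "status.cgi"),
    (8687, 17.5, "status.cgi"),
    (8690, 17.2, "status.cgi"),
    (8506, 16.2, "status.cgi"),
    (8027, 12.2, "status.cgi"),
    (8115, 10.6, "status.cgi"),
    (8078, 10.2, "status.cgi"),
    (8103, 10.2, "status.cgi"),
    (8046, 7.6, "status.cgi"),
    (7995, 1.3, "status.cgi"),
    (15374, 1.0, "nagios"),
    (15382, 1.0, "nsca"),
    (8072, 1.0, "status.cgi"),
    (23767, 0.7, "httpd"),
    (23769, 0.3, "httpd"),
    (8151, 0.3, "top"),
]

# Assumption: one hyper-threaded Pentium 4 appears as two logical CPUs, and
# top is in its default Irix mode (100% = one fully busy logical CPU).
LOGICAL_CPUS = 2


def cpu_by_command(rows):
    """Sum the %CPU column per command name."""
    totals = {}
    for _pid, cpu, command in rows:
        totals[command] = totals.get(command, 0.0) + cpu
    return totals


totals = cpu_by_command(SNAPSHOT)
cgi_count = sum(1 for _pid, _cpu, command in SNAPSHOT if command == "status.cgi")
cgi_share = totals["status.cgi"] / LOGICAL_CPUS

print(f"{cgi_count} status.cgi processes, {totals['status.cgi']:.1f}% CPU in top's units")
print(f"about {cgi_share:.0f}% of the machine's total CPU capacity")
```

For this snapshot the sum is 12 status.cgi processes at 162.5% in top's units, roughly 81% of two logical CPUs. With the other processes and kernel time on top of that, it agrees with the Cpu(s) line showing only 0.2% idle.<BR>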
As you can see, there are many status.cgi instances running, together consuming about 90% of CPU time (the Cpu(s) line shows only 0.2% idle). The problem is that we monitor the Nagios service itself by matching a regexp against the output of tac.cgi, and the thresholds are tight: 6 seconds for warning, 8 seconds for critical and a 10-second timeout. This check had never reached critical before, but since we started using this interface to aggregate all alarms, with 15 to 20 people keeping the Nagios interface open to follow the services they operate, the CGI response times have become very high.<BR>
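The logic of the tac.cgi check described above can be sketched as a small Nagios-style plugin in Python. This is only an illustration under stated assumptions: the URL and the regexp are hypothetical (the real ones are not given here), authentication to the CGIs is ignored, and any failed request, including hitting the 10-second timeout, is reported as CRITICAL. The exit codes 0 to 3 are the standard Nagios plugin states OK, WARNING, CRITICAL and UNKNOWN.<BR>

```python
import re
import time
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Thresholds from the post: warn at 6 s, critical at 8 s, give up at 10 s.
WARN_SECONDS = 6.0
CRIT_SECONDS = 8.0
TIMEOUT_SECONDS = 10.0

# Hypothetical values; the real URL and regexp are not given in the post.
TAC_URL = "http://nagios.example.com/nagios/cgi-bin/tac.cgi"
TAC_PATTERN = r"Tactical Monitoring Overview"


def classify(elapsed, body, pattern):
    """Map response time and regexp match to a (state, message) pair."""
    if re.search(pattern, body) is None:
        return CRITICAL, "CRITICAL - regexp not found in tac.cgi output"
    if elapsed >= CRIT_SECONDS:
        return CRITICAL, f"CRITICAL - tac.cgi answered in {elapsed:.2f}s"
    if elapsed >= WARN_SECONDS:
        return WARNING, f"WARNING - tac.cgi answered in {elapsed:.2f}s"
    return OK, f"OK - tac.cgi answered in {elapsed:.2f}s"


def check_tac(url=TAC_URL, pattern=TAC_PATTERN):
    """Fetch tac.cgi, time the request and classify the result."""
    start = time.monotonic()
    try:
        # urlopen's timeout applies to each blocking socket operation,
        # not to the request as a whole.
        with urllib.request.urlopen(url, timeout=TIMEOUT_SECONDS) as response:
            body = response.read().decode("utf-8", errors="replace")
    except Exception as exc:  # timeouts, connection and HTTP errors
        return CRITICAL, f"CRITICAL - request failed: {exc}"
    return classify(time.monotonic() - start, body, pattern)
```

A wrapper would call check_tac(), print the message and exit with the returned state. Because every status.cgi and tac.cgi request competes for the same CPU, a 6-second warning threshold is crossed as soon as the concurrent CGIs saturate the machine.<BR>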
<BR>
Finally, the machine serving the interface also receives the passive check results sent by the active monitoring agents (the nsca process in the listing above). It has a Pentium 4 HT processor (SMP kernel), 2 GB of memory and a SATA disk, and runs SuSE 9.3 with kernel 2.6.11-8-SMP.<BR>
<TABLE CELLSPACING="0" CELLPADDING="0" WIDTH="100%">
<TR>
<TD>
-- <BR>
Marcel Mitsuto
</TD>
</TR>
</TABLE>
</BODY>
</HTML>