<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">
<HTML>
<HEAD>
  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">
  <META NAME="GENERATOR" CONTENT="GtkHTML/3.6.2">
</HEAD>
<BODY>
Hi nagios-user list,<BR>
<BR>
I'm not sure how to begin this question, because I can't imagine how heavily the people who read this list use the Nagios web interface. Here we use Nagios to actively check around 10k services on up to 2300 hosts.<BR>
<BR>
We recently upgraded our pool of monitoring machines and set up a distributed framework that aggregates all warnings on a single web server. So far the new framework is doing its job, but at times we have around 15 people connected to the Nagios web interface, and status.cgi takes too long to load.<BR>
<BR>
So here is my question: "Are there any ./configure options, or any set of CFLAGS, that improve the performance of the CGIs?"<BR>
<BR>
Here's a snippet from top:<BR>
<PRE>
Tasks: 135 total,  18 running, 117 sleeping,   0 stopped,   0 zombie
Cpu(s): 86.8% us, 12.7% sy,  0.0% ni,  0.2% id,  0.0% wa,  0.2% hi,  0.2% si
Mem:   2074356k total,  1450956k used,   623400k free,   170980k buffers
Swap:  2104472k total,        0k used,  2104472k free,  1041400k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8509 nagios    19   0  7244 6096  424 R 34.0  0.3   0:01.17 status.cgi
 8508 nagios    19   0 14868  12m 8508 R 24.5  0.6   0:01.09 status.cgi
 8687 nagios    18   0 12756 7104 4600 R 17.5  0.3   0:00.53 status.cgi
 8690 nagios    18   0 12756 7016 4544 R 17.2  0.3   0:00.52 status.cgi
 8506 nagios    19   0 14472  11m 7772 R 16.2  0.6   0:01.04 status.cgi
 8027 nagios    24   0 22952  20m  11m R 12.2  1.0   0:02.93 status.cgi
 8115 nagios    21   0 22956  15m 6816 R 10.6  0.8   0:02.21 status.cgi
 8078 nagios    22   0 10412 9348  540 R 10.2  0.5   0:03.30 status.cgi
 8103 nagios    22   0 10412 9336  528 R 10.2  0.5   0:03.27 status.cgi
 8046 nagios    21   0 10416 9340  524 R  7.6  0.5   0:03.06 status.cgi
 7995 nagios    22   0 22956  17m 9420 R  1.3  0.9   0:02.52 status.cgi
15374 nagios    15   0 39780  21m  908 S  1.0  1.0   1:48.06 nagios
15382 nagios    16   0  1672  648  540 S  1.0  0.0   0:10.55 nsca
 8072 nagios    20   0 22948  13m 4844 R  1.0  0.7   0:01.91 status.cgi
23767 nagios    20   0  223m 8516 2172 S  0.7  0.4   0:00.52 httpd
23769 nagios    20   0  224m 8272 2172 S  0.3  0.4   0:00.52 httpd
 8151 msugano   16   0  2040 1136  828 R  0.3  0.1   0:00.05 top
</PRE>
As you can see, there are lots of CGI instances running, consuming about 90% of CPU time.<BR>
<BR>
The problem we are experiencing is this: we monitor the Nagios service itself by checking for a regexp in the output of tac.cgi, and the thresholds are tight: 6 seconds for warning, 8 seconds for critical, and 10 seconds for timeout. This check never used to reach critical levels, but since we started using this interface to aggregate all alarms, with 15~20 people connected to the Nagios interface to see what's happening with the services they operate, we have been seeing high response times from the CGIs.<BR>
<BR>
Finally, the machine that serves the interface also receives the passive messages from the active monitoring agents. It is a Pentium 4 HT (SMP) machine with 2GB of memory and a SATA HDD, running SuSE 9.3 with kernel 2.6.11-8-SMP.<BR>
<BR>
<TABLE CELLSPACING="0" CELLPADDING="0" WIDTH="100%">
<TR>
<TD>
-- Marcel Mitsuto
</TD>
</TR>
</TABLE>
</BODY>
</HTML>