<HTML><BODY style="word-wrap: break-word; -khtml-nbsp-mode: space; -khtml-line-break: after-white-space; "><BR><DIV><DIV>On 22 Dec 2006, at 01:50, Ethan Galstad wrote:</DIV><BR class="Apple-interchange-newline"><BLOCKQUOTE type="cite"><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Based on the recent thread about hanging Nagios processes, I have<SPAN class="Apple-converted-space"> </SPAN></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">moved the COMMAND_BUFFER_SLOTS and SERVICE_BUFFER_SLOTS definitions<SPAN class="Apple-converted-space"> </SPAN></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">out to config file variables:</DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><BR></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><SPAN class="Apple-tab-span" style="white-space:pre"> </SPAN>external_command_buffer_slots=4096</DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><SPAN class="Apple-tab-span" style="white-space:pre"> </SPAN>check_result_buffer_slots=4096</DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><BR></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">I have also updated nagiostats to report the avail/used number of slots<SPAN class="Apple-converted-space"> </SPAN></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">for graphing in MRTG.<SPAN class="Apple-converted-space"> </SPAN>Could folks try out the latest 2.x CVS code and<SPAN class="Apple-converted-space"> </SPAN></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">give it some testing?</DIV></BLOCKQUOTE></DIV><DIV><BR 
class="khtml-block-placeholder"></DIV>Ethan,<DIV><BR class="khtml-block-placeholder"></DIV><DIV>Thanks for applying this to CVS. Several comments:</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>- external_command_buffer_slots and check_result_buffer_slots only need to be ints, as the circular_buffer struct only uses an int for items.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>- In xsddefault.c, I don't think printing external_command_buffer.items is thread-safe. My knowledge of threads is fairly limited, so please correct me if I am wrong. The main Nagios process writes the status data via xsddefault_save_status_data, which needs to read the external_command_buffer variable. However, this variable is written to by the command_file_worker_thread. So I think xsddefault_save_status_data needs to take the thread lock on external_command_buffer before it reads the items count; otherwise it could read corrupt data. Note that this lock has a cost, especially if the status data is written with aggregate_status_updates = 0.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>- Your output to status.dat differs from mine. You are outputting max_external_command_buffer_slots (the value defined in nagios.cfg) and used_external_command_buffer_slots (the current number of items in the buffer). In my patch, I used a different definition: max_command_buffer_items meant "the maximum number of items that have been in the buffer". </DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>(I would prefer that used_external_command_buffer_slots be renamed to current_external_command_buffer_slots, because that more accurately describes "this is the number I have now".)</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>From now on, I'll call my statistic high_external_command_buffer_items, since it can also be described as the "high water mark of the number of items in the buffer". 
This is a useful statistic because it tells you what max_external_command_buffer_slots should be set to in order to avoid holdups.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Also, it probably makes sense to keep the high water mark within the circular_buffer struct.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Please find attached a patch with these changes.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>On my small test system, used_check_result_buffer_slots is usually 0. When I introduce 1 fake slave (128 results per 10 seconds), used_check_result_buffer_slots fluctuates from 0 up to the 20s and 30s. With a 2nd fake slave, the high mark moves up into the 100s. A 3rd slave moves the high mark to 192.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>If I introduce NDO into the system, I get a large iowait time (in the 80% range), presumably from database writes. The status file is not updated as regularly (in one instance there were 60 seconds between writes), but when it is, the high_* values jump into the 200-300s. 
This is a poorly configured database, so I'm guessing that there are delays while the main Nagios process passes data to the broker module.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>At the moment, with 2 slaves sending 128 packets per 10 seconds, I'm getting high values of 983 for external commands and 1405 for check results.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>I think these recent changes help show whether there are bottlenecks in reading the command pipe, but there may be other slowdowns further down the chain (which Nagios 3 may help with).</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Ton</DIV><DIV><DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV><A href="http://www.altinity.com">http://www.altinity.com</A></DIV><DIV>T: +44 (0)870 787 9243</DIV><DIV>F: +44 (0)845 280 1725</DIV><DIV>Skype: tonvoon</DIV><DIV><BR class="khtml-block-placeholder"></DIV></DIV></DIV></BODY></HTML>