We have a similar situation. We restart Nagios daily to minimize scheduling skew (which accounts for the load drops), and we often have to restart Nagios in the middle of the day to accommodate configuration change requests (the mid-day load drops). Interestingly, there is no correlation between the increased load and memory usage: RAM utilization stays very stable regardless of load.<br>
<br>At this point I am blaming myself for something not being right in one or more of the plugins we wrote and use. Most of our checks are SNMP-based, run through Perl plugins with ePN, with a much smaller number of checks done via check_nrpe. We do see Nagios zombies over the course of a day (4-6 or so); our restart script has to send a SIGKILL to those processes to get them to die completely.<br>
<br>This wasn't happening on our previous box, a dual dual-core RHEL 5.2 machine, though it was pretty overwhelmed by the 8000 checks and 1500 hosts, so the problem might just have been hidden behind it being so overworked :p.<br>
<br>Current host:<br><br>Nagios version: 3.0.6<br>Dual quad-core host, 16 GB RAM, ~1500 hosts with ~8100 checks.<br>RHEL 5.1 64-bit<br><br>So it is interesting to see people having a similar issue with load. Do the rest of you also NOT see a correlation between memory usage and load, or do you see increased RAM utilization?<br>
<br>- Max<br>