Again, thank you for all the quick answers. This list/community is awesome!!! <br><br>I'm already using tmpfs, I've increased the named-pipe buffer size, and I've done everything one is supposed to do to increase performance. <br>
<br>I think I'd go with removing the sleep calls in the code. I'm on version 3.2.1 and would love to have a look at Max's patch! <br><br>Notification is not my bottleneck, and this isn't my own nagios install, it's someone else's, so I cannot post the nagios.cfg here. Sorry. <br>
<br>But again, thanks for all the answers!!! <br><br><div class="gmail_quote">On Tue, May 18, 2010 at 5:49 PM, Mike Lindsey <span dir="ltr"><<a href="mailto:mike-nagios@5dninja.net">mike-nagios@5dninja.net</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div class="im">Marcel wrote:<br>
> When I have more than, say, 10k checks, I start seeing check latency rise,<br>
> and there just isn't anything that can be done; even distributed<br>
> monitoring has the nagios.cmd write-lock bottleneck.<br>
<br>
</div>So, I've just gone through this, and the single greatest bottleneck I<br>
had to deal with was notifications. But I have a lot of people in the<br>
notification tree, and pull in a lot of meta-data to make ticket<br>
tracking and issue resolution easier and faster. Since Nagios needs to<br>
know the exit status of notification commands, it doesn't fork before<br>
notifications; it just plods along waiting for the notification command<br>
to exit.<br>
<br>
I switched all our non-pager notification commands to drop a spool file<br>
in a directory, letting another process read the spool files, generate<br>
the email contents, query ticket databases, pull in documentation or<br>
extended testing information (full MySQL processlist output for DBAs,<br>
etc.), and cache it for subsequent notifications for that event.<br>
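A minimal sketch of that spool-drop idea (the path, file naming, and record fields here are illustrative assumptions, not Mike's actual format): the notification command serializes the alert to a uniquely named file and exits immediately, so Nagios never blocks on the heavy lifting.

```python
# Hypothetical spool-drop notification command: write the alert to a
# spool file atomically and exit at once, so Nagios is never blocked.
import json
import os
import tempfile
import time

SPOOL_DIR = "/var/spool/nagios/notify"  # assumed path


def drop_notification(host, service, state, output, spool_dir=SPOOL_DIR):
    """Serialize one notification to a uniquely named spool file.

    Write to a temp file first, then rename: the reader process never
    sees a half-written file, because rename() is atomic on POSIX.
    """
    record = {
        "host": host,
        "service": service,
        "state": state,
        "output": output,
        "time": time.time(),
    }
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(record, fh)
    final_path = tmp_path[:-4] + ".notify"  # strip ".tmp"
    os.rename(tmp_path, final_path)
    return final_path
```

A separate reader (cron job or long-running daemon) then scans for `*.notify` files, enriches and sends the mail, and unlinks each file when done.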
<br>
That showed a HUGE improvement to my master server's performance.<br>
<br>
If notifications aren't your bottleneck, you can move all your temporary<br>
files to ramdisk.<br>
<br>
You can also increase your FIFO pipe size, but that only delays the<br>
issue and doesn't really solve the problem if you're always running hot.<br>
It also probably involves recompiling your kernel.<br>
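For what it's worth, kernels newer than the ones current in this thread (Linux 2.6.35 and later) let you resize a pipe's buffer at runtime with fcntl's F_SETPIPE_SZ, no recompile needed. A sketch, assuming a Linux host:

```python
# Sketch: grow a pipe's kernel buffer at runtime (Linux 2.6.35+ only).
# Older kernels -- like those around when this thread was written --
# really did need a rebuild to get a larger pipe buffer.
import fcntl
import os

# fcntl.F_SETPIPE_SZ is only exposed by newer Pythons; fall back to
# the raw Linux constant (1031) otherwise.
F_SETPIPE_SZ = getattr(fcntl, "F_SETPIPE_SZ", 1031)


def grow_pipe(fd, size):
    """Ask the kernel for a larger pipe buffer; returns the size
    granted (rounded up to a power of two, capped for non-root by
    /proc/sys/fs/pipe-max-size)."""
    return fcntl.fcntl(fd, F_SETPIPE_SZ, size)


r, w = os.pipe()
granted = grow_pipe(w, 256 * 1024)  # request 256 KiB
os.close(r)
os.close(w)
```

For the nagios.cmd FIFO itself you'd open the named pipe and apply the same call, but as Mike says, a bigger buffer only buys time if you're always running hot.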
<br>
If you're using nsca, you can cache your updates for a second or two, so<br>
that multiple updates happen in the same socket connection.<br>
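That caching idea is essentially a coalescing buffer: hold results for a second or two (or until a batch fills), then flush them all over one socket connection instead of one connection per result. A toy sketch of the pattern (not nsca's actual code):

```python
# Toy coalescing buffer: collect check results for a short window,
# then flush the whole batch through one callback (standing in for
# one socket connection) instead of one connection per result.
import time


class ResultBatcher:
    def __init__(self, flush_fn, max_delay=2.0, max_items=100):
        self.flush_fn = flush_fn    # sends one batch per connection
        self.max_delay = max_delay  # seconds to hold results
        self.max_items = max_items
        self.buf = []
        self.first_ts = None

    def add(self, result):
        if not self.buf:
            self.first_ts = time.monotonic()
        self.buf.append(result)
        full = len(self.buf) >= self.max_items
        stale = time.monotonic() - self.first_ts >= self.max_delay
        if full or stale:
            self.flush()

    def flush(self):
        if self.buf:
            self.flush_fn(self.buf)
            self.buf = []
```

With `max_delay` around 1-2 seconds, thousands of passive results collapse into a handful of connections.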
<br>
Alternately (or additionally), you can have nsca update the checkresults<br>
directory directly, skipping the step where nagios reads the command<br>
pipe and then just writes it back out to the checkresults directory.<br>
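For reference, a Nagios 3 check-result spool entry is just a flat key=value file dropped into the checkresults directory, paired with an empty `.ok` marker file so the core knows the write is complete. Something like the following (field names recalled from the Nagios 3 spool format; verify against your version before relying on it):

```python
# Sketch of writing a passive result straight into Nagios's
# checkresults directory, bypassing the command pipe. Field names
# follow the Nagios 3 spool format -- double-check them against
# your own version.
import os
import tempfile
import time


def write_checkresult(checkresults_dir, host, service, return_code, output):
    now = int(time.time())
    fd, path = tempfile.mkstemp(prefix="c", dir=checkresults_dir)
    with os.fdopen(fd, "w") as fh:
        fh.write("### Nagios Service Check Result ###\n")
        fh.write("file_time=%d\n\n" % now)
        fh.write("host_name=%s\n" % host)
        fh.write("service_description=%s\n" % service)
        fh.write("check_type=1\n")  # 1 = passive
        fh.write("scheduled_check=0\n")
        fh.write("reschedule_check=0\n")
        fh.write("latency=0.0\n")
        fh.write("start_time=%d.0\n" % now)
        fh.write("finish_time=%d.0\n" % now)
        fh.write("early_timeout=0\n")
        fh.write("exited_ok=1\n")
        fh.write("return_code=%d\n" % return_code)
        fh.write("output=%s\n" % output)
    # The empty .ok file signals Nagios that the result file is complete.
    open(path + ".ok", "w").close()
    return path
```

Because the reaper thread picks these up on its own schedule, writers never contend for the single nagios.cmd FIFO lock.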
<br>
I can package up a patch (against 2.7.2) of those last couple of changes<br>
(I need to submit them, anyway). If you're manlier than I might be, you<br>
could also consider modifying the nagios core to allow submissions from<br>
distributed nagios servers directly to a socket, but doing that right<br>
might require serious threaded C foo, and depending on your OS and<br>
threading library, you might be locked to a single core.<br>
<br>
So, you have options. They're not all equal, and aren't all easy. But<br>
you wouldn't be working with monitoring if you didn't like challenges... :)<br>
<br>
--<br>
<font color="#888888">Mike Lindsey<br>
</font><div><div></div><div class="h5"><br>
------------------------------------------------------------------------------<br>
<br>
_______________________________________________<br>
Nagios-users mailing list<br>
<a href="mailto:Nagios-users@lists.sourceforge.net">Nagios-users@lists.sourceforge.net</a><br>
<a href="https://lists.sourceforge.net/lists/listinfo/nagios-users" target="_blank">https://lists.sourceforge.net/lists/listinfo/nagios-users</a><br>
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.<br>
::: Messages without supporting info will risk being sent to /dev/null<br>
</div></div></blockquote></div><br>