Nagios and passive checks

nagios at mm.quex.org nagios at mm.quex.org
Mon Sep 20 02:54:32 CEST 2004


On Mon, Sep 20, 2004 at 10:37:39AM +1200, pshemko wrote:

> I have a Nagios configuration with over 1000 checks (services). All
> the status information is sent directly to Nagios through its
> command file.  There are up to 2000 results a minute, and up to 20
> processes send data to Nagios.
> The problem is that within 10 minutes up to 3000 messages can be lost:
> the processes that write to the command file (pipe) simply can't write
> and time out. If a message can't be sent to Nagios for 10 minutes, it's
> discarded. On the other hand, sometimes there are no problems at all
> for over an hour.  It's not a load-related issue, as the load never gets
> higher than 1 - 1.5 (the box is a dual Xeon 2.8GHz, 2GB RAM). Nagios
> doesn't execute any active checks.
> 
> Does anyone have a clue why the processes time out so often?

I think the default buffer size is 4 KB.  You could try recompiling
Nagios to use a bigger buffer and that might provide the quickest
fix.  Since you're processing such a large number of frequent checks,
you might also consider writing your own daemon to accept the check
results from the remote clients and queue them locally for delivery.
Have it write the results to the pipe as fast as it can, but pause
when needed.
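
A rough sketch of the core of such a queueing daemon, in Python. This is
only an illustration, not a finished daemon: the names format_result and
PipeFeeder, the pause length and the retry limit are all made up for the
example, and the part that accepts results from remote clients is left
out. It assumes each line is no longer than PIPE_BUF bytes, so a
non-blocking write to the pipe either goes through whole or fails
without writing anything.

```python
import collections
import errno
import os
import time


def format_result(host, service, code, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    ts = int(time.time() if now is None else now)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        ts, host, service, code, output)


class PipeFeeder:
    """Hypothetical helper: queue results locally, feed the command pipe."""

    def __init__(self, fifo_path, pause=0.05):
        self.fifo_path = fifo_path      # path to the Nagios command file
        self.pause = pause              # seconds to wait when the pipe is full
        self.pending = collections.deque()

    def enqueue(self, line):
        """Keep the result in memory until it can be delivered."""
        self.pending.append(line.encode())

    def drain(self, max_pauses=100):
        """Write queued lines to the pipe; return how many were delivered."""
        try:
            fd = os.open(self.fifo_path, os.O_WRONLY | os.O_NONBLOCK)
        except OSError as exc:
            if exc.errno == errno.ENXIO:
                return 0                # nobody is reading the pipe; keep the queue
            raise
        sent = 0
        pauses = 0
        try:
            while self.pending and pauses <= max_pauses:
                try:
                    os.write(fd, self.pending[0])
                except BlockingIOError:
                    # Pipe is full: pause, then retry the same line.
                    pauses += 1
                    time.sleep(self.pause)
                    continue
                self.pending.popleft()
                sent += 1
        finally:
            os.close(fd)
        return sent
```

The clients would hand their results to enqueue(), and a loop in the
daemon would call drain() repeatedly. Anything that can't be written
stays in the queue for the next pass instead of timing out.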

You could also look at the command_check_interval setting in the
main configuration file, but you might have to fiddle with the
interval_length to get any benefit out of that.
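
For reference, the two settings in the main configuration file would
look roughly like this. The values are illustrative only, not a
recommendation. As far as I recall from the documentation, a plain
number is counted in units of interval_length seconds, an "s" suffix
means seconds, and -1 asks Nagios to check for external commands as
often as possible, so check the docs for your version before relying
on it.

```
# main configuration file (illustrative values)
interval_length=60
command_check_interval=-1
```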

