Re: Strange NRPE / Nagios problem
Thomas.Zimmer at oppenheim.de
Thu Apr 13 20:11:24 CEST 2006
I had this problem too, like others. I'm pasting some archived mails concerning
this issue (none of them helped with my problem, or I didn't try the
solutions; the problem cleared up by itself, don't ask me why). You'll find more
by searching the archives for "timeout". Good luck :)
Andreas Ericsson wrote:
PEYRE Julien wrote:
> Hello everybody,
>
> I'm trying to use Nagios in order to survey our databases with custom
> plug-in. On Nagios browser, if I choose a host and I launch "Schedule
> an immediate check of all services on this host", I have all status
> for all services that take value " CHECK_NRPE: Socket Timeout after 10
> seconds".
>
> If I launch an immediate check service by service (one by one), it's
> OK, it functions.
>
> Any idea would be welcome !
>
You're most likely flooding the socket receive buffers in the kernel.
What systems are you seeing this on and how many checks are there to
run? Most systems have an accept(2) queue size of five, so above that
and you're in uncharted territory unless you fiddle with the
receive-buffers directly through fcntl(2) in which case it should be
possible to set it to some quite large value (see check_icmp.c on how to
do this).
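To illustrate the buffer enlargement described above, here is a minimal sketch in Python. It is not the code from check_icmp.c. It uses setsockopt() with SO_RCVBUF, which is the usual socket-level call for resizing the receive buffer, and it halves the request until the kernel accepts it. The function name enlarge_rcvbuf and the sizes are assumptions chosen for illustration only.

```python
import socket


def enlarge_rcvbuf(sock, wanted=1 << 20, floor=4096):
    """Try to grow the receive buffer of `sock` (hypothetical helper).

    Starts at `wanted` bytes and halves the request whenever the kernel
    refuses it. Returns the size the kernel reports afterwards, which may
    differ from the requested value (some kernels clamp or round it).
    """
    size = wanted
    while size >= floor:
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
            break
        except OSError:
            # Request too large for this system; try a smaller one.
            size //= 2
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    before = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    after = enlarge_rcvbuf(s)
    print("receive buffer: %d -> %d bytes" % (before, after))
    s.close()
```

The kernel has the final say on the size, so the value printed afterwards is the one to trust, not the one requested.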
Another one:
Thomas.Zimmer at oppenheim.de wrote:
> Hi Andreas,
> Many thanks for the solution to the timeout problem. Do you think
> modifying the socket receive buffers could cause undesirable
> consequences for the system Nagios is running on?
The program enlarging the buffers will of course consume more memory. On
some systems this comes from the kernel's pre-allocated chunks, which are
either expensive or impossible to grow. Since it's only one program
and one socket, though, it shouldn't make much difference.
> Any security-related issues?
Not with a sane implementation, which most systems have these days.
Ancient Tru64 had some problems, as did HP-UX and UNICOS. To my
knowledge this has been fixed though (except possibly UNICOS, which I
doubt you're running).
Thomas Zimmer
Produktservice & Betrieb
Betrieb & Support
Sal. Oppenheim jr. & Cie., Frankfurt a. Main
Telefon: +49 69 7134 5192
Internet: http://www.oppenheim.de
E-Mail: thomas.zimmer at oppenheim.de
-----Original Message-----
From: nagios-users-admin at lists.sourceforge.net
[mailto:nagios-users-admin at lists.sourceforge.net] On behalf of Larry
Ludlow
Sent: Thursday, 13 April 2006 19:54
To: nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] Strange NRPE / Nagios problem
Here are some of my configs...

commands.cfg:

define command{
        command_name    check_sun_disk1
        command_line    $USER1$/check_nrpe -n -H $HOSTADDRESS$ -t 30 -c check_disk1
}

nrpe.cfg:

command[check_disk1]=/export/nagios/libexec/check_disk -w 20 -c 10 -p /dev/vx/dsk/rootvol

Service for this particular host:

define service {
        use                     check_sun_disk1
        service_description     /
        check_command           check_sun_disk1
        host_name               ########
        servicegroups           Sun
        contact_groups          LAdmins
}
I can run this command manually. When Nagios performs the check, I get a
socket timeout.
I am getting very frustrated. I have used Nagios for a few years now, and
this is the first time I have run into this problem.
There are no firewalls, iptables, filtering or anything else running on these
boxes yet.
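
Since a single manual check works while checks launched together time out, one way to narrow this down is to open several TCP connections to the NRPE port at the same time and see whether they all get through. The following is only a rough sketch under stated assumptions: it tests plain TCP connects and does not speak the NRPE protocol, the helper names try_connect and probe_concurrent are made up for illustration, 192.0.2.10 is a placeholder address, and 5666 is the default NRPE port.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def try_connect(host, port, timeout):
    """Open one TCP connection and report how it went (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return "error: %s" % exc


def probe_concurrent(host, port, count=10, timeout=10.0):
    """Start `count` connection attempts at once and collect the results."""
    with ThreadPoolExecutor(max_workers=count) as pool:
        futures = [pool.submit(try_connect, host, port, timeout)
                   for _ in range(count)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    # Placeholder host; replace with the monitored machine's address.
    for i, result in enumerate(probe_concurrent("192.0.2.10", 5666), 1):
        print("connection %d: %s" % (i, result))
```

If single connections succeed but a burst produces timeouts, that would be consistent with the queue or buffer limits discussed earlier in this thread. If even the burst succeeds, the problem is more likely on the Nagios side or in the plugin itself.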