Possible bug in NSCA
Andreas Ericsson
ae at op5.se
Fri Sep 23 09:08:37 CEST 2005
Chris Wilson wrote:
> Hi all,
>
> I think I may have found a bug in NSCA. I don't know where to report it,
> but the copyright appears to be Ethan Galstad, so I hope someone here
> can help me.
>
> I just discovered that NSCA on our main nagios server has been spinning
> and eating CPU for the last week. Strace shows this, over and over:
>
>
>>rt_sigaction(SIGPIPE, {SIG_DFL}, NULL, 8) = 0
>>close(4) = -1 EBADF (Bad file descriptor)
>>accept(4, 0, NULL) = -1 EBADF (Bad file descriptor)
>>time([1127431587]) = 1127431587
>>rt_sigaction(SIGPIPE, {0xa21aa0, [], SA_RESTORER, 0x990f48}, {SIG_DFL}, 8) = 0
>>send(5, "<27>Sep 23 00:26:27 nsca[4425]: Network server accept failure (9: Bad file descriptor)", 86, 0 <unfinished ...>
>
>
> lsof shows that fd 4 is not open.
>
> Looking back in the logs, I can see when this started:
>
>
>>Sep 15 23:52:11 dev nsca[4425]: Network server accept failure (10: No child processes)
>>Sep 15 23:52:11 dev nsca[4425]: Network server accept failure (9: Bad file descriptor)
>>Sep 15 23:52:41 dev last message repeated 1299103 times
>
>
> I can't see any other suspicious messages in the logs around that time.
>
> I have no idea what caused the first error (no child processes), but the
> result seems inappropriate. It appears that nsca handles this error as
> follows, in accept_connection():
>
>
>> /* wait for a connection request */
>> while(1){
>>     new_sd=accept(sock,0,0);
>>     ...
>> }
>>
>> if(new_sd<0){
>>     ...
>>     syslog(LOG_ERR,"Network server accept failure (%d: %s)",errno,strerror(errno));
>>
>>     /* close socket prior to exiting */
>>     close(sock);
>>     return;
>> }
>
>
> But nsca does not exit: accept_connection is called in an infinite loop,
> and keeps trying to accept() on a socket that's now closed.
>
> This seems to be bad behaviour, but I'm not sure what the correct
> behaviour would be. Any ideas?
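The spin is easy to reproduce outside nsca. The sketch below assumes a POSIX system, and `count_ebadf_after_close()` is a hypothetical helper, not part of nsca. It closes a socket the way the quoted error branch does and then keeps calling accept() on it. Each call returns immediately with EBADF instead of blocking, which matches the strace output above.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: mimics the failure path quoted above. The error
 * branch calls close(sock) and returns; the caller then calls
 * accept(sock, 0, 0) again. Every such call fails at once with EBADF
 * instead of blocking, so an unbounded loop spins and eats CPU.
 * Returns how many of `attempts` calls failed with EBADF, or -1 if the
 * socket could not be created. */
int count_ebadf_after_close(int attempts)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0)
        return -1;

    close(sock);  /* what the error branch does before returning */

    int failures = 0;
    for (int i = 0; i < attempts; i++) {
        /* same call nsca makes; it never blocks on a closed descriptor */
        if (accept(sock, 0, 0) < 0 && errno == EBADF)
            failures++;
    }
    return failures;
}
```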
After

    new_sd = accept(sock, 0, 0);

you should add

    if(new_sd == -1 && errno == EBADF) {
        sock = setup_socket();
    }

where setup_socket() is an imaginary function that calls socket(),
possibly setsockopt(), then bind() and listen(), in that order.
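One way to fill in the imaginary setup_socket() is sketched below. This is an illustration under assumptions, not nsca's code. The `port` parameter, binding to all IPv4 addresses, SO_REUSEADDR as the optional setsockopt(), and the `accept_or_rebuild()` wrapper are all hypothetical. The wrapper takes a pointer so the caller's copy of the descriptor is replaced as well.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical setup_socket(): socket(), optional setsockopt(), bind(),
 * listen(), in that order. Binds to all IPv4 addresses on `port`
 * (an assumption). Returns the listening descriptor, or -1 on failure. */
int setup_socket(unsigned short port)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0)
        return -1;

    /* optional: allow rebinding while old connections are in TIME_WAIT */
    int on = 1;
    if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
        goto fail;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto fail;

    if (listen(sock, SOMAXCONN) < 0)
        goto fail;

    return sock;

fail:
    close(sock);
    return -1;
}

/* The check suggested above: if accept() reports that the listening
 * descriptor is gone, build a new one instead of reusing the dead one.
 * Returns accept()'s result; *sock may be replaced (or set to -1). */
int accept_or_rebuild(int *sock, unsigned short port)
{
    int new_sd = accept(*sock, NULL, NULL);
    if (new_sd == -1 && errno == EBADF)
        *sock = setup_socket(port);
    return new_sd;
}
```

If setup_socket() itself keeps failing, the caller will keep retrying. That is one argument for exiting instead.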
A cleaner solution is to have nsca exit if it can't obtain the socket,
since there's no real reason to think it should be able to obtain one later.
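The exit-on-failure approach might look roughly like the sketch below. It is hypothetical and not taken from nsca. It retries only the errors for which calling accept() again on the same socket makes sense, and which errors count as transient is an assumption (at minimum EINTR and ECONNABORTED). On anything else it logs the same message nsca already uses, closes the socket and terminates instead of returning to a caller that would loop.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <syslog.h>
#include <unistd.h>

/* Errors after which accept() on the same socket is worth retrying.
 * This list is an assumption for illustration; check accept(2) on your
 * platform for others you may want to treat as transient. */
int accept_error_is_transient(int err)
{
    return err == EINTR || err == ECONNABORTED;
}

/* Hypothetical replacement for the error branch: retry transient
 * failures, otherwise log, close the listening socket and exit. */
int accept_or_exit(int sock)
{
    for (;;) {
        int new_sd = accept(sock, NULL, NULL);
        if (new_sd >= 0)
            return new_sd;

        int err = errno;  /* save it before later calls can overwrite it */
        if (accept_error_is_transient(err))
            continue;

        syslog(LOG_ERR, "Network server accept failure (%d: %s)",
               err, strerror(err));
        close(sock);
        exit(EXIT_FAILURE);
    }
}
```

With this, a failure such as the one in the log ends the process once, rather than producing over a million identical log lines in 30 seconds.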
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Lead Developer
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users