Problem with some NSCA packets getting corrupted on 64-bit SLES 10
Brian A. Seklecki
lavalamp at spiritual-machines.org
Sat Jan 19 18:31:42 CET 2008
MF:
Show us your ocsp_command and ochp_command mappings. Are you calling a
piped command from checkcommands.cfg or calling an external shell
script?
I guarantee you the comma (",") in the results is being interpreted as a
field delimiter, which confuses nscad(8).
~~BAS
On Thu, 2008-01-17 at 10:37 -0500, Frost, Mark {PBG} wrote:
> I've recently begun an effort to move our Nagios installation to a
> distributed architecture from a centralized one. I had previously used
> NSCA for only a few passive checks, and it worked fine on a 32-bit
> Red Hat AS 3 platform (the centralized server).
>
> In testing on a distributed architecture (which is 64-bit SUSE Linux
> Enterprise Server (SLES) 10), I seem to have a problem with NSCA. (Note
> that all Nagios and NSCA binaries and libraries were recompiled on the
> 64-bit platform).
>
> After I broke out all the checks to have 2 separate distributed nodes
> send to a central server, I saw a few messages like this one in the
> nagios.log file:
>
> [1200583727] Warning: Passive check result was received for service '0'
> on host 'HOSTXXX', but the service could not be found!
>
> but only about 1 out of every 10 or maybe 20 results was doing this.
> That is, the rest of the results were being correctly shown as "EXTERNAL
> COMMAND" and all expected NSCA fields came up correctly (hostname,
> service desc, check result, text output).
>
> I started having the "send_nsca" script from the distributed nodes log
> what they were sending to a file. When I correlate what they're sending
> with what the NSCA daemon thinks it's receiving, the client is still
> sending the correct 4 fields, but it's as if the NSCA daemon is dropping
> the 2nd field (service desc) and replacing it with the check result
> field. So ultimately, it thinks the service name is '0'.
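
Since the clients already log what they hand to send_nsca, that log can be scanned for lines whose field count is off. This is a minimal sketch, assuming one result per line, a tab delimiter and four fields for a service check; find_suspect_lines is a hypothetical helper name. Note that a three-field line is also the normal shape of a host check, so a hit is a hint, not proof.

```python
def find_suspect_lines(lines, delim="\t", expected_fields=4):
    """Return (line number, field count, text) for lines with an odd field count."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if not text:
            continue  # ignore blank lines in the log
        count = len(text.split(delim))
        if count != expected_fields:
            suspects.append((number, count, text))
    return suspects
```

It can be fed an open log file directly, for example find_suspect_lines(open(path)), where path is wherever the client-side log was written.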
>
> I can't see that this matches a pattern (i.e. always on the same hosts
> or same service checks). All I've seen so far is that it happens
> whether I run NSCA as --single or --daemon. It also happens even if I
> turn off one of the distributed nodes (that is, I can't see it being
> volume related).
>
> I have turned on debugging in the NSCA daemon to see what it thinks it's
> getting and it echoes what the nagios.log shows:
>
> SERVICE CHECK -> Host Name: 'HOSTXXX', Service Description: '0', Return
> Code: '0', Output: ' rta=0.140000 ms)'
>
> Again, maybe only 1 out of 10. Ultimately, this causes the server to
> run an active check as it thinks it never got a result from the
> distributed node.
>
> I'm still trying to dig deeper, but it seems to me that this is
> increasingly pointing to some issue with 64-bit SLES. Or perhaps some
> variable type in NSCA daemon that's not quite right for 64-bit. It's
> hard to tell with its intermittent nature and the fact that I have yet
> to discover a pattern.
>
> Has anyone seen anything like this before?
>
> Thanks
>
> Mark
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null