Unexplained nagios crashes
Duncan Ferguson
duncan.ferguson at altinity.com
Mon Sep 17 11:48:12 CEST 2007
On 27 Aug 2007, at 12:19, Andreas Ericsson wrote:
>
> My guess would be that it's an off-by-one somewhere in the code that
> only triggers under some very special circumstances. Since it only
> happens at one customer site, something needs to be special about
> that customer.
Finally we think we have worked out what the problem is, after adding
more debug output and waiting for the crash to happen again.
We traced the data corruption back to the portion of code following a
host check from a slave, and that host check was
coreserv5.main.internal;0;|
i.e. no output and no perf data, just a pipe symbol. These check
results came back very frequently but did not always cause the
crash, which seems related to the use of strtok in commands.c when
splitting the data apart. We have patched the customer's code and are
keeping a close eye on it (it hasn't crashed again yet), but it seems
as though Ethan has already overhauled this area of code in Nagios 3.
If anyone wants the patch then please let us know.
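
To illustrate the kind of input involved, here is a minimal sketch. It is
not the Nagios code and not the patch; the struct and function names are
made up for the example, and the field layout (host;status;output|perfdata)
is assumed from the line quoted above. The relevant strtok behaviour is
that it skips runs of delimiters and returns NULL when no token is left, so
splitting a bare "|" on '|' yields NULL rather than an empty string. The
sketch checks every strtok result and uses strchr for the pipe so that
empty fields come back as "" instead of NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for one passive host check result. */
struct check_result {
    const char *host;
    const char *status;
    const char *output;    /* "" when the check sent no output */
    const char *perfdata;  /* "" when the check sent no perf data */
};

/*
 * Split "host;status;output|perfdata" in place (the buffer is modified).
 * Returns 0 on success, -1 if the host or status field is missing.
 * Note: strtok keeps static state, so this is not reentrant.
 */
static int parse_check_result(char *line, struct check_result *out)
{
    static char empty[] = "";
    char *rest;
    char *bar;

    assert(line != NULL && out != NULL);

    out->host = strtok(line, ";");
    out->status = strtok(NULL, ";");
    if (out->host == NULL || out->status == NULL)
        return -1;

    /* Everything after the status field; NULL if nothing follows it. */
    rest = strtok(NULL, "\n");
    if (rest == NULL)
        rest = empty;

    /*
     * strtok(rest, "|") would return NULL for rest == "|", because the
     * string holds only delimiters. strchr keeps the empty fields.
     */
    bar = strchr(rest, '|');
    if (bar != NULL) {
        *bar = '\0';
        out->perfdata = bar + 1;
    } else {
        out->perfdata = empty;
    }
    out->output = rest;
    return 0;
}
```

Called on a writable copy of "coreserv5.main.internal;0;|", this returns 0
with output and perfdata both set to empty strings, so later code never
sees a NULL pointer for either field.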
Thanks.
Duncs
--
Duncan Ferguson
http://www.altinity.com
Tel: +44 (0)870 787 9243
Fax: +44 (0)845 280 1725
Skype: duncan_j_ferguson
MSN: duncan.ferguson at altinity.com