Failsafeness and -Wall fixes
Andreas Ericsson
ae at op5.se
Tue Apr 5 00:28:14 CEST 2005
I've been trying to locate the SIGSEGV crash I reported earlier for
quite some time now, and have finally resorted to an interim "solution".
Attached is a patch to make nagios failsafe, insofar as such a thing goes,
by forking a grandmaster instance that just waits for the child to exit
in any way. SIGTERM and SIGHUP are forwarded from the grandmaster to the
child, so things should work normally on the inside. Incidentally,
SIGTERM also causes the grandmaster to die after kill(2)'ing the child,
so it's possible to stop the daemon as well. ;)
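For readers without the attachment, here is a minimal sketch of the grandmaster idea in C. It is not the code from the patch. The names supervise(), forward_signal(), healthy_worker() and crashing_worker() are made up for illustration, and the restart-on-crash policy with a max_restarts cap is an assumption about what "failsafe" should mean here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t child_pid = 0;      /* pid of the supervised child */
static volatile sig_atomic_t stop_requested = 0; /* set once SIGTERM arrives */

/* Runs in the grandmaster: pass SIGTERM/SIGHUP on to the child. */
static void forward_signal(int sig)
{
    if (sig == SIGTERM)
        stop_requested = 1;
    if (child_pid > 0)
        kill((pid_t)child_pid, sig);
}

/* Hypothetical workers, only here so the sketch can be exercised. */
int healthy_worker(void)  { return 0; }
int crashing_worker(void) { raise(SIGSEGV); return 1; }

/*
 * Fork worker() and wait for it. Assumed policy: a child killed by a
 * signal is restarted up to max_restarts times; a normal exit or a
 * SIGTERM sent to the grandmaster ends the loop.
 * Returns the child's exit status, 0 after SIGTERM, or -1 on error or
 * when the restart budget is used up.
 */
int supervise(int (*worker)(void), int max_restarts)
{
    struct sigaction sa;
    int restarts = 0;

    assert(worker != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = forward_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGTERM, &sa, NULL);
    sigaction(SIGHUP, &sa, NULL);

    for (;;) {
        int status = 0;
        pid_t pid, r;

        pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            /* Child: drop the forwarding handlers; a real daemon
             * would install its own handlers here. */
            signal(SIGTERM, SIG_DFL);
            signal(SIGHUP, SIG_DFL);
            _exit(worker());
        }

        child_pid = pid;
        do {
            r = waitpid(pid, &status, 0);
        } while (r < 0 && errno == EINTR);
        child_pid = 0;

        if (r < 0)
            return -1;
        if (stop_requested)        /* child was killed, grandmaster goes too */
            return 0;
        if (WIFEXITED(status))     /* orderly exit, nothing to recover */
            return WEXITSTATUS(status);
        if (++restarts > max_restarts)
            return -1;
        fprintf(stderr, "child died from signal %d, restarting (%d/%d)\n",
                WTERMSIG(status), restarts, max_restarts);
    }
}
```

A daemon would call something like supervise(main_loop, 5) right after reading its configuration, where main_loop is whatever function runs the daemon proper. There is a small window between fork() and the assignment to child_pid in which a signal is not forwarded. Blocking the signals around the fork would close it, but that is left out to keep the sketch short.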
On the downside, the failsafeness is implemented after the config is
read, so it's now impossible to reload the configuration by sending
a SIGHUP. This was necessary to reduce overhead and minimize .
Included in the patch are some minor fixes for the stricter of the
compiler warnings.
The crash is possibly reproducible using the plugin check_rand, which
will basically wreak havoc on nagios' exception handling in a fairly
disorderly manner. If nothing else it can be used to stress-test the core
code, or to get funny email (fortune was kind enough to provide one-liners).
check_rand will display random exit-messages roughly half the time it
exits, and will produce truly pseudo-random exit-codes. Don't generate
any cryptographic keys on the system you're using it on though, as it
leeches its chaos from /dev/urandom.
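To give an idea of the behaviour described above, here is a rough sketch in C. It is not the check_rand source. The function names read_urandom() and random_check() and the message text are invented, and using one random byte as the exit code is an assumption. The fallback value 3 is the usual UNKNOWN status for Nagios plugins.

```c
#include <stdio.h>

/* Fill buf with len bytes from /dev/urandom; 0 on success, -1 on error. */
int read_urandom(unsigned char *buf, size_t len)
{
    FILE *fp = fopen("/dev/urandom", "rb");
    size_t got;

    if (fp == NULL)
        return -1;
    got = fread(buf, 1, len, fp);
    fclose(fp);
    return got == len ? 0 : -1;
}

/*
 * Hypothetical plugin body: print a message on roughly every other
 * run and return a random exit code in the range 0..255.
 */
int random_check(void)
{
    unsigned char r[2];

    if (read_urandom(r, sizeof r) != 0)
        return 3;              /* UNKNOWN: no randomness available */
    if (r[0] & 1)              /* low bit decides whether to say anything */
        printf("check_rand: exiting with code %d\n", r[1]);
    return r[1];
}
```

A plugin built on this would simply return random_check() from main(), so the daemon sees exit codes well outside the 0 to 3 range it normally expects, sometimes with no output at all.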
You can download check_rand from
http://oss.op5.se/nagios/check_rand.tar.gz. Read the code if you're
interested in how it works, but don't ask me to port it.
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Lead Developer
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: nagios-failsafe-and-fixes.diff
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20050405/217d4448/attachment.ksh>