Event broker, dlopen(), and segfaults
Ethan Galstad
nagios at nagios.org
Fri Oct 19 17:19:05 CEST 2007
Roy and Andreas -
Thanks for your insight. I found this article about HP-UX libraries and
it seems to indicate that deleting the original file and replacing it
with a new one will prevent a segfault. Simply overwriting the file
will cause a segfault, as the inode doesn't change:
http://www.sap-basis-abap.com/unix/replacing-libraries-on-hp-ux.htm
Hardly ideal. The only real workaround would be to stat() the file to
check for mtime changes before each and every call to a function within
the module. However, the overhead of doing so is too great to make it
a feasible option...
I'll make a note in the docs about this.
Marantz, Roy wrote:
> This is usually caused by updating the contents of the file instead of
> replacing it. i.e. getting a new inode might make this safe.
> You could try writing to FILE.new and then running "mv FILE.new FILE" to
> force the new file to get a new inode. This might vary by OS or even OS
> version.
> Roy
>
> -----Original Message-----
> From: nagios-devel-bounces at lists.sourceforge.net
> [mailto:nagios-devel-bounces at lists.sourceforge.net] On Behalf Of Andreas
> Ericsson
> Sent: Friday, October 19, 2007 3:26 AM
> To: nagios at nagios.org; Nagios Developers List
> Subject: Re: [Nagios-devel] Event broker, dlopen(), and segfaults
>
> Ethan Galstad wrote:
>> While doing some debugging of NDOUtils, I've noticed something bad.
>> Event broker modules like ndomod.o will cause Nagios to segfault if they
>> are overwritten on the filesystem while they are in use.
>>
>> I assume this is due to the way dlopen() deals with object files. I was
>> under the assumption that a complete copy of the module was kept in
>> memory once it was loaded, but perhaps it's mmap()'d.
>>
>> The segfault is easily reproducible every time I overwrite ndomod.o
>> while in use, even if the "new" version of the file doesn't differ from
>> the old.
>>
>> Anyone know more details of how this works, or better yet, how to
>> avoid/deal with it?
>>
>
> When a program still has a descriptor to the file, the kernel retains the
> disk blocks pointed to until that descriptor is made invalid (i.e.,
> close()'d).
>
> I just tested this with modules though, and it doesn't work.
>
> Tested locking the file too, and that didn't work either.
>
> Hmm... The only way out I see is to copy the file to a different directory
> and load it from there, but I'm not sure it's worth it. What should we do
> when we fail to copy it, for example? Load from the original location? Not
> load the module at all? Either way out is wrong, for a certain value of
> right.
>
> For reference, the only bug I found in glibc/BUGS with any connection to
> dlfcn is this one:
>
> Severity: [ *] to [***]
>
> [ **] Closing shared objects in statically linked binaries most of the
> times leads to crashes during the dlopen(). Hard to fix.
>
> Since nagios isn't compiled statically, this doesn't apply, and it doesn't
> crash in dlopen(), but rather when running functions in the file.
>
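
As a sketch of the copy-then-load idea Andreas describes, the following makes a private copy of the module with mkstemp() and passes that copy to dlopen(). Both function names are hypothetical, and the sketch deliberately does not answer the open question of what to do when the copy fails: it just returns NULL and leaves that policy to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <dlfcn.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: copy `src` into a freshly created temporary file.
 * `tmpl` must be a writable mkstemp() template ending in "XXXXXX"; on
 * success it holds the name of the copy.
 * Returns 0 on success, -1 on failure. */
int copy_module_to_private_file(const char *src, char *tmpl)
{
    char buf[8192];
    ssize_t n = 0;
    int failed = 0;

    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = mkstemp(tmpl);
    if (out < 0) {
        close(in);
        return -1;
    }

    while (!failed && (n = read(in, buf, sizeof(buf))) > 0) {
        ssize_t off = 0;
        while (off < n) {               /* handle short writes */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w <= 0) {
                failed = 1;
                break;
            }
            off += w;
        }
    }
    if (n < 0)
        failed = 1;                     /* read error */

    close(in);
    if (close(out) != 0)
        failed = 1;
    if (failed) {
        unlink(tmpl);
        return -1;
    }
    return 0;
}

/* Hypothetical loader: dlopen() the private copy instead of the original.
 * `tmpl` should contain a '/' so dlopen() treats it as a path rather than
 * searching the library directories. Returns the handle, or NULL if the
 * copy or the load fails. */
void *load_module_from_copy(const char *src, char *tmpl)
{
    if (copy_module_to_private_file(src, tmpl) != 0)
        return NULL;

    void *handle = dlopen(tmpl, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        unlink(tmpl);
    }
    return handle;
}
```

On success the caller still owns the copy named in tmpl and would remove it after dlclose(). On older glibc versions, linking a program that calls dlopen() also needs -ldl.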
Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org