[PATCH] nebmods: Stop with the move-nebmod nonsense
Andreas Ericsson
ae at op5.se
Wed Jun 12 16:45:06 CEST 2013
Applied, with a cautionary note to module maintainers on how they can
change their makefiles to make test-from-sources work without pains.
Thanks. I've hated that code (or rather; It's effects on gdb) a long
time now but I was far too lazy to do anything about it.
On 2013-06-05 13:28, Robin Sonefors wrote:
> The explanation for why we did this was thus:
>
>> The problem with loaded modules is that if you overwrite the original
>> file (e.g. using 'mv'), you do not alter the inode of the original
>> file.
>
> Correct.
>
>> Since the original file/module is memory-mapped in some fashion,
>> Nagios will segfault the next time an event broker call is directed to
>> one of the module's callback functions.
>
> Of course it won't. Why not? The first quoted sentence explained it
> perfectly - the same inode still points to the same fil, so putting
> another inode in the original path won't make one bit of difference.
>
> What would cause a segfault, though, is truncating the module file and
> filling it with new content so the addresses of symbols change while the
> inode stays the same. You shouldn't do that.
>
>> This is extremely problematic when it comes to upgrading NEB modules
>> while Nagios is running.
>
> To sum up the above: no, it isn't.
>
> In addition to having a bogus reason for existing in the first place,
> the code itself is problematic: it tends to cause an accumulation of
> weird and nonsensical nebmod files in your var directory over time,
> while also breaking any sort of clever debuginfo scheme employed by most
> linux distros today, because the debuginfo can't be tracked down when we
> invent new, faked names literally only in order to confuse linkers.
>
> Signed-off-by: Robin Sonefors <robin.sonefors at op5.com>
> ---
> base/nebmods.c | 58 ++--------------------------------------------------------
> 1 file changed, 2 insertions(+), 56 deletions(-)
>
> diff --git a/base/nebmods.c b/base/nebmods.c
> index 3d84675..f6489f8 100644
> --- a/base/nebmods.c
> +++ b/base/nebmods.c
> @@ -163,14 +163,10 @@ int neb_load_all_modules(void) {
> }
>
>
> -#ifndef PATH_MAX
> -# define PATH_MAX 4096
> -#endif
> /* load a particular module */
> int neb_load_module(nebmodule *mod) {
> int (*initfunc)(int, char *, void *);
> int *module_version_ptr = NULL;
> - char output_file[PATH_MAX];
> int dest_fd, result = OK;
>
> if(mod == NULL)
> @@ -184,36 +180,8 @@ int neb_load_module(nebmodule *mod) {
> if(mod->should_be_loaded == FALSE || mod->filename == NULL)
> return ERROR;
>
> - /**********
> - Using dlopen() is great, but a real danger as-is. The problem with loaded modules is that if you overwrite the original file (e.g. using 'mv'),
> - you do not alter the inode of the original file. Since the original file/module is memory-mapped in some fashion, Nagios will segfault the next
> - time an event broker call is directed to one of the module's callback functions. This is extremely problematic when it comes to upgrading NEB
> - modules while Nagios is running. A workaround is to (1) 'mv' the original/loaded module file to another name (on the same filesystem)
> - and (2) copy the new module file to the location of the original one (using the original filename). In this scenario, dlopen() will keep referencing
> - the original file/inode for callbacks. This is not an ideal solution. A better one is to delete the module file once it is loaded by dlopen().
> - This prevents other processed from unintentially overwriting the original file, which would cause Nagios to crash. However, if we delete the file
> - before anyone else can muck with it, things should be good. 'lsof' shows that a deleted file is still referenced by the kernel and callback
> - functions continue to work once the module has been loaded. Long story, but this took quite a while to figure out, as there isn't much
> - of anything I could find on the subject other than some sketchy info on similar problems on HP-UX. Hopefully this will save future coders some time.
> - So... the trick is to (1) copy the module to a temp file, (2) dlopen() the temp file, and (3) immediately delete the temp file.
> - ************/
> -
> - /*
> - * open a temp file for copying the module. We use my_fdcopy() so
> - * we re-use the destination file descriptor returned by mkstemp(3),
> - * which we have to close ourselves.
> - */
> - snprintf(output_file, sizeof(output_file) - 1, "%s/nebmodXXXXXX", temp_path);
> - dest_fd = mkstemp(output_file);
> - result = my_fdcopy(mod->filename, output_file, dest_fd);
> - close(dest_fd);
> - if(result == ERROR) {
> - logit(NSLOG_RUNTIME_ERROR, TRUE, "Error: Failed to safely copy module '%s'. The module will not be loaded\n", mod->filename);
> - return ERROR;
> - }
> -
> - /* load the module (use the temp copy we just made) */
> - mod->module_handle = dlopen(output_file, RTLD_NOW | RTLD_GLOBAL);
> + /* load the module */
> + mod->module_handle = dlopen(mod->filename, RTLD_NOW | RTLD_GLOBAL);
> if(mod->module_handle == NULL) {
> logit(NSLOG_RUNTIME_ERROR, TRUE, "Error: Could not load module '%s' -> %s\n", mod->filename, dlerror());
>
> @@ -223,28 +191,6 @@ int neb_load_module(nebmodule *mod) {
> /* mark the module as being loaded */
> mod->is_currently_loaded = TRUE;
>
> - /* delete the temp copy of the module we just created and loaded */
> - /* this will prevent other processes from overwriting the file (using the same inode), which would cause Nagios to crash */
> - /* the kernel will keep the deleted file in memory until we unload it */
> - /* NOTE: This *should* be portable to most Unices, but I've only tested it on Linux */
> - if(unlink(output_file) == -1) {
> - logit(NSLOG_RUNTIME_ERROR, TRUE, "Error: Could not delete temporary file '%s' used for module '%s'. The module will be unloaded: %s\n", output_file, mod->filename, strerror(errno));
> - neb_unload_module(mod, NEBMODULE_FORCE_UNLOAD, NEBMODULE_ERROR_API_VERSION);
> -
> - return ERROR;
> - }
> -
> - /*
> - * now that it's loaded and removed, we create a new file in
> - * its place so debuggers can find the correct symbols properly,
> - * but only if we're supposed to dump core
> - */
> - if(daemon_dumps_core == TRUE) {
> - dest_fd = open(output_file, O_CREAT | O_WRONLY, S_IRWXU | S_IRGRP | S_IROTH);
> - result = my_fdcopy(mod->filename, output_file, dest_fd);
> - mod->dl_file = strdup(output_file);
> - }
> -
> /* find module API version */
> module_version_ptr = (int *)dlsym(mod->module_handle, "__neb_api_version");
>
>
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
------------------------------------------------------------------------------
This SF.net email is sponsored by Windows:
Build for Windows Store.
http://p.sf.net/sfu/windows-dev2dev
More information about the Developers
mailing list