[PATCH] nebmods: Stop with the move-nebmod nonsense

Robin Sonefors robin.sonefors at op5.com
Wed Jun 5 13:28:29 CEST 2013


The explanation for why we did this was thus:

> The problem with loaded modules is that if you overwrite the original
> file (e.g. using 'mv'), you do not alter the inode of the original
> file.

Correct.

> Since the original file/module is memory-mapped in some fashion,
> Nagios will segfault the next time an event broker call is directed to
> one of the module's callback functions.

Of course it won't. Why not? The first quoted sentence explained it
perfectly - the same inode still points to the same fil, so putting
another inode in the original path won't make one bit of difference.

What would cause a segfault, though, is truncating the module file and
filling it with new content so the addresses of symbols change while the
inode stays the same. You shouldn't do that.

> This is extremely problematic when it comes to upgrading NEB modules
> while Nagios is running.

To sum up the above: no, it isn't.

In addition to having a bogus reason for existing in the first place,
the code itself is problematic: it tends to cause an accumulation of
weird and nonsensical nebmod files in your var directory over time,
while also breaking any sort of clever debuginfo scheme employed by most
linux distros today, because the debuginfo can't be tracked down when we
invent new, faked names literally only in order to confuse linkers.

Signed-off-by: Robin Sonefors <robin.sonefors at op5.com>
---
 base/nebmods.c | 58 ++--------------------------------------------------------
 1 file changed, 2 insertions(+), 56 deletions(-)

diff --git a/base/nebmods.c b/base/nebmods.c
index 3d84675..f6489f8 100644
--- a/base/nebmods.c
+++ b/base/nebmods.c
@@ -163,14 +163,10 @@ int neb_load_all_modules(void) {
 	}
 
 
-#ifndef PATH_MAX
-# define PATH_MAX 4096
-#endif
 /* load a particular module */
 int neb_load_module(nebmodule *mod) {
 	int (*initfunc)(int, char *, void *);
 	int *module_version_ptr = NULL;
-	char output_file[PATH_MAX];
 	int dest_fd, result = OK;
 
 	if(mod == NULL)
@@ -184,36 +180,8 @@ int neb_load_module(nebmodule *mod) {
 	if(mod->should_be_loaded == FALSE || mod->filename == NULL)
 		return ERROR;
 
-	/**********
-	   Using dlopen() is great, but a real danger as-is.  The problem with loaded modules is that if you overwrite the original file (e.g. using 'mv'),
-	   you do not alter the inode of the original file.  Since the original file/module is memory-mapped in some fashion, Nagios will segfault the next
-	   time an event broker call is directed to one of the module's callback functions.  This is extremely problematic when it comes to upgrading NEB
-	   modules while Nagios is running.  A workaround is to (1) 'mv' the original/loaded module file to another name (on the same filesystem)
-	   and (2) copy the new module file to the location of the original one (using the original filename).  In this scenario, dlopen() will keep referencing
-	   the original file/inode for callbacks.  This is not an ideal solution.   A better one is to delete the module file once it is loaded by dlopen().
-	   This prevents other processed from unintentially overwriting the original file, which would cause Nagios to crash.  However, if we delete the file
-	   before anyone else can muck with it, things should be good.  'lsof' shows that a deleted file is still referenced by the kernel and callback
-	   functions continue to work once the module has been loaded.  Long story, but this took quite a while to figure out, as there isn't much
-	   of anything I could find on the subject other than some sketchy info on similar problems on HP-UX.  Hopefully this will save future coders some time.
-	   So... the trick is to (1) copy the module to a temp file, (2) dlopen() the temp file, and (3) immediately delete the temp file.
-	************/
-
-	/*
-	 * open a temp file for copying the module. We use my_fdcopy() so
-	 * we re-use the destination file descriptor returned by mkstemp(3),
-	 * which we have to close ourselves.
-	 */
-	snprintf(output_file, sizeof(output_file) - 1, "%s/nebmodXXXXXX", temp_path);
-	dest_fd = mkstemp(output_file);
-	result = my_fdcopy(mod->filename, output_file, dest_fd);
-	close(dest_fd);
-	if(result == ERROR) {
-		logit(NSLOG_RUNTIME_ERROR, TRUE, "Error: Failed to safely copy module '%s'. The module will not be loaded\n", mod->filename);
-		return ERROR;
-		}
-
-	/* load the module (use the temp copy we just made) */
-	mod->module_handle = dlopen(output_file, RTLD_NOW | RTLD_GLOBAL);
+	/* load the module */
+	mod->module_handle = dlopen(mod->filename, RTLD_NOW | RTLD_GLOBAL);
 	if(mod->module_handle == NULL) {
 		logit(NSLOG_RUNTIME_ERROR, TRUE, "Error: Could not load module '%s' -> %s\n", mod->filename, dlerror());
 
@@ -223,28 +191,6 @@ int neb_load_module(nebmodule *mod) {
 	/* mark the module as being loaded */
 	mod->is_currently_loaded = TRUE;
 
-	/* delete the temp copy of the module we just created and loaded */
-	/* this will prevent other processes from overwriting the file (using the same inode), which would cause Nagios to crash */
-	/* the kernel will keep the deleted file in memory until we unload it */
-	/* NOTE: This *should* be portable to most Unices, but I've only tested it on Linux */
-	if(unlink(output_file) == -1) {
-		logit(NSLOG_RUNTIME_ERROR, TRUE, "Error: Could not delete temporary file '%s' used for module '%s'.  The module will be unloaded: %s\n", output_file, mod->filename, strerror(errno));
-		neb_unload_module(mod, NEBMODULE_FORCE_UNLOAD, NEBMODULE_ERROR_API_VERSION);
-
-		return ERROR;
-		}
-
-	/*
-	 * now that it's loaded and removed, we create a new file in
-	 * its place so debuggers can find the correct symbols properly,
-	 * but only if we're supposed to dump core
-	 */
-	if(daemon_dumps_core == TRUE) {
-		dest_fd = open(output_file, O_CREAT | O_WRONLY, S_IRWXU | S_IRGRP | S_IROTH);
-		result = my_fdcopy(mod->filename, output_file, dest_fd);
-		mod->dl_file = strdup(output_file);
-		}
-
 	/* find module API version */
 	module_version_ptr = (int *)dlsym(mod->module_handle, "__neb_api_version");
 
-- 
1.7.11.7


------------------------------------------------------------------------------
How ServiceNow helps IT people transform IT departments:
1. A cloud service to automate IT design, transition and operations
2. Dashboards that offer high-level views of enterprise services
3. A single system of record for all IT processes
http://p.sf.net/sfu/servicenow-d2d-j




More information about the Developers mailing list