bug: unlocking an invalid mutex

Geert Hendrickx ghen at telenet.be
Fri Oct 12 20:13:12 CEST 2007


Hi,

I tried to upgrade a Nagios 2.5 system running on NetBSD to Nagios 2.9,
but it seems a mutex bug was introduced in Nagios 2.7: I can reproduce
it with 2.7 and 2.9, but not with 2.5 or 2.6.

Unlike Linux, NetBSD's pthread implementation is quite unforgiving of
mutex errors: it aborts a running program when, for example, the program
tries to unlock an invalid mutex.  This is what is happening with Nagios:

> Nagios 2.9 starting... (PID=17620)
> nagios: Error detected by libpthread: Invalid mutex.
> Detected by file "/cvs/src/3/lib/libpthread/pthread_mutex.c", line 334, function "pthread_mutex_unlock".
> See pthread(3) for information.
> 
> Program received signal SIGABRT, Aborted.
> [Switching to LWP 1]
> 0xbd9e921f in kill () from /usr/lib/libc.so.12
> (gdb) bt
> #0  0xbd9e921f in kill () from /usr/lib/libc.so.12
> #1  0xbdaa6fb6 in pthread__errorfunc () from /usr/lib/libpthread.so.0
> #2  0xbdaa3d4b in pthread_mutex_unlock () from /usr/lib/libpthread.so.0
> #3  0x080a1651 in xsddefault_save_status_data () at ../xdata/xsddefault.c:338
> #4  0x080a10bd in update_all_status_data () at ../common/statusdata.c:93
> #5  0x080544dc in main (argc=2, argv=0xbfbfe8b8, env=0xbfbfe8c4) at nagios.c:665
> #6  0x0805377d in ___start ()
> (gdb)

The problem is probably in this change between Nagios 2.6 and 2.7, which
adds mutex calls to xdata/xsddefault.c around the line shown in frame #3
of the backtrace:

--- xdata/xsddefault.c	2006-05-20 21:39:34.000000000 +0200
+++ xdata/xsddefault.c	2007-01-03 03:50:43.000000000 +0100
@@ -322,6 +331,18 @@
 		return ERROR;
 	        }
 
+	/* get number of items in the check result buffer */
+	pthread_mutex_lock(&service_result_buffer.buffer_lock);
+	used_check_result_buffer_slots=service_result_buffer.items;
+	high_check_result_buffer_slots=service_result_buffer.high;
+	pthread_mutex_unlock(&service_result_buffer.buffer_lock);
+
+	/* get number of items in the command buffer */
+	pthread_mutex_lock(&external_command_buffer.buffer_lock);
+	used_external_command_buffer_slots=external_command_buffer.items;
+	high_external_command_buffer_slots=external_command_buffer.high;
+	pthread_mutex_unlock(&external_command_buffer.buffer_lock);
+
 	/* write version info to status file */
 	fprintf(fp,"########################################\n");
 	fprintf(fp,"#          NAGIOS STATUS FILE\n");
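
For illustration only, here is a minimal sketch of how the same locking
pattern could check its return values.  It is not taken from the Nagios
sources: struct demo_buffer and the demo_buffer_* functions are
hypothetical stand-ins for the buffers in the diff.  It assumes the lock
is created as a PTHREAD_MUTEX_ERRORCHECK mutex, so misuse is reported as
an error code.  Note that POSIX leaves the behaviour undefined when the
mutex was never initialised, so this only helps once pthread_mutex_init()
has run.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the buffers in the diff: only the fields
 * that the added code reads, plus the lock that protects them. */
struct demo_buffer {
	int items;
	int high;
	pthread_mutex_t buffer_lock;
};

/* Initialise the counters and create the lock as an error-checking
 * mutex.  Returns 0 on success or a pthread error code. */
static int demo_buffer_init(struct demo_buffer *buf)
{
	pthread_mutexattr_t attr;
	int rc;

	assert(buf != NULL);
	buf->items = 0;
	buf->high = 0;

	rc = pthread_mutexattr_init(&attr);
	if (rc != 0)
		return rc;
	rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (rc == 0)
		rc = pthread_mutex_init(&buf->buffer_lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return rc;
}

/* Copy the counters under the lock, checking every pthread return
 * value.  Returns 0 on success or the pthread error code. */
static int demo_buffer_read(struct demo_buffer *buf, int *items, int *high)
{
	int rc;

	assert(buf != NULL && items != NULL && high != NULL);

	rc = pthread_mutex_lock(&buf->buffer_lock);
	if (rc != 0) {
		fprintf(stderr, "lock failed: %s\n", strerror(rc));
		return rc;
	}
	*items = buf->items;
	*high = buf->high;

	rc = pthread_mutex_unlock(&buf->buffer_lock);
	if (rc != 0)
		fprintf(stderr, "unlock failed: %s\n", strerror(rc));
	return rc;
}
```

With an error-checking mutex, unlocking a lock the caller does not hold
returns EPERM, so a stray pthread_mutex_unlock() can be logged instead of
going unnoticed on one platform and aborting on another.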


Can this please be looked into?  Do I need to provide more information?

Thanks,

	Geert


PS: please keep me Cc'd.



