bug: unlocking an invalid mutex
Andreas Ericsson
ae at op5.se
Sun Oct 14 17:38:09 CEST 2007
Ethan Galstad wrote:
> Andreas Ericsson wrote:
>> Geert Hendrickx wrote:
>>> Hi,
>>>
>>> I tried to upgrade a Nagios 2.5 system running on NetBSD to Nagios 2.9.
>>> But it seems like a mutex bug has been introduced in Nagios 2.7 (I can
>>> reproduce it with Nagios 2.7 but not with 2.5 and 2.6).
>>>
>>> Unlike Linux, NetBSD's pthread implementation is quite unforgiving for
>>> mutex errors, and aborts a running program e.g. when it tries to unlock
>>> an invalid mutex. This is what is happening with Nagios:
>>>
>>>> Nagios 2.9 starting... (PID=17620)
>>>> nagios: Error detected by libpthread: Invalid mutex.
>>>> Detected by file "/cvs/src/3/lib/libpthread/pthread_mutex.c", line 334, function "pthread_mutex_unlock".
>>>> See pthread(3) for information.
>>>>
>>>> Program received signal SIGABRT, Aborted.
>>>> [Switching to LWP 1]
>>>> 0xbd9e921f in kill () from /usr/lib/libc.so.12
>>>> (gdb) bt
>>>> #0 0xbd9e921f in kill () from /usr/lib/libc.so.12
>>>> #1 0xbdaa6fb6 in pthread__errorfunc () from /usr/lib/libpthread.so.0
>>>> #2 0xbdaa3d4b in pthread_mutex_unlock () from /usr/lib/libpthread.so.0
>>>> #3 0x080a1651 in xsddefault_save_status_data () at ../xdata/xsddefault.c:338
>>>> #4 0x080a10bd in update_all_status_data () at ../common/statusdata.c:93
>>>> #5 0x080544dc in main (argc=2, argv=0xbfbfe8b8, env=0xbfbfe8c4) at nagios.c:665
>>>> #6 0x0805377d in ___start ()
>>>> (gdb)
>>> The problem is probably in this change between Nagios 2.6 and 2.7:
>>>
>>> --- xdata/xsddefault.c 2006-05-20 21:39:34.000000000 +0200
>>> +++ xdata/xsddefault.c 2007-01-03 03:50:43.000000000 +0100
>>> @@ -322,6 +331,18 @@
>>> return ERROR;
>>> }
>>>
>>> + /* get number of items in the check result buffer */
>>> + pthread_mutex_lock(&service_result_buffer.buffer_lock);
>>> + used_check_result_buffer_slots=service_result_buffer.items;
>>> + high_check_result_buffer_slots=service_result_buffer.high;
>>> + pthread_mutex_unlock(&service_result_buffer.buffer_lock);
>>> +
>>> + /* get number of items in the command buffer */
>>> + pthread_mutex_lock(&external_command_buffer.buffer_lock);
>>> + used_external_command_buffer_slots=external_command_buffer.items;
>>> + high_external_command_buffer_slots=external_command_buffer.high;
>>> + pthread_mutex_unlock(&external_command_buffer.buffer_lock);
>>> +
>>> /* write version info to status file */
>>> fprintf(fp,"########################################\n");
>>> fprintf(fp,"# NAGIOS STATUS FILE\n");
>>>
>>>
>>> Can this please be looked into? Do I need to provide more information?
>>>
>> I suppose just checking the return value of the pthread_mutex_lock() calls would
>> be enough, and letting it spin for up to 10 tries on failure. If it *always* fails,
>> that would be quite horrible though, as it would mean something fairly illegal is
>> going on in there.
>>
>> I'll whip up a patch for it once I'm done with what I'm currently fiddling with.
>>
>
> It looks like the error is occurring in the pthread_mutex_unlock()
> function, which is strange. Checking Google resulted in a couple of
> hits that make it sound like a problem in NetBSD's pthread implementation.
>
I'm not so sure. pthread_mutex_lock() can actually fail, and it's not defined
what state the lock is left in when that happens, or how the mutex is used
internally by the threading library.
> Does the error still occur if you set "PTHREAD_DIAGASSERT='A'" before
> starting Nagios up? Here's one article that describes how doing so
> fixed a similar error with gftp under NetBSD:
>
Worth giving it a shot, I guess. If nothing else, it's easier than risking
a performance-eating spinlock on the mutex.
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231