[Nagios-users] external commands and segfault -- again
Andurin
andurin at process-zero.de
Wed Jan 31 12:54:18 CET 2007
Hi List,
I am afraid there seems to be another (or a new) bug in the downtime
handling of Nagios 2.7 (non-CVS).
I've created a downtime for a parent host with the option "Schedule
triggered downtime for all child hosts" from:
today 12:00:00 up to 12:10:00
and after this a second downtime from
today 12:09:00 up to 12:20:00
So these two downtimes for one parent with a few child hosts overlap
each other by one minute.
After the first downtime ends, Nagios dies with a segfault.
More bad news:
I tried running the unstripped binary under the GNU debugger to catch
the offending lines... but then the segfault does not occur.
I know that a colleague of mine has the same problem on his Nagios server.
Here is the relevant snippet from my nagios.log:
[1170240406] EXTERNAL COMMAND: SCHEDULE_AND_PROPAGATE_TRIGGERED_HOST_DOWNTIME;vvcdo-atm-R1;1170240379;1170240900;1;0;7200;baeckerh;Testing Downtimes
[1170240406] HOST DOWNTIME ALERT: vvcdo-atm-R1;STARTED; Host has entered a period of scheduled downtime
[1170240406] HOST DOWNTIME ALERT: child-berlin;STARTED; Host has entered a period of scheduled downtime
.... more log lines about the child hosts...
[1170240426] EXTERNAL COMMAND: SCHEDULE_AND_PROPAGATE_TRIGGERED_HOST_DOWNTIME;vvcdo-atm-R1;1170240840;1170241080;1;0;7200;baeckerh;Testing Downtimes 2
[1170240436] Auto-save of retention data completed successfully.
[1170240496] Auto-save of retention data completed successfully.
[1170240556] Auto-save of retention data completed successfully.
[1170240616] Auto-save of retention data completed successfully.
[1170240676] Auto-save of retention data completed successfully.
[1170240736] Auto-save of retention data completed successfully.
[1170240796] Auto-save of retention data completed successfully.
[1170240856] Auto-save of retention data completed successfully.
BANG!
[1170240922] Nagios 2.7 starting... (PID=27191)
[1170240922] LOG VERSION: 2.0
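As a sanity check on the timestamps in the log above, here is a minimal C sketch that computes how long the two logged downtime windows overlap and whether the end of the first window falls into the gap between the last retention auto-save and the restart. The struct and both function names are made up for this illustration and are not part of Nagios.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper type (not from Nagios): a downtime window given as
 * start/end epoch seconds, as they appear in the external commands. */
struct window {
    time_t start;
    time_t end;
};

/* Number of seconds two windows overlap, or 0 if they do not overlap. */
long overlap_seconds(struct window a, struct window b)
{
    assert(a.start <= a.end && b.start <= b.end);
    time_t start = a.start > b.start ? a.start : b.start;
    time_t end = a.end < b.end ? a.end : b.end;
    return end > start ? (long)(end - start) : 0;
}

/* True if t lies after `before` and no later than `after`. */
int falls_between(time_t t, time_t before, time_t after)
{
    return t > before && t <= after;
}
```

Fed with the logged values (1170240379 to 1170240900 and 1170240840 to 1170241080), the overlap is 60 seconds, and the end of the first window (1170240900) lies between the last auto-save (1170240856) and the restart (1170240922). That is consistent with the crash happening when the first downtime expires.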
How can I get more information about why Nagios segfaults, given that
the segfault does not occur when I run the unstripped binary under gdb?
Kind regards
Hendrik
Ethan Galstad wrote:
> Andreas Ericsson wrote:
>
>> bobi at netshel.net wrote:
>>
>> <snip> many great error descriptions
>>
>>
>
> Hmmmm... this is not good. I just looked through the source code and
> found a bug that looks like it could be the cause of the problem. There
> are actually two potential segfault scenarios that I found, and they have
> been around for a long time...
>
> 1. If a scheduled downtime entry is manually deleted/cancelled, the
> corresponding event in the event queue is not removed. The event item
> still contains a pointer to the (now deleted) downtime entry. This can
> cause a segfault.
>
> 2. There was another code segment in downtime.c where when a downtime
> entry was deleted, it was deleted and then later referenced when Nagios
> searched through other downtime entries to see if they were triggered by
> the original (deleted) downtime. Why this hasn't caused segfaults every
> time a downtime entry is deleted is beyond me.
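The second scenario can be pictured with a small C sketch. It is not the code from downtime.c: the struct layout, the `triggered_by` field name and both functions are assumptions made for illustration, and it assumes that entries triggered by a deleted downtime are removed together with it. The point is only that the scan compares against a plain id value, so it never reads from an entry that has already been freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical downtime entry; field names are illustrative only. */
struct downtime {
    unsigned long id;
    unsigned long triggered_by;   /* 0 means "not triggered" */
    struct downtime *next;
};

/* Prepend a new entry and return the new list head
 * (the old head is returned unchanged if allocation fails). */
struct downtime *push_downtime(struct downtime *head, unsigned long id,
                               unsigned long triggered_by)
{
    struct downtime *d = malloc(sizeof *d);
    if (d == NULL)
        return head;
    d->id = id;
    d->triggered_by = triggered_by;
    d->next = head;
    return d;
}

/* Remove the entry with the given id plus every entry directly triggered
 * by it. The comparison uses the `id` argument, never a field of an
 * entry that was freed earlier in the loop. Returns the number removed. */
size_t delete_with_triggered(struct downtime **list, unsigned long id)
{
    assert(list != NULL);
    size_t removed = 0;
    struct downtime **link = list;
    while (*link != NULL) {
        struct downtime *d = *link;
        if (d->id == id || d->triggered_by == id) {
            *link = d->next;   /* unlink before freeing */
            free(d);
            removed++;
        } else {
            link = &d->next;
        }
    }
    return removed;
}
```

The buggy pattern described above would correspond to freeing the entry first and then reading its id from the freed memory during the scan, which may or may not crash depending on what the allocator has done with that memory in the meantime.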
>
> At any rate, I have just posted a patch to the 2.x branch of CVS. The
> patch changes the way scheduled downtime is referenced from the event
> queue. Instead of storing a pointer to the downtime data struct, the
> downtime id number is now used instead. The timed event handler will
> search for a downtime entry matching the id before it does anything. If
> the downtime was already deleted, it's okay. Give it a try and see if
> things improve.
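A rough C sketch of the approach described in this paragraph, as I understand it: the event carries a downtime id instead of a pointer, and the handler looks the id up before touching anything. All type and function names below are hypothetical and do not mirror the actual patch in CVS.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types; the real Nagios structs look different. */
struct downtime {
    unsigned long id;
    struct downtime *next;
};

struct timed_event {
    unsigned long downtime_id;   /* an id, not a pointer to the entry */
};

static struct downtime *downtime_list = NULL;

/* Add a downtime entry with the given id to the global list. */
struct downtime *add_downtime(unsigned long id)
{
    struct downtime *d = malloc(sizeof *d);
    if (d == NULL)
        return NULL;
    d->id = id;
    d->next = downtime_list;
    downtime_list = d;
    return d;
}

/* Look up an entry by id; returns NULL if it no longer exists. */
struct downtime *find_downtime(unsigned long id)
{
    for (struct downtime *d = downtime_list; d != NULL; d = d->next)
        if (d->id == id)
            return d;
    return NULL;
}

/* Unlink and free the entry with this id; returns 1 if one was removed. */
int delete_downtime(unsigned long id)
{
    struct downtime **link = &downtime_list;
    while (*link != NULL) {
        if ((*link)->id == id) {
            struct downtime *dead = *link;
            *link = dead->next;
            free(dead);
            return 1;
        }
        link = &(*link)->next;
    }
    return 0;
}

/* Event handler: returns 1 if the downtime was found and ended,
 * 0 if it had already been deleted (which is not an error). */
int handle_downtime_event(const struct timed_event *ev)
{
    assert(ev != NULL);
    struct downtime *d = find_downtime(ev->downtime_id);
    if (d == NULL)
        return 0;              /* entry already gone: nothing to do */
    /* ... end-of-downtime processing would go here ... */
    return delete_downtime(d->id);
}
```

With a stored pointer, an event that fires after a manual delete would dereference freed memory. With a stored id, the same event finds nothing and returns, at the cost of a list search per event.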
>
> Unfortunately, this patch will now break the ndoutils addon (yesterday's
> release, as well as earlier revisions). I'll get a patch in CVS shortly
> to fix this. Thanks for the great problem description!
>
>
>
> Ethan Galstad,
> Nagios Developer
> ---
> Email: nagios at nagios.org
> Website: http://www.nagios.org
>
> _______________________________________________
> Nagios-devel mailing list
> Nagios-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-devel
>
>