<p dir="ltr">The easiest way to track down what causes the segfault would of course be a core dump, a gdb backtrace, or similar. Is that something you would be able to share? </p>
<p dir="ltr">Aside from that, how exactly are you submitting passive check results via livestatus? Is that even possible?<br>
</p>
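<p dir="ltr">For reference, if the results go in as external commands over the livestatus socket, I would guess the submission looks roughly like the sketch below. It assumes the socket path from the broker_module line quoted further down and the standard PROCESS_SERVICE_CHECK_RESULT external command; the helper names are made up for illustration, so adjust to whatever you actually run.</p>
<pre><code class="language-python">import socket
import time

# Assumption: socket path taken from the broker_module line quoted in this thread.
LIVESTATUS_SOCKET = "/var/cache/naemon/live"


def build_passive_result(host, service, return_code, output, now=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT external command as a livestatus COMMAND line."""
    timestamp = int(time.time() if now is None else now)
    return "COMMAND [%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, output)


def submit(line, path=LIVESTATUS_SOCKET):
    """Write one command line to the livestatus UNIX socket (hypothetical helper)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        sock.sendall(line.encode("utf-8"))
    finally:
        sock.close()
</code></pre>
<p dir="ltr">Usage would be something like <code>submit(build_passive_result("web01", "disk", 0, "OK - 42% used"))</code>, where the host and service names are placeholders. If your submissions look different from this, that would be useful to know when reading the back trace.</p>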
<div class="gmail_quote">On 31 Mar 2014 18:49, "Eron Nicholson" <<a href="mailto:eron@basecamp.com">eron@basecamp.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hey all,<br>
Thanks for the responses and the info. I appreciate that you guys<br>
are responsive to these issues. I also posted this to the check_mk<br>
users list and haven't gotten any response yet (see<br>
<a href="http://lists.mathias-kettner.de/pipermail/checkmk-en/2014-March/011881.html" target="_blank">http://lists.mathias-kettner.de/pipermail/checkmk-en/2014-March/011881.html</a>).<br>
<br>
Since we are looking to use both Naemon and Check_mk in our new<br>
monitoring system, I would certainly prefer it if there was a single<br>
supported livestatus version shared between the two projects. We do<br>
see some issues when trying to use the Check_MK UI with<br>
naemon-livestatus, as they have added new columns :<br>
<br>
Primary - Livestatus error<br>
Unhandled exception: 400: Table 'hosts' has no column<br>
'host_comments_with_extra_info'<br>
<br>
We have built our own UI and Thruk is also perfectly fine, so this<br>
isn't really a big concern. As long as the backends are compatible,<br>
we should be fine with either version.<br>
<br>
The major issue with the current version of naemon-livestatus is that<br>
it crashes after ~10 seconds in our environment. As I mentioned<br>
earlier, we have tons of passive services being sent in via livestatus<br>
- both from the check_mk agent checks and our own custom checks. If<br>
I disable our custom checks, naemon-livestatus will not crash, so it<br>
has something to do with the additional passive checks we are sending.<br>
I have enabled livestatus logging and debugging via :<br>
<br>
broker_module=/usr/lib/naemon/livestatus.o /var/cache/naemon/live<br>
log_file=/var/log/naemon/livestatus.log debug=1<br>
<br>
And do not see any errors in the livestatus.log when the process dies.<br>
I do sometimes see segfault errors in the naemon.log :<br>
<br>
[1396281951] Caught SIGSEGV, shutting down...<br>
<br>
<br>
We are very, very reliant on livestatus for both pushing in passive<br>
service checks and pulling data for our UI. So our (new) monitoring<br>
system is basically unusable until we can get a livestatus that works<br>
with naemon and doesn't crash. Fortunately, we still have our nagios3<br>
system up and working, so we have some time to try to figure out these<br>
kinds of issues.<br>
<br>
I would love to help out in troubleshooting this problem. Let me know<br>
if there's a newer version of naemon-livestatus that I can try or if<br>
you would like me to gather some more data on the crashes.<br>
<br>
Thanks,<br>
<br>
Eron Nicholson<br>
Systems Administrator | Basecamp<br>
<br>
<br>
On Sat, Mar 29, 2014 at 7:51 AM, Anton Löfgren <<a href="mailto:alofgren@op5.com">alofgren@op5.com</a>> wrote:<br>
> I don't want to derail this thread further than necessary, but I just<br>
> thought I should mention that there are also a number of fixes available for<br>
> the build system which I hope to get into naemon in the coming week, apart<br>
> from the unicode stuff Max mentions. The upstream build system (at least<br>
> what we have in the op5 fork) is a complete mess, which anyone who has been<br>
> down that rabbit hole should be able to attest to.<br>
><br>
> I also added a couple of test cases for said unicode stuff, which should<br>
> make it easier to add new ones in the future.<br>
><br>
> Anyway, is anyone talking to Kettner about this? Ideally, we'd be able to<br>
> work towards a common goal. Although from what I've heard (though this may<br>
> or may not be accurate), he's not particularly interested in at least some<br>
> of the changes we've made.<br>
><br>
> If that's not possible for whatever reason, it might be best to do as Max<br>
> says, and cherry-pick whatever changes we want from upstream.<br>
><br>
> To get back on thread, and reiterate: you're better off using the naemon<br>
> livestatus fork with naemon.<br>
><br>
> al<br>
><br>
> On 29 Mar 2014 11:16, "Max Sikström" <<a href="mailto:max.sikstrom@op5.com">max.sikstrom@op5.com</a>> wrote:<br>
>><br>
>> Hi!<br>
>><br>
>> I've tried to keep up with the changes that have happened to livestatus<br>
>> upstream, but they are quite hard to track, since livestatus is just a<br>
>> subdirectory in the check_mk repository.<br>
>><br>
>> As far as I can see, there are just a few new features and fixes in<br>
>> upstream livestatus since the fork:<br>
>> - statehist table is added<br>
>> - bugfixes with the log table<br>
>> - fixes with livecheck, and later removal of the livecheck<br>
>><br>
>> Log handling in livestatus is really nasty to use, because memory usage<br>
>> only ever increases (livestatus never deallocates its growing buffer;<br>
>> once 1GB of logs has been parsed, 1GB of memory is held per thread,<br>
>> afaik), so I've assumed that check_mk was the only system that really used<br>
>> that part.<br>
>><br>
>><br>
>> I wouldn't say that naemon-livestatus is older, just a little<br>
>> bit different.<br>
>><br>
>> The naemon fork of livestatus has taken a path through op5 before ending<br>
>> up as the naemon-fork. During that time, some issues have been resolved:<br>
>> - Add sorting (and pagination) support, and some bugfixes too. (Sort:<br>
>> column_name asc/desc, Offset: 80, Limit: 20)<br>
>> - Regexp handles case sensitivity for unicode characters correctly (it's<br>
>> really new, so I'm not sure if it's in master yet. Just know that Anton<br>
>> Löfgren/catharsis has it in a branch right now)<br>
>><br>
>> In the naemon-fork, there are also a couple of bug fixes:<br>
>> - Possible segfault due to races between threads when submitting commands.<br>
>> (Command processing in upstream is done in worker thread, but naemon/nagios<br>
>> isn't thread safe itself, since it doesn't use threads)<br>
>><br>
>> In short: naemon-livestatus and mk-livestatus have diverged, and before<br>
>> it's practical to upstream changes, it probably will be too.<br>
>><br>
>><br>
>> So are there any specific features you need or bugs to resolve in<br>
>> naemon-livestatus that are available in mk-livestatus? Because then, it's<br>
>> probably quite easy to just port those specific ones.<br>
>><br>
>> Best regards,<br>
>> Max Sikström<br>
>><br>
>> On 28 Mar 2014, at 19:58, Eron Nicholson <<a href="mailto:eron@basecamp.com">eron@basecamp.com</a>> wrote:<br>
>><br>
>> > Hello,<br>
>> > I am attempting to use Naemon with Check_MK. Check_MK released a<br>
>> > new version of livestatus today (1.2.5i1) which supports Nagios 4.<br>
>> > However, I am getting errors when trying to use it with Naemon :<br>
>> ><br>
>> > [1396026973] Error: Could not load module<br>
>> > '/usr/lib/check_mk/livestatus.o' -> /usr/lib/check_mk/livestatus.o:<br>
>> > undefined symbol: get_next_log_rotation_time<br>
>> > [1396026973] Error: Failed to load module<br>
>> > '/usr/lib/check_mk/livestatus.o'.<br>
>> > [1396026973] Error: Module loading failed. Aborting.<br>
>> ><br>
>> > We have been having issues with the forked naemon version of<br>
>> > livestatus crashing. We push in a lot of passive services, and it<br>
>> > seems that this is causing livestatus to crash. The forked version is<br>
>> > quite old. I was wondering if there was a plan to update naemon's<br>
>> > livestatus to a more recent version or if there was a plan to allow<br>
>> > naemon to integrate with the latest version of livestatus.<br>
>> ><br>
>> > Thanks,<br>
>> ><br>
>> > Eron Nicholson<br>
>> > Systems Administrator | Basecamp<br>
>><br>
><br>
</blockquote></div>
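<p dir="ltr">On the sorting and pagination headers Max mentions (Sort, Offset, Limit): a query using them might be assembled as in this sketch. GET and Columns are regular livestatus query headers; the Sort and Offset syntax is taken only from Max's description of the naemon fork, and build_query is a hypothetical helper, so treat it as an assumption and not as a reference.</p>
<pre><code class="language-python">def build_query(table, columns, sort=None, offset=None, limit=None):
    """Assemble a livestatus query string; a blank line terminates the query."""
    lines = ["GET %s" % table, "Columns: %s" % " ".join(columns)]
    if sort is not None:
        # Assumption: "Sort: column_name asc/desc" as described for the naemon fork.
        column, direction = sort
        lines.append("Sort: %s %s" % (column, direction))
    if offset is not None:
        lines.append("Offset: %d" % offset)
    if limit is not None:
        lines.append("Limit: %d" % limit)
    return "\n".join(lines) + "\n\n"
</code></pre>
<p dir="ltr">For example, <code>build_query("services", ["host_name", "description"], sort=("host_name", "asc"), offset=80, limit=20)</code> would ask for the fifth page of twenty services. Against mk-livestatus the Sort and Offset lines would presumably be rejected, which is one concrete way the two versions differ.</p>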