[naemon-dev] Naemon Livestatus update
Eron Nicholson
eron at basecamp.com
Mon Mar 31 18:49:44 CEST 2014
Hey all,
Thanks for the responses and the info. I appreciate that you guys
are responsive to these issues. I also posted this to the check_mk
users list and haven't gotten any response yet (see
http://lists.mathias-kettner.de/pipermail/checkmk-en/2014-March/011881.html).
Since we are looking to use both Naemon and Check_mk in our new
monitoring system, I would certainly prefer it if there was a single
supported livestatus version shared between the two projects. We do
see some issues when trying to use the Check_MK UI with
naemon-livestatus, as Check_MK has added new columns:
Primary - Livestatus error
Unhandled exception: 400: Table 'hosts' has no column
'host_comments_with_extra_info'
We have built our own UI and Thruk is also perfectly fine, so this
isn't really a big concern. As long as the backends are compatible,
we should be fine with either version.
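
For anyone who wants to check for this kind of column mismatch ahead of time, here is a minimal sketch (not something we run in production). It asks livestatus for the column names of a table via the `columns` table and reports which required columns are missing. The socket path is the one from our config; the helper names are made up for illustration, and the handling of the `host_` prefix is an assumption based on livestatus accepting table-prefixed column names.

```python
import socket

# Socket path from our broker_module line; adjust for your install.
LIVESTATUS_SOCKET = "/var/cache/naemon/live"


def columns_query(table):
    """Build a livestatus query listing the column names of one table."""
    return "GET columns\nColumns: name\nFilter: table = %s\n\n" % table


def parse_column_names(response):
    """Turn the one-name-per-line response into a set of column names."""
    return {line.strip() for line in response.splitlines() if line.strip()}


def missing_columns(required, available, prefix="host_"):
    """Return the required columns that the backend does not offer.

    Assumption: a column may be requested with the table prefix
    (e.g. 'host_name' for 'name'), so both spellings count as present.
    """
    missing = []
    for column in required:
        bare = column[len(prefix):] if column.startswith(prefix) else column
        if column not in available and bare not in available:
            missing.append(column)
    return sorted(missing)


def query_livestatus(query, path=LIVESTATUS_SOCKET):
    """Send one query over the unix socket and return the raw response text."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal that the query is complete
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    finally:
        sock.close()
    return b"".join(chunks).decode("utf-8")
```

Usage would be along the lines of `missing_columns(["host_comments_with_extra_info"], parse_column_names(query_livestatus(columns_query("hosts"))))`.
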
The major issue with the current version of naemon-livestatus is that
it crashes after ~10 seconds in our environment. As I mentioned
earlier, we have tons of passive services being sent in via livestatus
- both from the check_mk agent checks and our own custom checks. If
I disable our custom checks, naemon-livestatus will not crash, so it
has something to do with the additional passive checks we are sending.
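
To make the kind of traffic concrete, here is a minimal sketch of how a passive service result can be pushed through the livestatus socket. It is not our actual submission code: the function names are hypothetical, and the host and service names in the usage note are invented. It relies only on the livestatus `COMMAND` request and the standard `PROCESS_SERVICE_CHECK_RESULT` external command.

```python
import socket
import time

# Socket path from our broker_module line; adjust for your install.
LIVESTATUS_SOCKET = "/var/cache/naemon/live"


def passive_result_command(host, service, return_code, output, now=None):
    """Format one passive service result as a livestatus COMMAND line."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0, 1, 2 or 3")
    timestamp = int(time.time() if now is None else now)
    # A raw newline would terminate the command early, so flatten the output.
    flat_output = output.replace("\n", " ")
    return "COMMAND [%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, flat_output)


def send_commands(commands, path=LIVESTATUS_SOCKET):
    """Write a batch of COMMAND lines to the socket; no response is read."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        sock.sendall("".join(commands).encode("utf-8"))
    finally:
        sock.close()
```

A call such as `send_commands([passive_result_command("web01", "disk_root", 0, "OK - 40% used")])` submits one result; a load like ours amounts to many of these per second.
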
I have enabled livestatus logging and debugging via :
broker_module=/usr/lib/naemon/livestatus.o /var/cache/naemon/live
log_file=/var/log/naemon/livestatus.log debug=1
I do not see any errors in the livestatus.log when the process dies.
I do sometimes see segfault errors in the naemon.log :
[1396281951] Caught SIGSEGV, shutting down...
We are very, very reliant on livestatus for both pushing in passive
service checks and pulling data for our UI. So our (new) monitoring
system is basically unusable until we can get a livestatus that works
with naemon and doesn't crash. Fortunately, we still have our nagios3
system up and working, so we have some time to try to figure out these
kinds of issues.
I would love to help out in troubleshooting this problem. Let me know
if there's a newer version of naemon-livestatus that I can try or if
you would like me to gather some more data on the crashes.
Thanks,
Eron Nicholson
Systems Administrator | Basecamp
On Sat, Mar 29, 2014 at 7:51 AM, Anton Löfgren <alofgren at op5.com> wrote:
> I don't want to derail this thread further than necessary, but I just
> thought I should mention that there are also a number of fixes available for
> the build system which I hope to get into naemon in the coming week, apart
> from the unicode stuff Max mentions. The upstream build system (at least
> what we have in the op5 fork) is a complete mess, which anyone who has been
> down that rabbit hole should be able to attest to.
>
> I also added a couple of test cases for said unicode stuff, which should
> make it easier to add new ones in the future.
>
> Anyway, is anyone talking to Kettner about this? Ideally, we'd be able to
> work towards a common goal. Although from what I've heard (though this may
> or may not be accurate), he's not particularly interested in at least some
> of the changes we've made.
>
> If that's not possible for whatever reason, it might be best to do as Max
> says, and cherry-pick whatever changes we want from upstream.
>
> To get back on thread, and reiterate: you're better off using the naemon
> livestatus fork with naemon.
>
> al
>
> On 29 Mar 2014 11:16, "Max Sikström" <max.sikstrom at op5.com> wrote:
>>
>> Hi!
>>
>> I've tried to keep up reading what changes had happened to livestatus
>> upstream. But it's quite hard to track, since livestatus is just a
>> subdirectory in the check_mk repository.
>>
>> As far as I can see, there are just a few new features resolved in the
>> upstream livestatus since the fork:
>> - statehist table is added
>> - bugfixes with the log table
>> - fixes with livecheck, and later removal of the livecheck
>>
>> Since log handling in livestatus is really nasty to use, because memory
>> usage only ever increases (livestatus never deallocates its growing
>> buffer; once it has parsed 1GB of logs, 1GB of memory is held per thread,
>> afaik), I've assumed that check_mk was the only system that really used
>> that part.
>>
>>
>> I don't want to see it as naemon-livestatus is older, but just a little
>> bit different.
>>
>> The naemon fork of livestatus has taken a path through op5 before ending
>> up as the naemon-fork. During that time, some issues have been resolved:
>> - Add sorting (and pagination) support, and some bugfixes too. (Sort:
>> column_name asc/desc, Offset: 80, Limit: 20)
>> - Regexp handles case sensitivity for unicode characters correctly (it's
>> really new, so I'm not sure if it's in master yet. Just know that Anton
>> Löfgren/catharsis has it in a branch right now)
>>
>> In the naemon-fork, there are also a couple of bug fixes:
>> - Possible segfault due to races between threads when submitting commands.
>> (Command processing in upstream is done in worker thread, but naemon/nagios
>> isn't thread safe itself, since it doesn't use threads)
>>
>> In short: naemon-livestatus and mk-livestatus have diverged, and before
>> it's practical to upstream changes, it probably will be too.
>>
>>
>> So are there any specific features you need or bugs to resolve in
>> naemon-livestatus that are available in mk-livestatus? Because then, it's
>> probably quite easy to just port those specific ones.
>>
>> Best regards,
>> Max Sikström
>>
>> On 28 Mar 2014, at 19:58, Eron Nicholson <eron at basecamp.com> wrote:
>>
>> > Hello,
>> > I am attempting to use Naemon with Check_MK. Check_MK released a
>> > new version of livestatus today (1.2.5i1) which supports Nagios 4.
>> > However, I am getting errors when trying to use it with Naemon :
>> >
>> > [1396026973] Error: Could not load module
>> > '/usr/lib/check_mk/livestatus.o' -> /usr/lib/check_mk/livestatus.o:
>> > undefined symbol: get_next_log_rotation_time
>> > [1396026973] Error: Failed to load module
>> > '/usr/lib/check_mk/livestatus.o'.
>> > [1396026973] Error: Module loading failed. Aborting.
>> >
>> > We have been having issues with the forked naemon version of
>> > livestatus crashing. We push in a lot of passive services, and it
>> > seems that this is causing livestatus to crash. The forked version is
>> > quite old. I was wondering if there was a plan to update naemon's
>> > livestatus to a more recent version or if there was a plan to allow
>> > naemon to integrate with the latest version of livestatus.
>> >
>> > Thanks,
>> >
>> > Eron Nicholson
>> > Systems Administrator | Basecamp
>>
>