NDO utils bug/explanation
Michael Friedrich
michael.friedrich at univie.ac.at
Fri Sep 18 13:16:15 CEST 2009
Hi there,
Frassinelli, Marco wrote the following on 18.09.2009 12:25:
> Hi,
> I think this is not correct behavior, because visualization software such as
> nagvis and others use the programstatus table to check whether nagios is
> running.
Starting ndo2db first and then Nagios makes sure that the DB holds
current data. If there is outdated data within the DB, it needs to be
removed before Nagios even sends new data. So trimming those table
entries at the very beginning is truly intentional (the so-called
pre-launch state, where the if condition matches). If ndo2db fails for
some reason, that data will remain within the database and is then
removed during the next start.
>
> I saw that often this table is empty.
Depending on your startup routine, I would guess that you started Nagios
first and then ndo2db. But even then the table shouldn't be empty,
because ndomod as an event broker keeps data not yet written to ndo2db
in a defined cache. Depending on your configuration, this cache may be
too small, so the oldest entry could be lost (in this case the
programstatus of Nagios). But that's really a guess. You'll have to give
more information on where and when this error occurs, mentioning all
circumstances you can find in the logs (turn on everything in
debug_level with very detailed verbosity, just in case).
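
To illustrate the cache guess above, here is a minimal Python sketch of a
bounded buffer that drops its oldest entry when it is full. This is not
ndomod's actual code (which is C). The drop-oldest policy is an assumption
taken from the description above, and the function name buffer_events, the
event names and the size of 3 are made up for illustration.

```python
from collections import deque


def buffer_events(events, max_items):
    """Keep at most max_items unsent events, evicting the oldest first.

    Assumption: drop-oldest eviction, as guessed in the mail above.
    """
    cache = deque(maxlen=max_items)  # a full deque discards from the left
    for event in events:
        cache.append(event)
    return list(cache)


# Hypothetical events queued while ndo2db is not yet reachable.
queued = ["programstatus", "hoststatus", "servicestatus", "servicecheck"]
print(buffer_events(queued, max_items=3))
# ['hoststatus', 'servicestatus', 'servicecheck']
```

With a cache of 3 and four queued events, the first one (programstatus in
this example) is the one that never reaches the database.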
> Code calculates the difference between now() and status_update_time.
> If the record is null this difference is far more than the
> configurable interval, typically 180 sec.
Which code and which configuration?
The only thing I can see here is tstamp.tv_sec, which is a converted
timestamp received from the event broker module. This is a kind of
now(), but more precisely a now() from Nagios itself. You may check
ndo2db.c:ndo2db_convert_standard_data_elements
The other compared value is dbinfo.latest_realtime_data_time, which is
initialized in db.c:ndo2db_db_init and then updated if
dbinfo.latest_program_status_time is newer (db.c:374; set directly to
that value). There are several other realtime data values which may
also update this value.
So the point of this comparison is: if the current Nagios
NDO_DATA_TIMESTAMP is newer than the latest realtime data received some
time before, it is time for a cleanup at the very beginning of ndo2db
(check the sequence in ido2db.c:main).
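
As a rough illustration of the comparison described above, combined with
the pre-launch state mentioned earlier, here is a small Python sketch. It
is not the NDOUtils source (that is C). The names DbInfo,
note_realtime_data, should_clear_realtime_tables and the PRELAUNCH/OTHER
constants are hypothetical; only latest_realtime_data_time and the idea of
comparing it with the incoming timestamp come from the text.

```python
from dataclasses import dataclass

PRELAUNCH = "prelaunch"  # hypothetical stand-in for the pre-launch process event
OTHER = "other"          # any other event type


@dataclass
class DbInfo:
    # Newest realtime data timestamp already known for this instance.
    latest_realtime_data_time: int = 0


def note_realtime_data(dbinfo, data_time):
    """Move latest_realtime_data_time forward when newer data is seen."""
    if data_time > dbinfo.latest_realtime_data_time:
        dbinfo.latest_realtime_data_time = data_time


def should_clear_realtime_tables(event_type, tstamp_sec, dbinfo):
    """Cleanup only in pre-launch AND when the Nagios timestamp is not older
    than the newest realtime data (difference of 0 or more)."""
    return event_type == PRELAUNCH and tstamp_sec >= dbinfo.latest_realtime_data_time


dbinfo = DbInfo()
note_realtime_data(dbinfo, 1000)
print(should_clear_realtime_tables(PRELAUNCH, 1005, dbinfo))  # True
print(should_clear_realtime_tables(OTHER, 1005, dbinfo))      # False
print(should_clear_realtime_tables(PRELAUNCH, 900, dbinfo))   # False
```

The second call shows why a positive time difference alone is not enough:
without the pre-launch event, nothing is cleared.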
>
> The problem is that this difference suddenly varies from near 0 to
> infinity.
The conditional statement does not only require the difference to be 0
or more, but also that the process is in pre-launch (see above). But one
question: how did you get these values? Current NDOUtils code doesn't
give any debug information at this stage.
>
> Perhaps this is a problem in my ndo setup, and those deletes normally
> occur rarely. But I saw them every 60 seconds.
> Here ndo2db log:
>
> As you can see the ndo2db pid varies, I think that when it has no more
> data the child exits, and a new one is forked. The new child then
> deletes records in db.
Seeing your ndo2db die and refork explains why the pre_launch_state and
timestamp condition matches, and so the database cleanup is performed
in each of those periods.
It would be interesting to know why ndo2db is dying. This may depend on
your configuration, e.g. tcp or unix socket? Can you provide more
detailed debug logs, and are there messages like "error writing to
datasink" in the logs?
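
If it helps with checking the logs, here is a small Python sketch that
filters lines for that kind of message. The pattern is case-insensitive
and accepts both "datasink" and "data sink", since the exact wording in
your version may differ. The function name find_sink_errors is
hypothetical and the sample lines are made up, not real ndomod or ndo2db
output.

```python
import re

# Case-insensitive, and tolerant of "datasink" vs. "data sink".
SINK_ERROR = re.compile(r"error writing to data\s*sink", re.IGNORECASE)


def find_sink_errors(lines):
    """Return the lines that mention a data sink write error."""
    return [line.rstrip("\n") for line in lines if SINK_ERROR.search(line)]


# Made-up sample lines for illustration only.
sample = [
    "example line 1: connected\n",
    "example line 2: Error writing to data sink\n",
    "example line 3: error writing to datasink\n",
]
for hit in find_sink_errors(sample):
    print(hit)
```

To use it on a real file, pass an open file object instead of the sample
list, since the function only iterates over lines.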
Kind regards,
Michael
--
DI (FH) Michael Friedrich
michael.friedrich at univie.ac.at
Tel: +43 1 4277 14359
Vienna University Computer Center
Universitaetsstrasse 7
A-1010 Vienna, Austria
_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel