Re: bischeck suddenly stops working
Francesco Toffoli
ftoffoli at skylogic.it
Wed Aug 9 17:10:40 CEST 2017
Hi Anders,

I modified the log configuration as you suggested, but after stopping and starting the bischeckd daemon I didn't notice any particular warning or critical messages. So I decided to wait for the next crash and then proceed with the log analysis. I'll keep you updated.

Thanks
-------- Original message --------
From: Anders Håål <anders.haal at ingby.com>
Date: 09/08/17 08:16 (GMT+01:00)
To: bischeck-users at monitoring-lists.org
Subject: Re: bischeck suddenly stops working
Francesco - any progress on the issue?
On 07/26/2017 05:52 PM, Anders Håål wrote:
Thanks for the feedback.
When bischeck "stops working", it would be interesting to
understand whether anything gets logged after it "stops", and also
what is logged when you do a restart. I suggest you do a stop
first and see what is logged before starting again.
I would suggest that you change the log level in logback.xml
for all packages:
<root level="INFO">
<appender-ref ref="bischeck"/>
</root>
To avoid duplicates you should also add additivity="false"
to the other loggers. Based on the standard logback.xml, the
configuration would look like the one below. I have not tested it
myself, so try it in your test environment first and, if it looks
good, deploy it in production according to your specific
customization of paths, etc.
logback.xml:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <jmxConfigurator />
  <appender name="bischeck"
            class="ch.qos.logback.core.rolling.RollingFileAppender">
    <!--See also http://logback.qos.ch/manual/appenders.html#RollingFileAppender-->
    <File>/var/tmp/bischeck.log</File>
    <encoder>
      <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS,Europe/Stockholm} ; %p ; %t ; %c ; %m%ex%n</pattern>
    </encoder>
    <rollingPolicy
        class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
      <maxIndex>3</maxIndex>
      <FileNamePattern>/var/tmp/bischeck.log.%i</FileNamePattern>
    </rollingPolicy>
    <triggeringPolicy
        class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
      <MaxFileSize>1000KB</MaxFileSize>
    </triggeringPolicy>
  </appender>
  <logger name="com.ingby" level="INFO" additivity="false">
    <appender-ref ref="bischeck"/>
  </logger>
  <logger name="com.ingby.socbox.bischeck.configuration.CachePurgeJob"
          level="DEBUG" additivity="false">
    <appender-ref ref="bischeck"/>
  </logger>
  <logger name="com.ingby.socbox.bischeck.cache.provider.redis"
          level="DEBUG" additivity="false">
    <appender-ref ref="bischeck"/>
  </logger>
  <logger name="org.quartz" level="INFO" additivity="false">
    <appender-ref ref="bischeck"/>
  </logger>
  <root level="WARN">
    <appender-ref ref="bischeck"/>
  </root>
</configuration>
The root section ensures that everything at WARN or ERROR level
from any Java package is logged to the bischeck appender.
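As a smaller illustration of the additivity point above, here is a minimal sketch. It assumes a hypothetical console appender named "demo", which is not part of the bischeck configuration. By default a logback logger also passes its events on to the appenders of its ancestors, so a logger and the root logger that both reference the same appender would each write the same event. Setting additivity="false" stops that propagation.

```xml
<configuration>
  <!-- Hypothetical console appender, used only for this illustration -->
  <appender name="demo" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d ; %p ; %c ; %m%n</pattern>
    </encoder>
  </appender>

  <!-- Without additivity="false", events from com.ingby would reach
       "demo" twice: once through this logger and once through root -->
  <logger name="com.ingby" level="INFO" additivity="false">
    <appender-ref ref="demo"/>
  </logger>

  <!-- All other packages fall through to root and are logged at WARN or above -->
  <root level="WARN">
    <appender-ref ref="demo"/>
  </root>
</configuration>
```

Removing the additivity attribute from the logger in a test environment should make the duplicated lines visible.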
Regards
Anders
On 07/25/2017 09:55 AM, Francesco Giuseppe Toffoli wrote:
Hi Anders,
thanks for your reply. Here are my answers to your various
questions:
(1) The Java version is:
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
and it has not been updated recently. In our test environment
(where the problem does not occur), the version is nearly the
same (1.8.0_121).
The OS (CentOS release 6.6) has not been updated either.
(2) Redis has not been updated recently (redis 2.8.23). At the
moment roughly 13,000 keys are in use.
(3) We add checks regularly, maybe weekly. The issue started to
occur some months ago, but everything can be fine for 2 or 3
weeks and then we have several crashes in one week. I'm
not inclined to blame the new checks, also because the test
server is aligned with the production one.
(5) Yes, the restart is done via '/etc/init.d/bischeckd restart'
and it solves the issue. Physical memory on the server is always
OK, so I don't think the JVM is running out of memory.
In the Bischeck logs I didn't notice any errors. However, at the
next crash I'll try to have a deeper look at them.
Are there any other logs I could look at?
Thanks,
Francesco
On 24/07/2017 21:57, Anders Håål wrote:
Hi Giuseppe,
It sounds strange that it just stopped working after a long
period of stability unless something has changed:
- Has anything changed on the server you run bischeck on (OS,
JDK version, ...)?
- Was the Redis version updated, or its configuration changed?
- Have you added any new bischeck checks or changed something in
the configuration?
- Is there anything else you can think of that may have changed?
When you say restarting, is it the normal
/etc/init.d/bischeckd restart that fixes the problem? The
reason I ask is that the script just does a kill with the TERM
signal. If the JVM were in an out-of-memory situation that
may not be enough, but I guess you would have seen that in the
log. Are you sure you do not have any ERROR or WARN entries in
the log?
/Anders
On 07/24/2017 02:14 PM, Francesco Giuseppe Toffoli wrote:
Hi,
we are experiencing a critical problem with Bischeck. For
a couple of months now it has occasionally stopped working
without warning: the daemon (/etc/init.d/bischeckd) is running
but no check results are sent to Nagios. Restarting the bischeck
daemon fixes the issue.
Unfortunately we can't find any clue about the root cause
in the bischeck logs, not even with the DEBUG logging level
enabled. The Redis database seems to be working properly and no
increase in memory or CPU usage is reported on the server
hosting bischeck while the issue occurs.
Do you have any suggestions on how to investigate this more
deeply?
Regards,
Francesco
--
Francesco Giuseppe Toffoli
Monitoring Engineer
GSE Department
Tel: +39 01127387488
Mobile: +39 349.800.60.35
Email: ftoffoli at skylogic.it
Skylogic S. p. A.
Strada Pianezza, 289
10151 Torino, Italy
This message contains confidential information and is
intended only for the individual named. If you are not the
named addressee you should not disseminate, distribute or
copy this e-mail. Please notify the sender immediately by
e-mail if you have received this e-mail by mistake and
delete this e-mail from your system. E-mail transmission
cannot be guaranteed to be secure or error-free as
information could be intercepted, corrupted, lost,
destroyed, arrive late or incomplete, or contain viruses.
The sender therefore does not accept liability for any
errors or omissions in the contents of this message, which
arise as a result of e-mail transmission. If verification is
required please request a hard-copy version. Please note
that any views or opinions presented in this email are
solely those of the author and do not necessarily represent
those of the Company.
No employee or agent is authorized to conclude any binding
agreement on behalf of this Company nor, through this
latter, any of the Eutelsat Communication group with another
party by email without express written confirmation by a
duly authorized officer of the Company. The list of duly
authorized officers and the scope of their powers is
published on the Trade Register according to the national
law of each affiliate.
--
Ingby <http://www.ingby.com>
bischeck - dynamic and adaptive monitoring for Nagios <http://www.bischeck.org>
anders.haal at ingby.com<mailto:anders.haal at ingby.com>
Software through engineering creativity and competence
Ingenjörsbyn
Box 531
101 30 Stockholm
Sweden
www.ingby.com <http://www.ingby.com/>
Mobil: +46 70 575 35 46
Tele: +46 75 75 75 090
Fax: +46 75 75 75 091