bischeck suddenly stops working
Anders Håål
anders.haal at ingby.com
Wed Aug 9 08:16:11 CEST 2017
Francesco - any progress on the issue?
On 07/26/2017 05:52 PM, Anders Håål wrote:
>
> Thanks for the feedback.
>
> When bischeck "stops working", it would be useful to know whether
> anything gets logged after it stops, and also what is logged when you
> restart it. I suggest you first do a stop and check what is logged
> before starting it again.
>
> I suggest that you change the log level in logback.xml for all
> packages by setting the root logger to INFO:
>
> <root level="INFO">
> <appender-ref ref="bischeck"/>
> </root>
>
> To avoid duplicate log entries, you should also add additivity="false"
> to the other loggers. Starting from the standard logback.xml below, you
> can test this in your test environment first (I have not tested it
> myself) and, if it looks good, deploy it in production with your
> specific customizations of paths, etc.
>
>
> logback.xml:
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <configuration>
>   <jmxConfigurator />
>
>   <appender name="bischeck"
>             class="ch.qos.logback.core.rolling.RollingFileAppender">
>     <!-- See also
>          http://logback.qos.ch/manual/appenders.html#RollingFileAppender -->
>     <File>/var/tmp/bischeck.log</File>
>     <encoder>
>       <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS,Europe/Stockholm} ; %p ; %t ; %c ; %m%ex%n</pattern>
>     </encoder>
>
>     <rollingPolicy
>         class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
>       <maxIndex>3</maxIndex>
>       <FileNamePattern>/var/tmp/bischeck.log.%i</FileNamePattern>
>     </rollingPolicy>
>
>     <triggeringPolicy
>         class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
>       <MaxFileSize>1000KB</MaxFileSize>
>     </triggeringPolicy>
>   </appender>
>
>   <logger name="com.ingby" level="INFO" additivity="false">
>     <appender-ref ref="bischeck"/>
>   </logger>
>
>   <logger name="com.ingby.socbox.bischeck.configuration.CachePurgeJob"
>           level="DEBUG" additivity="false">
>     <appender-ref ref="bischeck"/>
>   </logger>
>
>   <logger name="com.ingby.socbox.bischeck.cache.provider.redis"
>           level="DEBUG" additivity="false">
>     <appender-ref ref="bischeck"/>
>   </logger>
>
>   <logger name="org.quartz" level="INFO" additivity="false">
>     <appender-ref ref="bischeck"/>
>   </logger>
>
>   <root level="WARN">
>     <appender-ref ref="bischeck"/>
>   </root>
> </configuration>
>
>
> The root section ensures that everything logged at WARN or ERROR level
> from any Java package is written to the bischeck appender.
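>
> A minimal sketch of why additivity="false" matters here, assuming the
> same bischeck appender as above. With the default (additivity="true"),
> an event logged under com.ingby is written by the com.ingby logger's
> appender and is then also passed up to root, which references the same
> appender, so each line would appear twice in /var/tmp/bischeck.log. The
> two loggers below are alternatives, not meant to be used together in
> one file.
>
> ```xml
> <!-- Default additivity: events from com.ingby also propagate to root,
>      which writes them to the bischeck appender a second time -->
> <logger name="com.ingby" level="INFO">
>   <appender-ref ref="bischeck"/>
> </logger>
>
> <!-- additivity="false": propagation stops at this logger, so each
>      event reaches the bischeck appender exactly once -->
> <logger name="com.ingby" level="INFO" additivity="false">
>   <appender-ref ref="bischeck"/>
> </logger>
> ```
>
> Root's own level is not re-checked for events propagated from a child
> logger, so setting root to INFO or WARN does not by itself prevent the
> duplicates.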
> Regards
> Anders
>
> On 07/25/2017 09:55 AM, Francesco Giuseppe Toffoli wrote:
>>
>> Hi Anders,
>> thanks for your reply. Here are my answers to your various questions:
>>
>> (1) The Java version is:
>>
>> openjdk version "1.8.0_91"
>> OpenJDK Runtime Environment (build 1.8.0_91-b14)
>> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>>
>> and it has not been updated recently. In our test environment (where
>> the problem does not occur) the version is nearly the same (1.8.0_121).
>> The OS has not been updated (CentOS release 6.6).
>>
>> (2) Redis has not been updated recently (redis 2.8.23). At the
>> moment about 13,000 keys are in use.
>>
>> (3) We add checks regularly, roughly weekly. The issue started some
>> months ago, but it is intermittent: everything can be fine for 2 or 3
>> weeks, and then we get several crashes in a single week. I'm not
>> inclined to blame new checks, also because the test server is
>> aligned with the production one.
>>
>>
>> (5) Yes, the restart is done via '/etc/init.d/bischeckd restart' and
>> it solves the issue. Physical memory on the server is always fine, so
>> I don't think the JVM is running out of memory.
>>
>> I didn't notice any errors in the Bischeck logs. However, at the next
>> crash I'll take a deeper look at them.
>> Are there any other logs I could look at?
>>
>> Thanks,
>> Francesco
>>
>>
>>
>>
>>
>> On 24/07/2017 21:57, Anders Håål wrote:
>>>
>>> Hi Giuseppe,
>>>
>>> It sounds strange that it just stopped working after a long period
>>> of stability unless something has changed:
>>>
>>> - Has anything changed on the server you run bischeck on: OS, JDK
>>> version, etc.?
>>>
>>> - Have you updated the Redis version or changed its configuration?
>>>
>>> - Have you added any new bischeck checks or changed anything in the
>>> configuration?
>>>
>>> - Anything else you can think of that may have changed?
>>>
>>> When you say restarting, is it the normal /etc/init.d/bischeckd
>>> restart that fixes the problem? I ask because the script just does a
>>> kill with the TERM signal. If the JVM were in an out-of-memory
>>> situation that may not be enough, but I guess you would have seen
>>> that in the log. Are you sure you do not have any ERROR or WARN
>>> entries in the log?
>>>
>>> /Anders
>>>
>>>
>>>
>>> On 07/24/2017 02:14 PM, Francesco Giuseppe Toffoli wrote:
>>>>
>>>> Hi,
>>>> we are experiencing a critical problem with Bischeck. For a couple
>>>> of months it has sometimes suddenly stopped working: the daemon
>>>> /etc/init.d/bischeckd is running but no check results are sent to
>>>> Nagios. Restarting the bischeck daemon fixes the issue.
>>>> Unfortunately we can't find any clue about the root cause in the
>>>> bischeck logs, not even with DEBUG logging enabled. The Redis
>>>> database seems to be working properly, and no increase in memory/CPU
>>>> usage is reported on the server hosting bischeck while the issue
>>>> occurs.
>>>>
>>>> Do you have any suggestions on how to investigate this in depth?
>>>>
>>>> Regards,
>>>> Francesco
>>>>
>>>> --
>>>>
>>>> Francesco Giuseppe Toffoli
>>>> Monitoring Engineer
>>>>
>>>> GSE Department
>>>>
>>>> Tel: +39 01127387488
>>>>
>>>> Mobile: +39 349.800.60.35
>>>> Email: ftoffoli at skylogic.it
>>>>
>>>> Skylogic S. p. A.
>>>> Strada Pianezza, 289
>>>> 10151 Torino, Italy
>>>>
>>>>
>>>>
>>>> This message contains confidential information and is intended only
>>>> for the individual named. If you are not the named addressee you
>>>> should not disseminate, distribute or copy this e-mail. Please
>>>> notify the sender immediately by e-mail if you have received this
>>>> e-mail by mistake and delete this e-mail from your system. E-mail
>>>> transmission cannot be guaranteed to be secure or error-free as
>>>> information could be intercepted, corrupted, lost, destroyed,
>>>> arrive late or incomplete, or contain viruses. The sender therefore
>>>> does not accept liability for any errors or omissions in the
>>>> contents of this message, which arise as a result of e-mail
>>>> transmission. If verification is required please request a
>>>> hard-copy version. Please note that any views or opinions presented
>>>> in this email are solely those of the author and do not necessarily
>>>> represent those of the Company.
>>>> No employee or agent is authorized to conclude any binding
>>>> agreement on behalf of this Company nor, through this latter, any
>>>> of the Eutelsat Communication group with another party by email
>>>> without express written confirmation by a duly authorized officer
>>>> of the Company. The list of duly authorized officers and the scope
>>>> of their powers is published on the Trade Register according to the
>>>> national law of each affiliate.
>>>
>>> --
>>>
>>>
>>> Ingby <http://www.ingby.com>
>>>
>>> bischeck - dynamic and adaptive monitoring for Nagios <http://www.bischeck.org>
>>>
>>> anders.haal at ingby.com
>>>
>>> Software through engineering creativity and competence
>>>
>>> Ingenjörsbyn
>>> Box 531
>>> 101 30 Stockholm
>>> Sweden
>>> www.ingby.com <http://www.ingby.com/>
>>> Mobile: +46 70 575 35 46
>>> Tel: +46 75 75 75 090
>>> Fax: +46 75 75 75 091
>>
>
More information about the Bischeck-users mailing list