Re: bischeck suddenly stops working
Anders Håål
anders.haal at ingby.com
Wed Aug 9 17:15:51 CEST 2017
Okay, and keep us updated with your findings.
On 08/09/2017 05:10 PM, Francesco Toffoli wrote:
> Hi Anders,
> I modified the log configuration as you suggested, but after stopping and
> starting the bischeckd daemon I didn't notice any particular warning
> or critical messages. So I decided to wait for a crash and
> then proceed with the log analysis. I'll keep you updated.
> Thanks
>
>
> Sent from my Samsung Galaxy smartphone.
>
> -------- Original message --------
> From: Anders Håål <anders.haal at ingby.com>
> Date: 09/08/17 08:16 (GMT+01:00)
> To: bischeck-users at monitoring-lists.org
> Subject: Re: bischeck suddenly stops working
>
> Francesco - any progress on the issue?
>
>
> On 07/26/2017 05:52 PM, Anders Håål wrote:
>>
>> Thanks for the feedback.
>>
>> When bischeck "stops working", it would be interesting to know whether
>> anything gets logged after it "stops", and also what is logged when
>> you restart it. I suggest that you first do a stop and check what is
>> logged before starting it again.
>>
>> I would suggest that you change the log level in logback.xml for all
>> packages:
>>
>> <root level="INFO">
>> <appender-ref ref="bischeck"/>
>> </root>
>>
>> To avoid duplicate log entries, you should also add additivity="false"
>> to the other loggers. The file below is based on the standard
>> logback.xml. I have not tested it myself, so try it in your test
>> environment first, and if it looks good, deploy it in production with
>> your specific customizations of paths, etc.
>>
>>
>> logback.xml:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>>
>> <configuration>
>> <jmxConfigurator />
>> <appender name="bischeck"
>> class="ch.qos.logback.core.rolling.RollingFileAppender">
>> <!--See also
>> http://logback.qos.ch/manual/appenders.html#RollingFileAppender-->
>> <File>/var/tmp/bischeck.log</File>
>> <encoder>
>> <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS,Europe/Stockholm} ; %p ; %t ; %c ; %m%ex%n</pattern>
>> </encoder>
>>
>> <rollingPolicy
>> class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
>> <maxIndex>3</maxIndex>
>> <FileNamePattern>/var/tmp/bischeck.log.%i</FileNamePattern>
>> </rollingPolicy>
>>
>> <triggeringPolicy
>> class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
>> <MaxFileSize>1000KB</MaxFileSize>
>> </triggeringPolicy>
>>
>> </appender>
>>
>> <logger name="com.ingby" level="INFO" additivity="false">
>> <appender-ref ref="bischeck"/>
>> </logger>
>>
>>
>> <logger
>> name="com.ingby.socbox.bischeck.configuration.CachePurgeJob"
>> level="DEBUG" additivity="false">
>> <appender-ref ref="bischeck"/>
>> </logger>
>>
>> <logger name="com.ingby.socbox.bischeck.cache.provider.redis"
>> level="DEBUG" additivity="false">
>> <appender-ref ref="bischeck"/>
>> </logger>
>>
>>
>> <logger name="org.quartz" level="INFO" additivity="false">
>> <appender-ref ref="bischeck"/>
>> </logger>
>>
>> <root level="WARN">
>> <appender-ref ref="bischeck"/>
>> </root>
>>
>> </configuration>
>>
>>
>> The root section ensures that everything logged at WARN or ERROR by any
>> Java package is written to the bischeck appender.
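The level and additivity rules behind this configuration can be sketched as a small model. The following is a simplified Python illustration, not logback itself and not part of bischeck. It assumes logback's documented behaviour: a logger inherits the level of its nearest configured ancestor; an event enabled at its own logger is written to the appenders of that logger and of all its ancestors, regardless of the ancestors' levels; and that upward walk stops after the first logger with additivity="false". The `CONFIG` table mirrors the logback.xml above; the class names `RedisCache` and `org.apache.Foo` are hypothetical.

```python
# Simplified model of logback's logger hierarchy (an illustration, not logback).
# Rules modelled:
#  1. A logger's effective level is its own level or that of its nearest
#     configured ancestor; the root logger is the name "".
#  2. An event enabled at its own logger is written to the appenders of that
#     logger and of every ancestor; ancestor levels do not filter it again.
#  3. The upward walk stops after the first logger with additivity="false".

LEVELS = {"TRACE": 0, "DEBUG": 1, "INFO": 2, "WARN": 3, "ERROR": 4}

# name -> (level, additivity, appender refs), mirroring the logback.xml above.
CONFIG = {
    "": ("WARN", True, ["bischeck"]),
    "com.ingby": ("INFO", False, ["bischeck"]),
    "com.ingby.socbox.bischeck.configuration.CachePurgeJob": ("DEBUG", False, ["bischeck"]),
    "com.ingby.socbox.bischeck.cache.provider.redis": ("DEBUG", False, ["bischeck"]),
    "org.quartz": ("INFO", False, ["bischeck"]),
}


def ancestors(name):
    """Yield name, then each parent logger name, ending with the root ("")."""
    parts = name.split(".") if name else []
    for i in range(len(parts), 0, -1):
        yield ".".join(parts[:i])
    yield ""


def effective_level(name, config):
    """Level inherited from the nearest configured logger (rule 1)."""
    for candidate in ancestors(name):
        level = config.get(candidate, (None, True, []))[0]
        if level is not None:
            return level
    return "DEBUG"  # logback's default root level when none is configured


def appenders_for(name, level, config):
    """Appender names that receive one event; a repeated name means a duplicate line."""
    if LEVELS[level] < LEVELS[effective_level(name, config)]:
        return []  # disabled at its own logger, so nothing is written
    hits = []
    for candidate in ancestors(name):
        if candidate not in config:
            continue  # unconfigured loggers have no appenders and are additive
        _, additive, refs = config[candidate]
        hits.extend(refs)  # rule 2
        if not additive:
            break  # rule 3
    return hits


if __name__ == "__main__":
    redis = "com.ingby.socbox.bischeck.cache.provider.redis.RedisCache"  # hypothetical class
    print(appenders_for(redis, "DEBUG", CONFIG))             # ['bischeck']
    print(appenders_for("org.apache.Foo", "INFO", CONFIG))   # []
    print(appenders_for("org.apache.Foo", "WARN", CONFIG))   # ['bischeck']

    # The same redis logger with additivity="true" writes each event twice,
    # because the walk continues up to com.ingby and its appender.
    additive = dict(CONFIG)
    additive["com.ingby.socbox.bischeck.cache.provider.redis"] = ("DEBUG", True, ["bischeck"])
    print(appenders_for(redis, "DEBUG", additive))           # ['bischeck', 'bischeck']
```

The last case shows the duplicate entries that additivity="false" is meant to prevent: without it, a DEBUG event from the redis provider also reaches the appender attached to com.ingby, so the same line lands in bischeck.log twice.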
>> Regards
>> Anders
>>
>> On 07/25/2017 09:55 AM, Francesco Giuseppe Toffoli wrote:
>>>
>>> Hi Anders,
>>> thanks for your reply. Here are my answers to your various questions:
>>>
>>> (1) The Java version is:
>>>
>>> openjdk version "1.8.0_91"
>>> OpenJDK Runtime Environment (build 1.8.0_91-b14)
>>> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>>>
>>> It has not been updated recently. In our test environment (where
>>> the problem does not occur), the version is nearly the same (1.8.0_121).
>>> The OS has not been updated either (CentOS release 6.6).
>>>
>>> (2) Redis has not been updated recently (Redis 2.8.23). At the
>>> moment we use roughly 13,000 keys.
>>>
>>> (3) We add checks regularly, roughly weekly. The issue started to occur
>>> some months ago, but it can happen that everything is OK for 2 or 3
>>> weeks, and then we have several crashes in a single week. I'm not
>>> inclined to blame any new checks, especially because the test
>>> server is aligned with the production one.
>>>
>>>
>>> (5) Yes, the restart is done via '/etc/init.d/bischeckd restart' and
>>> it solves the issue. Physical memory on the server is always OK, so I
>>> don't think it is a JVM out-of-memory problem.
>>>
>>> I didn't notice any errors in the Bischeck logs. However, at the next
>>> crash I'll try to have a deeper look at them.
>>> Are there any other logs I could look at?
>>>
>>> Thanks,
>>> Francesco
>>>
>>>
>>>
>>>
>>>
>>> On 24/07/2017 21:57, Anders Håål wrote:
>>>>
>>>> Hi Giuseppe,
>>>>
>>>> It sounds strange that it just stopped working after a long period of
>>>> stability, unless something has changed:
>>>>
>>>> - Has anything changed on the server you run bischeck on: OS, JDK
>>>> version, ...?
>>>>
>>>> - Did you update the Redis version? Any change in the configuration?
>>>>
>>>> - Have you added any new bischeck checks or changed something in the
>>>> configuration?
>>>>
>>>> - Anything else you can think of that may have changed?
>>>>
>>>> When you say restarting, is it the normal /etc/init.d/bischeckd
>>>> restart that fixes the problem? The reason I ask is that the script
>>>> just does a kill with the TERM signal. If the JVM were in an
>>>> out-of-memory situation, that may not be enough, but I guess you
>>>> would have seen that in the log. Are you sure you do not have any
>>>> ERROR or WARN entries in the log?
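Whether a plain TERM is actually enough can be checked by hand. The following is a minimal sketch, assuming you know the PID of the bischeck JVM (for example from `ps`) and run it as the daemon's user or root; `term_and_wait` is a hypothetical helper, not part of bischeck or its init script. It sends SIGTERM and then uses signal 0, which only tests whether the PID still exists, to report whether the process went away within a timeout.

```python
import os
import signal
import sys
import time


def term_and_wait(pid, timeout=10.0, poll=0.2):
    """Send SIGTERM to pid and report whether the process is gone within timeout.

    Hypothetical helper, not part of bischeck or its init script.
    Returns True if the process exited, False if it is still running.
    Run it as the daemon's user or root, otherwise os.kill raises PermissionError.
    Note: if the caller is the process's parent, an exited but unreaped child
    (zombie) still counts as present; a daemon whose parent has exited is
    reaped by init.
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return True  # nothing to stop
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            os.kill(pid, 0)  # signal 0 only checks that the PID exists
        except ProcessLookupError:
            return True
        time.sleep(poll)
    return False


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python term_and_wait.py <pid> [timeout-seconds]
    pid = int(sys.argv[1])
    timeout = float(sys.argv[2]) if len(sys.argv) > 2 else 30.0
    if term_and_wait(pid, timeout):
        print(f"{pid} exited after SIGTERM")
    else:
        print(f"{pid} is still running {timeout:.0f}s after SIGTERM")
```

A False result means that a plain TERM, which is what the init script sends, did not stop the process within the chosen timeout.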
>>>>
>>>> /Anders
>>>>
>>>>
>>>>
>>>> On 07/24/2017 02:14 PM, Francesco Giuseppe Toffoli wrote:
>>>>>
>>>>> Hi,
>>>>> we are experiencing a critical problem with Bischeck. For a
>>>>> couple of months it has sometimes suddenly stopped working: the daemon
>>>>> /etc/init.d/bischeckd is running, but no check results are sent to
>>>>> Nagios. Restarting the bischeck daemon fixes the issue.
>>>>> Unfortunately we can't find any clue about the root cause in the
>>>>> bischeck logs, not even with the DEBUG logging level enabled. The Redis
>>>>> database seems to be working properly, and no increase in memory/CPU
>>>>> usage is reported on the server hosting bischeck while the issue
>>>>> occurs.
>>>>>
>>>>> Do you have any suggestions on how to investigate this in more depth?
>>>>>
>>>>> Regards,
>>>>> Francesco
>>>>>
>>>>> --
>>>>>
>>>>> Francesco Giuseppe Toffoli
>>>>> Monitoring Engineer
>>>>>
>>>>> GSE Department
>>>>>
>>>>> Tel: +39 01127387488
>>>>>
>>>>> Mobile: +39 349.800.60.35
>>>>> Email: ftoffoli at skylogic.it
>>>>>
>>>>> Skylogic S. p. A.
>>>>> Strada Pianezza, 289
>>>>> 10151 Torino, Italy
>>>>>
>>>>>
>>>>>
>>>>> This message contains confidential information and is intended
>>>>> only for the individual named. If you are not the named addressee
>>>>> you should not disseminate, distribute or copy this e-mail. Please
>>>>> notify the sender immediately by e-mail if you have received this
>>>>> e-mail by mistake and delete this e-mail from your system. E-mail
>>>>> transmission cannot be guaranteed to be secure or error-free as
>>>>> information could be intercepted, corrupted, lost, destroyed,
>>>>> arrive late or incomplete, or contain viruses. The sender
>>>>> therefore does not accept liability for any errors or omissions in
>>>>> the contents of this message, which arise as a result of e-mail
>>>>> transmission. If verification is required please request a
>>>>> hard-copy version. Please note that any views or opinions
>>>>> presented in this email are solely those of the author and do not
>>>>> necessarily represent those of the Company.
>>>>> No employee or agent is authorized to conclude any binding
>>>>> agreement on behalf of this Company nor, through this latter, any
>>>>> of the Eutelsat Communication group with another party by email
>>>>> without express written confirmation by a duly authorized officer
>>>>> of the Company. The list of duly authorized officers and the scope
>>>>> of their powers is published on the Trade Register according to
>>>>> the national law of each affiliate.
>>>>
>>>> --
>>>>
>>>>
>>>> Ingby <http://www.ingby.com>
>>>>
>>>> bischeck - dynamic and adaptive monitoring for Nagios <http://www.bischeck.org>
>>>>
>>>> anders.haal at ingby.com
>>>>
>>>> Software through engineering creativity and competence
>>>>
>>>> Ingenjörsbyn
>>>> Box 531
>>>> 101 30 Stockholm
>>>> Sweden
>>>> www.ingby.com <http://www.ingby.com/>
>>>> Mobile: +46 70 575 35 46
>>>> Phone: +46 75 75 75 090
>>>> Fax: +46 75 75 75 091
>>>
>>
>