R: bischeck suddenly stops working

Francesco Toffoli ftoffoli at skylogic.it
Wed Aug 9 17:10:40 CEST 2017


Hi Anders,
I modified the log configuration as you suggested, but after stopping and starting the bischeckd daemon I didn't notice any particular warning or critical messages. So I decided to wait for the next crash and then proceed with the log analysis. I'll keep you updated.
Thanks

Sent from a Samsung Galaxy smartphone.
-------- Original message --------
From: Anders Håål <anders.haal at ingby.com>
Date: 09/08/17  08:16  (GMT+01:00)
To: bischeck-users at monitoring-lists.org
Subject: Re: bischeck suddenly stops working

    Francesco - any progress on the issue?

    

    

    On 07/26/2017 05:52 PM, Anders Håål
      wrote:

    
    
      
      Thanks for the feedback.
      When bischeck "stops working", it would be interesting to
        understand whether anything gets logged after it "stops", and also
        what is logged when you do a restart. I suggest you do a
        stop first and check what is logged before starting again.
      I would suggest that you change the log level in logback.xml
        for all packages:

        <root level="INFO">
          <appender-ref ref="bischeck"/>
        </root>

      To avoid duplicates you should also add additivity="false"
        to the other loggers. The file below is based on the standard
        logback.xml. I have not tested it myself, so try it in your
        test environment first and, if it looks good, deploy it in
        production according to your specific customization of paths, etc.
      

      
      logback.xml:

      
      <?xml version="1.0" encoding="UTF-8"?>
      <configuration>
        <jmxConfigurator />

        <appender name="bischeck"
                  class="ch.qos.logback.core.rolling.RollingFileAppender">
          <!--See also http://logback.qos.ch/manual/appenders.html#RollingFileAppender-->
          <File>/var/tmp/bischeck.log</File>
          <encoder>
            <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS,Europe/Stockholm} ; %p ; %t ; %c ; %m%ex%n</pattern>
          </encoder>

          <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
            <maxIndex>3</maxIndex>
            <FileNamePattern>/var/tmp/bischeck.log.%i</FileNamePattern>
          </rollingPolicy>

          <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
            <MaxFileSize>1000KB</MaxFileSize>
          </triggeringPolicy>
        </appender>

        <logger name="com.ingby" level="INFO" additivity="false">
          <appender-ref ref="bischeck"/>
        </logger>

        <logger name="com.ingby.socbox.bischeck.configuration.CachePurgeJob"
                level="DEBUG" additivity="false">
          <appender-ref ref="bischeck"/>
        </logger>

        <logger name="com.ingby.socbox.bischeck.cache.provider.redis"
                level="DEBUG" additivity="false">
          <appender-ref ref="bischeck"/>
        </logger>

        <logger name="org.quartz" level="INFO" additivity="false">
          <appender-ref ref="bischeck"/>
        </logger>

        <root level="WARN">
          <appender-ref ref="bischeck"/>
        </root>
      </configuration>

      
      

      The root section ensures that everything logged at WARN or ERROR
      by any Java package goes to the bischeck appender.
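
      To illustrate the additivity point, here is a minimal sketch of the
      duplicate case that additivity="false" is meant to avoid. It is not
      tested against bischeck and only reuses the logger and appender
      names from the file above. Additivity defaults to true in logback,
      so an event accepted by the com.ingby logger is written by that
      logger's own appender-ref and then passed up to root, which
      references the same appender and writes it a second time.

```xml
<!-- Sketch only: the duplicate case. additivity is left at its default (true). -->
<logger name="com.ingby" level="INFO">
  <!-- First copy: written by the logger's own appender reference. -->
  <appender-ref ref="bischeck"/>
</logger>

<!-- Second copy: the same event propagates to root, which uses the same appender. -->
<!-- The root level (WARN) does not filter events already accepted by com.ingby. -->
<root level="WARN">
  <appender-ref ref="bischeck"/>
</root>
```

      With additivity="false" on the named logger, as in the file above,
      the event stops at that logger and is written only once.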

      Regards 

      Anders

      

      On 07/25/2017 09:55 AM, Francesco
        Giuseppe Toffoli wrote:

      
      
        
        Hi Anders,

          thanks for your reply. I'll answer your various
          questions:
        (1) The Java version is:
         openjdk version "1.8.0_91"

          OpenJDK Runtime Environment (build 1.8.0_91-b14)

          OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

        
        and has not been updated recently. In our test environment
        (where the problem does not occur), the version is nearly the
        same (1.8.0_121).

        The OS has not been updated (CentOS release 6.6).

        

        (2) Redis has not been updated recently (redis 2.8.23). At the
        moment we have roughly 13,000 keys in use.

        

        (3) We add checks regularly, roughly weekly. The issue started to
        occur some months ago, but it can happen that everything is fine
        for 2 or 3 weeks and then we have several crashes in one week. I'm
        not inclined to blame any new checks, also because the test
        server is aligned with the production one.

        

        

        (5) Yes, the restart is done via '/etc/init.d/bischeckd restart'
        and it solves the issue. Physical memory on the server is always
        OK, so I don't think the JVM is running out of memory.

        

        In the Bischeck logs I didn't notice any errors. However, at the
        next crash I'll try to have a deeper look at them.

        Could I have a look at some other logs, maybe?

        

        Thanks,

        Francesco

        

        

        

        

        

        On 24/07/2017 21:57, Anders Håål
          wrote:

        
        
          
          Hi Giuseppe,
          It sounds strange that it just stopped working after a long
            time of stability unless something has changed:
          - Has anything changed on the server you run bischeck on: OS,
            JDK version, ...?
          - Was the Redis version updated? Any change in its configuration?
          - Have you added any new bischeck checks or changed something in
            the configuration?
          - Anything else you can think of that may have changed?
          When you say restarting, is it the normal
            /etc/init.d/bischeckd restart that fixes the problem? The
            reason I ask is that the script just does a kill with the TERM
            signal. If the JVM were in an out-of-memory situation that
            might not be enough, but you should have seen that in the log,
            I guess. Are you sure you do not have any ERROR or WARN entries
            in the log?
          /Anders 

          
          

          
          

          On 07/24/2017 02:14 PM, Francesco
            Giuseppe Toffoli wrote:

          
          
            
            Hi, 

              we are experiencing a critical problem with Bischeck. For
              a couple of months now it has occasionally stopped working
              without warning: the /etc/init.d/bischeckd daemon is
              running but no check results are sent to Nagios. Restarting
              the bischeck daemon fixes the issue.

              Unfortunately we can't find any clue about the root cause
              in the bischeck logs, not even with the DEBUG logging level
              enabled. The Redis database seems to be working properly and
              no increase in memory/CPU usage is reported on the server
              hosting bischeck while the issue occurs.

               

              Do you have any suggestions on how to investigate this in
              more depth?
            Regards,

              Francesco

            
            -- 

              
              
              Francesco Giuseppe Toffoli
              Monitoring Engineer
              GSE Department
              Tel: +39 01127387488
              Mobile: +39 349.800.60.35
              Email: ftoffoli at skylogic.it

              Skylogic S. p. A.
              Strada Pianezza, 289
              10151 Torino, Italy
            

            

            

            This message contains confidential information and is
            intended only for the individual named. If you are not the
            named addressee you should not disseminate, distribute or
            copy this e-mail. Please notify the sender immediately by
            e-mail if you have received this e-mail by mistake and
            delete this e-mail from your system. E-mail transmission
            cannot be guaranteed to be secure or error-free as
            information could be intercepted, corrupted, lost,
            destroyed, arrive late or incomplete, or contain viruses.
            The sender therefore does not accept liability for any
            errors or omissions in the contents of this message, which
            arise as a result of e-mail transmission. If verification is
            required please request a hard-copy version. Please note
            that any views or opinions presented in this email are
            solely those of the author and do not necessarily represent
            those of the Company.

            No employee or agent is authorized to conclude any binding
            agreement on behalf of this Company nor, through this
            latter, any of the Eutelsat Communication group with another
            party by email without express written confirmation by a
            duly authorized officer of the Company. The list of duly
            authorized officers and the scope of their powers is
            published on the Trade Register according to the national
            law of each affiliate. 

          
          

          -- 


Ingby <http://www.ingby.com>

bischeck - dynamic and adaptive monitoring for Nagios <http://www.bischeck.org>

anders.haal at ingby.com<mailto:anders.haal at ingby.com>

Mjukvara genom ingenjörsmässig kreativitet och kompetens

Ingenjörsbyn
Box 531
101 30 Stockholm
Sweden
www.ingby.com <http://www.ingby.com/>
Mobil: +46 70 575 35 46
Tele: +46 75 75 75 090
Fax:  +46 75 75 75 091

        
        


      
      


    
    


  

