Re: Specifying the retention period

anders.haal@ingby.com anders.haal at ingby.com
Fri Sep 12 15:01:06 CEST 2014
Good input and good luck with your testing.

----- Reply message -----
From: "Rahul Amaram" <rahul.amaram at vizury.com>
To: <anders.haal at ingby.com>, <bischeck-users at monitoring-lists.org>
Subject: Specifying the retention period
Date: Fri, Sep 12, 2014 13:12

Yup that's a useful tool. I think in the documentation you can have a 
Troubleshooting section where you cover some of these tools separately 
and some common scenarios on how to troubleshoot.

- Rahul.

On Friday 12 September 2014 02:41 PM, Anders Håål wrote:
> Glad that it worked out. What is clear to me is that this topic is not 
> that simple to understand with the current documentation, so this 
> feedback from you is very valuable. Will add some additional blog 
> posts on the topic and then get it into the next major release 
> documentation. We will also need to figure out if this can be simplified.
>
> Did you try the CacheCli?
>
> Keep the feedback coming.
> Anders
>
> On 09/11/2014 11:39 PM, Rahul Amaram wrote:
>> Ok. I figured out the problem. It was with my understanding. I have 
>> useweekend set to true. So, instead of 
>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[23], I should 
>> be using 
>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg/weekend-$$SERVICEITEMNAME$$[23] 
>> and so on.
>>
>> Thanks for the awesome support.
>>
>> - Rahul.
>>
>> On Thursday 11 September 2014 11:43 AM, Anders Håål wrote:
>>> Hi Rahul,
>>> Now I have a backlog of questions :)
>>> Okay lets start with the last question.
>>> - First verify that you have data in the cache. Use redis-cli or the 
>>> Bischeck CacheCli, 
>>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-4.4.
>>> - Then there is an issue with null data. Let's say that one of the 
>>> expressions you have returns null. Null is tricky, so in Bischeck you 
>>> have to decide how to manage a null value. Look at 
>>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_configuration_guide.html#toc-Section-4.3. 
>>>
>>> - You can also check the logs and also increase the loglevel to 
>>> debug to get more info. Check out 
>>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-3.2. 
>>>
>>>
>>> The two following questions I will try to clarify better later, as I must 
>>> run into a meeting, but the index on an hourly aggregate specifies a specific hour, 
>>> like the avg, max or min for that hour. Index 0 means the last 
>>> calculated hour, so if the time is 2:30, index 0 means the avg, max or min 
>>> for the period 1:00 to 2:00.
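
The index rule described here can be sketched in a few lines of Python. This is only an illustration, not Bischeck code: the function name `hourly_window` is hypothetical, and it assumes that each higher index steps back exactly one clock hour with no gaps (for example, no skipped weekend hours).

```python
from datetime import datetime, timedelta

def hourly_window(now, index):
    """Return (start, end) of the hourly aggregate at `index`.

    Assumption: index 0 is the most recently completed clock hour and
    every higher index is exactly one hour further back, without gaps.
    """
    end_of_latest = now.replace(minute=0, second=0, microsecond=0)
    end = end_of_latest - timedelta(hours=index)
    return end - timedelta(hours=1), end

now = datetime(2014, 9, 11, 2, 30)
print(hourly_window(now, 0))   # window 01:00 to 02:00 on Sep 11
print(hourly_window(now, 23))  # window 02:00 to 03:00 on Sep 10
```

Under that assumption, at 2:30 the window 2:00 to 3:00 of the previous day is index 23, not 24.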
>>>
>>> These are good questions; we are glad to get your user's perspective 
>>> on this.
>>> Anders
>>>
>>> On 09/11/2014 07:19 AM, Rahul Amaram wrote:
>>>> This doesn't help :(.
>>>>
>>>> <threshold>avg($$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[23],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[167],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[335])</threshold> 
>>>>
>>>>
>>>> - Rahul.
>>>>
>>>> On Thursday 11 September 2014 10:45 AM, Rahul Amaram wrote:
>>>>> Also, let us say, that the current time is 2.30 and that I want 
>>>>> the average of all the values between 2.00 and 3.00 the previous 
>>>>> day, I'd probably have to use
>>>>>
>>>>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[23]
>>>>>
>>>>> rather than
>>>>>
>>>>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[24]
>>>>>
>>>>> Am I right ?
>>>>>
>>>>> Thanks,
>>>>> Rahul.
>>>>>
>>>>> On Thursday 11 September 2014 10:39 AM, Rahul Amaram wrote:
>>>>>> Ok. So would 
>>>>>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[24] refer 
>>>>>> to the average of the all the values ONLY in the 24th hour before 
>>>>>> the current time?
>>>>>>
>>>>>> On Thursday 11 September 2014 10:30 AM, Anders Håål wrote:
>>>>>>> Hi Amaram,
>>>>>>> I think you just need to remove the minus sign when using the 
>>>>>>> aggregated values. Minus is used for time, as in back in time, while 
>>>>>>> an integer without a minus sign and a time indicator is an index. Check 
>>>>>>> out 
>>>>>>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_configuration_guide.html#toc-Chapter-4. 
>>>>>>>
>>>>>>> You can also use redis-cli to explore the data in the cache. The 
>>>>>>> key in Redis is the same as the service definition.
>>>>>>> Anders
>>>>>>>
>>>>>>> On 09/11/2014 06:38 AM, Rahul Amaram wrote:
>>>>>>>> Ok. I am facing another issue. I have been running bischeck 
>>>>>>>> with the aggregate function for more than a day. I am using the 
>>>>>>>> below threshold function.
>>>>>>>>
>>>>>>>> <threshold>avg($$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[-24],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[-168],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[-336])</threshold> 
>>>>>>>>
>>>>>>>>
>>>>>>>> and it doesn't seem to work. I am expecting that the first 
>>>>>>>> aggregate value should be available.
>>>>>>>>
>>>>>>>> Instead if I use the below threshold function (I know this is 
>>>>>>>> not related to aggregate)
>>>>>>>>
>>>>>>>> avg($$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-24H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-168H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-336H]) 
>>>>>>>>
>>>>>>>>
>>>>>>>> the threshold is calculated fine, which is just the first value 
>>>>>>>> as the remaining two values are not in cache.
>>>>>>>>
>>>>>>>> How can I debug why aggregate is not working?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Rahul.
>>>>>>>>
>>>>>>>> On Wednesday 10 September 2014 04:53 PM, Anders Håål wrote:
>>>>>>>>> Thanks - got the ticket.
>>>>>>>>> I will update progress on the bug ticket, but it's good that 
>>>>>>>>> the workaround works.
>>>>>>>>> Anders
>>>>>>>>>
>>>>>>>>> On 09/10/2014 01:20 PM, Rahul Amaram wrote:
>>>>>>>>>> That indeed seems to be the problem. Using count rather than 
>>>>>>>>>> period
>>>>>>>>>> seems to address the issue. Raised a ticket -
>>>>>>>>>> http://gforge.ingby.com/gf/project/bischeck/tracker/?action=TrackerItemEdit&tracker_item_id=259 
>>>>>>>>>>
>>>>>>>>>> .
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Rahul.
>>>>>>>>>>
>>>>>>>>>> On Wednesday 10 September 2014 04:02 PM, Anders Håål wrote:
>>>>>>>>>>> This looks like a bug. Could you please report it on
>>>>>>>>>>> http://gforge.ingby.com/gf/project/bischeck/tracker/ in the 
>>>>>>>>>>> Bugs tracker. You need an account, but it's just a sign-up and you 
>>>>>>>>>>> get an email confirmation.
>>>>>>>>>>> Can you try to use maxcount for purging instead as a 
>>>>>>>>>>> workaround? Just calculate your maxcount based on the scheduling 
>>>>>>>>>>> interval you use.
>>>>>>>>>>> Anders
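
As a rough sketch of that calculation (the helper names are hypothetical and not part of Bischeck): with the 30 values per hour mentioned elsewhere in this thread, the scheduling interval works out to 120 seconds, and a maxcount can be derived from the time span that should stay in the cache.

```python
import math

def maxcount_for(retention_hours, interval_seconds):
    """Number of samples collected over `retention_hours` at a fixed interval."""
    return math.ceil(retention_hours * 3600 / interval_seconds)

def hours_covered(maxcount, interval_seconds):
    """How many hours a list of `maxcount` samples spans at a fixed interval."""
    return maxcount * interval_seconds / 3600

# 30 days of data at one sample every 120 seconds (30 values per hour).
print(maxcount_for(30 * 24, 120))   # 21600
# The default list size of 500 at the same interval lasts under 17 hours.
print(hours_covered(500, 120))
```

This assumes a constant scheduling interval; services that run on an irregular schedule would need a more generous maxcount.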
>>>>>>>>>>>
>>>>>>>>>>> On 09/10/2014 12:17 PM, Rahul Amaram wrote:
>>>>>>>>>>>> Following up on the earlier topic, I am seeing the below 
>>>>>>>>>>>> errors related
>>>>>>>>>>>> to cache purge. Any idea on what might be causing this? I 
>>>>>>>>>>>> don't see any
>>>>>>>>>>>> other errors in log related to metrics.
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-09-10 12:12:00.001 ; INFO ; DefaultQuartzScheduler_Worker-5 ; com.ingby.socbox.bischeck.configuration.CachePurgeJob ; CachePurge purging 180
>>>>>>>>>>>> 2014-09-10 12:12:00.003 ; INFO ; DefaultQuartzScheduler_Worker-5 ; com.ingby.socbox.bischeck.configuration.CachePurgeJob ; CachePurge executed in 1 ms
>>>>>>>>>>>> 2014-09-10 12:12:00.003 ; ERROR ; DefaultQuartzScheduler_Worker-5 ; org.quartz.core.JobRunShell ; Job DailyMaintenance.CachePurge threw an unhandled Exception: java.lang.NullPointerException: null
>>>>>>>>>>>>          at com.ingby.socbox.bischeck.cache.provider.redis.LastStatusCache.trim(LastStatusCache.java:1250)
>>>>>>>>>>>>          at com.ingby.socbox.bischeck.configuration.CachePurgeJob.execute(CachePurgeJob.java:140)
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-09-10 12:12:00.003 ; ERROR ; DefaultQuartzScheduler_Worker-5 ; org.quartz.core.ErrorLogger ; Job (DailyMaintenance.CachePurge threw an exception.org.quartz.SchedulerException: Job threw an unhandled exception.
>>>>>>>>>>>>          at org.quartz.core.JobRunShell.run(JobRunShell.java:224)
>>>>>>>>>>>>          at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
>>>>>>>>>>>> Caused by: java.lang.NullPointerException: null
>>>>>>>>>>>>          at com.ingby.socbox.bischeck.cache.provider.redis.LastStatusCache.trim(LastStatusCache.java:1250)
>>>>>>>>>>>>          at com.ingby.socbox.bischeck.configuration.CachePurgeJob.execute(CachePurgeJob.java:140)
>>>>>>>>>>>>
>>>>>>>>>>>> Here is my cache configuration:
>>>>>>>>>>>>
>>>>>>>>>>>>      <cache>
>>>>>>>>>>>>        <aggregate>
>>>>>>>>>>>>          <method>avg</method>
>>>>>>>>>>>>          <useweekend>true</useweekend>
>>>>>>>>>>>>          <retention>
>>>>>>>>>>>>            <period>H</period>
>>>>>>>>>>>>            <offset>720</offset>
>>>>>>>>>>>>          </retention>
>>>>>>>>>>>>          <retention>
>>>>>>>>>>>>            <period>D</period>
>>>>>>>>>>>>            <offset>30</offset>
>>>>>>>>>>>>          </retention>
>>>>>>>>>>>>        </aggregate>
>>>>>>>>>>>>
>>>>>>>>>>>>        <purge>
>>>>>>>>>>>>          <offset>30</offset>
>>>>>>>>>>>>          <period>D</period>
>>>>>>>>>>>>        </purge>
>>>>>>>>>>>>      </cache>
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Rahul.
>>>>>>>>>>>> On Monday 08 September 2014 08:39 PM, Anders Håål wrote:
>>>>>>>>>>>>> Great if you can make a Debian package, and I understand 
>>>>>>>>>>>>> that you cannot commit to a timeline. The best thing would be to 
>>>>>>>>>>>>> integrate it into our build process, where we use Ant.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If the purging is based on time, then it could happen that 
>>>>>>>>>>>>> data is removed from the cache, since the logic is based on time 
>>>>>>>>>>>>> relative to now. To avoid this, you should increase the purge time 
>>>>>>>>>>>>> before you start bischeck. And just a comment on your last 
>>>>>>>>>>>>> sentence: Redis TTL is never used :)
>>>>>>>>>>>>> Anders
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 09/08/2014 02:09 PM, Rahul Amaram wrote:
>>>>>>>>>>>>>> I would be more than happy to give you guys a 
>>>>>>>>>>>>>> testimonial. However, we
>>>>>>>>>>>>>> have just taken this live and would like to see its 
>>>>>>>>>>>>>> performance
>>>>>>>>>>>>>> before I
>>>>>>>>>>>>>> give a testimonial.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, if time permits, I'll try to bundle this for Debian 
>>>>>>>>>>>>>> (I'm a Debian maintainer). I can't commit to a timeline right 
>>>>>>>>>>>>>> away though :).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, just to make things explicitly clear: I understand 
>>>>>>>>>>>>>> that the service item TTL below has nothing to do with the Redis 
>>>>>>>>>>>>>> TTL. But if I stop my bischeck server for a day or two, would 
>>>>>>>>>>>>>> any of my metrics get lost? Or would I have to increase the 
>>>>>>>>>>>>>> Redis TTL for this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Rahul.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Monday 08 September 2014 04:09 PM, Anders Håål wrote:
>>>>>>>>>>>>>>> Glad that it clarified how to configure the cache 
>>>>>>>>>>>>>>> section. I will make a blog post on this in the meantime, 
>>>>>>>>>>>>>>> until we have updated documentation. I agree with you that 
>>>>>>>>>>>>>>> the structure of the configuration is a bit "heavy", so 
>>>>>>>>>>>>>>> ideas and input are appreciated.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regarding Redis TTL, this is a Redis feature we do not 
>>>>>>>>>>>>>>> use. The TTL mentioned in my mail is managed by bischeck. 
>>>>>>>>>>>>>>> Redis TTL does not work on individual nodes in a Redis 
>>>>>>>>>>>>>>> linked list.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Currently the bischeck installer should work for Ubuntu,
>>>>>>>>>>>>>>> Red Hat/CentOS and Debian. There are currently no plans to 
>>>>>>>>>>>>>>> make distribution packages like rpm or deb. I know that op5 
>>>>>>>>>>>>>>> (www.op5.com), which bundles Bischeck, makes a bischeck rpm. 
>>>>>>>>>>>>>>> It would be super if there is anyone who would like to do
>>>>>>>>>>>>>>> this for the project.
>>>>>>>>>>>>>>> When it comes to packaging we have done a bit of work to 
>>>>>>>>>>>>>>> create Docker containers, but it's still experimental.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I also encourage you, if you think bischeck supports your 
>>>>>>>>>>>>>>> monitoring effort, to write a small testimonial that we can 
>>>>>>>>>>>>>>> put on the site.
>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>> Anders
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 09/08/2014 11:30 AM, Rahul Amaram wrote:
>>>>>>>>>>>>>>>> Thanks Anders. This explains precisely why my data was 
>>>>>>>>>>>>>>>> getting purged after 16 hours (30 values per hour * 16 
>>>>>>>>>>>>>>>> hours = 480). It would be great if you could update the 
>>>>>>>>>>>>>>>> documentation with this info. The entire setup and 
>>>>>>>>>>>>>>>> configuration itself takes time to get a hold of, and 
>>>>>>>>>>>>>>>> detailed documentation would be very helpful.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also, another quick question. Right now, I believe the 
>>>>>>>>>>>>>>>> Redis TTL is set to 2000 seconds. Does this mean that if I 
>>>>>>>>>>>>>>>> don't receive data for a particular serviceitem (or service 
>>>>>>>>>>>>>>>> or host) for 2000 seconds, the data related to it is lost?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also, any plans for bundling this with distributions 
>>>>>>>>>>>>>>>> such as Debian?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> Rahul.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Monday 08 September 2014 02:04 PM, Anders Håål wrote:
>>>>>>>>>>>>>>>>> Hi Rahul,
>>>>>>>>>>>>>>>>> Thanks for the question and feedback on the 
>>>>>>>>>>>>>>>>> documentation. Great to
>>>>>>>>>>>>>>>>> hear that you think Bischeck is awesome. If you do not
>>>>>>>>>>>>>>>>> understand how
>>>>>>>>>>>>>>>>> it works by reading the documentation you are probably 
>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>> alone, and
>>>>>>>>>>>>>>>>> we should consider it a documentation bug.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In 1.0.0 we introduced the concepts that you are asking 
>>>>>>>>>>>>>>>>> about, and they are really two different, independent 
>>>>>>>>>>>>>>>>> features.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Let's start with cache purging.
>>>>>>>>>>>>>>>>> Collected monitoring data, metrics, are kept in the 
>>>>>>>>>>>>>>>>> cache (Redis from 1.0.0) as linked lists. There is one 
>>>>>>>>>>>>>>>>> linked list per service definition, like 
>>>>>>>>>>>>>>>>> host1-service1-serviceitem1. Prior to 1.0.0 all the
>>>>>>>>>>>>>>>>> linked lists had the same size, which was defined with 
>>>>>>>>>>>>>>>>> the property lastStatusCacheSize. But in 1.0.0 we made 
>>>>>>>>>>>>>>>>> that configurable so it could be defined per service 
>>>>>>>>>>>>>>>>> definition.
>>>>>>>>>>>>>>>>> To enable individual cache configurations we added a 
>>>>>>>>>>>>>>>>> section called <cache> in the serviceitem section of 
>>>>>>>>>>>>>>>>> bischeck.xml. Like many other configuration options in 
>>>>>>>>>>>>>>>>> 1.0.0, the cache section can have specific values or 
>>>>>>>>>>>>>>>>> point to a template that can be shared.
>>>>>>>>>>>>>>>>> To manage the size of the cache, or to be more 
>>>>>>>>>>>>>>>>> specific the linked list size, we defined the <purge> 
>>>>>>>>>>>>>>>>> section. The purge section can have two different 
>>>>>>>>>>>>>>>>> configurations. The first is defining the max size of
>>>>>>>>>>>>>>>>> the cache linked list.
>>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>>   <purge>
>>>>>>>>>>>>>>>>>    <maxcount>1000</maxcount>
>>>>>>>>>>>>>>>>>   </purge>
>>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The second option is to define the “time to live” for 
>>>>>>>>>>>>>>>>> the metrics in the cache.
>>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>>   <purge>
>>>>>>>>>>>>>>>>>    <offset>10</offset>
>>>>>>>>>>>>>>>>>    <period>D</period>
>>>>>>>>>>>>>>>>>   </purge>
>>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>> In the above example we set the time to live to 10 
>>>>>>>>>>>>>>>>> days, so any metrics older than this period will be 
>>>>>>>>>>>>>>>>> removed. The period can have the following values:
>>>>>>>>>>>>>>>>> H - hours
>>>>>>>>>>>>>>>>> D - days
>>>>>>>>>>>>>>>>> W - weeks
>>>>>>>>>>>>>>>>> Y - years
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The two options are mutually exclusive. You have to choose 
>>>>>>>>>>>>>>>>> one for each
>>>>>>>>>>>>>>>>> serviceitem or cache template.
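
To illustrate what a time-based purge of this kind amounts to, here is a minimal Python sketch. It is not Bischeck's implementation: the function names are hypothetical, entries are modelled as plain `(timestamp, value)` tuples, and treating `Y` as 365 days is an assumption.

```python
from datetime import datetime, timedelta

# Hours per period code; "Y" as 365 days is an assumption for this sketch.
PERIOD_HOURS = {"H": 1, "D": 24, "W": 24 * 7, "Y": 24 * 365}

def purge_cutoff(now, offset, period):
    """Oldest timestamp that survives a purge with the given offset and period."""
    return now - timedelta(hours=offset * PERIOD_HOURS[period])

def purge(entries, now, offset, period):
    """Keep only the (timestamp, value) entries that are not older than the cutoff."""
    cutoff = purge_cutoff(now, offset, period)
    return [(ts, value) for ts, value in entries if ts >= cutoff]

now = datetime(2014, 9, 12, 12, 0)
entries = [(datetime(2014, 9, 1, 12, 0), 1.0), (datetime(2014, 9, 5, 12, 0), 2.0)]
# With <offset>10</offset> and <period>D</period>, only the Sep 5 entry remains.
print(purge(entries, now, 10, "D"))
```

Because the cutoff is computed relative to the current time, data that ages past the cutoff is removed regardless of how recently new data arrived.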
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If no cache directive is defined for a serviceitem, the 
>>>>>>>>>>>>>>>>> property lastStatusCacheSize will be used. Its default 
>>>>>>>>>>>>>>>>> value is 500.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hopefully this explains the cache purging.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The next question was related to aggregations, which 
>>>>>>>>>>>>>>>>> have nothing to do with purging, but are configured in 
>>>>>>>>>>>>>>>>> the same <cache> section. The idea with aggregations was 
>>>>>>>>>>>>>>>>> to create an automatic way to aggregate metrics on the 
>>>>>>>>>>>>>>>>> level of an hour, day, week and month. The aggregation
>>>>>>>>>>>>>>>>> functions currently supported are average, max and min.
>>>>>>>>>>>>>>>>> Let's say you have a service definition of the format
>>>>>>>>>>>>>>>>> host1-service1-serviceitem1. When you enable an 
>>>>>>>>>>>>>>>>> average (avg) aggregation you will automatically get the 
>>>>>>>>>>>>>>>>> following new service definitions:
>>>>>>>>>>>>>>>>> host1-service1/H/avg-serviceitem1
>>>>>>>>>>>>>>>>> host1-service1/D/avg-serviceitem1
>>>>>>>>>>>>>>>>> host1-service1/W/avg-serviceitem1
>>>>>>>>>>>>>>>>> host1-service1/M/avg-serviceitem1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The configuration you need to achieve the above average
>>>>>>>>>>>>>>>>> aggregations is:
>>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>>>    <method>avg</method>
>>>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If you would like to combine it with the purging 
>>>>>>>>>>>>>>>>> described above, your configuration would look like:
>>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>>>    <method>avg</method>
>>>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   <purge>
>>>>>>>>>>>>>>>>>    <offset>10</offset>
>>>>>>>>>>>>>>>>>    <period>D</period>
>>>>>>>>>>>>>>>>>   </purge>
>>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The new aggregated service definitions,
>>>>>>>>>>>>>>>>> host1-service1/H/avg-serviceitem1, etc, will have 
>>>>>>>>>>>>>>>>> their own cache
>>>>>>>>>>>>>>>>> entries and can be used in threshold configurations 
>>>>>>>>>>>>>>>>> and virtual
>>>>>>>>>>>>>>>>> services like any other service definitions. For 
>>>>>>>>>>>>>>>>> example in a
>>>>>>>>>>>>>>>>> threshold hours section we could define
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> <hours hoursID="2">
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   <hourinterval>
>>>>>>>>>>>>>>>>>     <from>09:00</from>
>>>>>>>>>>>>>>>>>     <to>12:00</to>
>>>>>>>>>>>>>>>>>     <threshold>host1-service1/H/avg-serviceitem1[0]*0.8</threshold>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   </hourinterval>
>>>>>>>>>>>>>>>>>   ...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This would mean that we use the average value for
>>>>>>>>>>>>>>>>> host1-service1-serviceitem1 for the period of the 
>>>>>>>>>>>>>>>>> last hour.
>>>>>>>>>>>>>>>>> Aggregations are calculated hourly, daily, weekly and 
>>>>>>>>>>>>>>>>> monthly.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> By default, weekend metrics are not included in the 
>>>>>>>>>>>>>>>>> aggregation calculation. This can be enabled by setting
>>>>>>>>>>>>>>>>> <useweekend>true</useweekend>:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>>>    <method>avg</method>
>>>>>>>>>>>>>>>>>    <useweekend>true</useweekend>
>>>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>>>   ….
>>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This will create aggregated service definitions with 
>>>>>>>>>>>>>>>>> the following
>>>>>>>>>>>>>>>>> name standard:
>>>>>>>>>>>>>>>>> host1-service1/H/avg/weekend-serviceitem1
>>>>>>>>>>>>>>>>> host1-service1/D/avg/weekend-serviceitem1
>>>>>>>>>>>>>>>>> host1-service1/W/avg/weekend-serviceitem1
>>>>>>>>>>>>>>>>> host1-service1/M/avg/weekend-serviceitem1
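
The naming pattern listed above can be summarised with a small Python helper. The function `aggregate_keys` is hypothetical and only reproduces the names shown in this mail; it is not how Bischeck generates them internally.

```python
def aggregate_keys(host, service, item, method="avg", useweekend=False):
    """Build the aggregated service definition names for H, D, W and M.

    The "/weekend" part is appended to the method when useweekend is true,
    mirroring the names listed in the mail.
    """
    suffix = f"{method}/weekend" if useweekend else method
    return [f"{host}-{service}/{period}/{suffix}-{item}" for period in ("H", "D", "W", "M")]

print(aggregate_keys("host1", "service1", "serviceitem1"))
print(aggregate_keys("host1", "service1", "serviceitem1", useweekend=True))
```

A threshold expression therefore has to use the `/weekend` form of the name whenever `<useweekend>true</useweekend>` is set for the aggregate.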
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You can also have multiple entries like:
>>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>>>    <method>avg</method>
>>>>>>>>>>>>>>>>>    <useweekend>true</useweekend>
>>>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>>>    <method>max</method>
>>>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>>>   ….
>>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So how long will the aggregated values be kept in 
>>>>>>>>>>>>>>>>> the cache? By default we save:
>>>>>>>>>>>>>>>>> Hourly aggregations for 25 hours
>>>>>>>>>>>>>>>>> Daily aggregations for 7 days
>>>>>>>>>>>>>>>>> Weekly aggregations for 5 weeks
>>>>>>>>>>>>>>>>> Monthly aggregations for 1 month
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> These values can be overridden, but they cannot be lower 
>>>>>>>>>>>>>>>>> than the default. Below you have an example where we 
>>>>>>>>>>>>>>>>> save the aggregations for 168 hours, 60 days and 53 
>>>>>>>>>>>>>>>>> weeks.
>>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>>>     <method>avg</method>
>>>>>>>>>>>>>>>>>     <useweekend>true</useweekend>
>>>>>>>>>>>>>>>>>     <retention>
>>>>>>>>>>>>>>>>>       <period>H</period>
>>>>>>>>>>>>>>>>>       <offset>168</offset>
>>>>>>>>>>>>>>>>>     </retention>
>>>>>>>>>>>>>>>>>     <retention>
>>>>>>>>>>>>>>>>>       <period>D</period>
>>>>>>>>>>>>>>>>>       <offset>60</offset>
>>>>>>>>>>>>>>>>>     </retention>
>>>>>>>>>>>>>>>>>     <retention>
>>>>>>>>>>>>>>>>>       <period>W</period>
>>>>>>>>>>>>>>>>>       <offset>53</offset>
>>>>>>>>>>>>>>>>>     </retention>
>>>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>>>   ….
>>>>>>>>>>>>>>>>> </cache>
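
Retention matters when a threshold refers to aggregates by index, as in the `[23]`, `[167]` and `[335]` hourly indices used elsewhere in this thread. The following Python check is a sketch under the assumption that a retention offset of N keeps N aggregates addressed by the 0-based indices 0 to N-1; the helper name `covers` is hypothetical.

```python
# Default retention per period, as listed in the mail.
DEFAULT_RETENTION = {"H": 25, "D": 7, "W": 5, "M": 1}

def covers(indices, retention):
    """True if every 0-based index falls inside a list of `retention` entries.

    Assumption: an offset of N retains exactly N aggregates, indexed 0..N-1.
    """
    return max(indices) < retention

hourly_indices = [23, 167, 335]
print(covers(hourly_indices, DEFAULT_RETENTION["H"]))  # False
print(covers(hourly_indices, 168))                     # False
print(covers(hourly_indices, 720))                     # True
```

Under that assumption, looking two weeks back at hourly resolution needs an hourly retention of at least 336.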
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I hope this makes it a bit less confusing. What is 
>>>>>>>>>>>>>>>>> clear to me is
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> we need to improve the documentation in this area.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>>>>>>>> Anders
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 09/08/2014 06:02 AM, Rahul Amaram wrote:
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>> I am trying to set up the bischeck plugin for our 
>>>>>>>>>>>>>>>>>> organization. I have configured most of it except for 
>>>>>>>>>>>>>>>>>> the cache retention period. Here is what I want: I want 
>>>>>>>>>>>>>>>>>> to store every value which has been generated during 
>>>>>>>>>>>>>>>>>> the past 1 month. The reason is that my threshold is
>>>>>>>>>>>>>>>>>> currently calculated as the average of the metric value 
>>>>>>>>>>>>>>>>>> during the past 4 weeks at the same time of the day.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So, how do I define the cache template for this? If I 
>>>>>>>>>>>>>>>>>> don't define any cache template, for how many days is 
>>>>>>>>>>>>>>>>>> the data kept?
>>>>>>>>>>>>>>>>>> Also, how does the aggregate function work and what 
>>>>>>>>>>>>>>>>>> does the purge Maxitems signify?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've gone through the documentation but it wasn't 
>>>>>>>>>>>>>>>>>> clear. Looking
>>>>>>>>>>>>>>>>>> forward
>>>>>>>>>>>>>>>>>> to a response.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Bischeck is one awesome plugin. Keep up the great work.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>> Rahul.