Specifying the retention period

Rahul Amaram rahul.amaram at vizury.com
Thu Sep 11 23:39:41 CEST 2014


Ok. I figured out the problem. It was a misunderstanding on my part: I have 
useweekend set to true. So, instead of 
$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[23], I should be 
using $$HOSTNAME$$-$$SERVICENAME$$/H/avg/weekend-$$SERVICEITEMNAME$$[23] 
and so on.
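For the record, a sketch of the full corrected threshold (this is my reading of the /weekend key naming discussed below in this thread; the 23/167/335 indexes are the same hour 1, 7 and 14 days back):

```xml
<!-- average of the same hour 1, 7 and 14 days back, using the weekend-aware aggregation keys -->
<threshold>avg($$HOSTNAME$$-$$SERVICENAME$$/H/avg/weekend-$$SERVICEITEMNAME$$[23],$$HOSTNAME$$-$$SERVICENAME$$/H/avg/weekend-$$SERVICEITEMNAME$$[167],$$HOSTNAME$$-$$SERVICENAME$$/H/avg/weekend-$$SERVICEITEMNAME$$[335])</threshold>
```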

Thanks for the awesome support.

- Rahul.

On Thursday 11 September 2014 11:43 AM, Anders Håål wrote:
> Hi Rahul,
> Now I have a backlog of questions :)
> Okay, let's start with the last question.
> - First verify that you have data in the cache. Use redis-cli or the 
> Bischeck CacheCli, 
> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-4.4.
> - Then there is an issue with null data. Let's say that one of the 
> expressions you have returns null. Null is tricky, so in Bischeck you 
> have to decide how to manage a null value. Look at 
> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_configuration_guide.html#toc-Section-4.3.
> - You can also check the logs, and increase the log level to debug 
> to get more info. Check out 
> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-3.2. 
>
>
> The two following questions I will try to clarify better later, as I 
> must run to a meeting, but the index on an hourly aggregation specifies 
> a specific hour, like the avg, max or min for that hour. Index 0 means 
> the last calculated hour, so if the time is 2:30, index 0 means the 
> avg, max or min for the period 1:00 to 2:00.
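> To illustrate, using the host1-service1-serviceitem1 naming that 
> appears later in this thread (the exact hour windows are my reading of 
> the rule above, assuming the current time is 2:30):
>
> ```xml
> <!-- index 0: avg for 1:00-2:00, the last fully calculated hour -->
> <threshold>host1-service1/H/avg-serviceitem1[0]</threshold>
> <!-- index 23: avg for 2:00-3:00 the previous day, 23 hours before index 0 -->
> <threshold>host1-service1/H/avg-serviceitem1[23]</threshold>
> ```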
>
> These are good questions; we are glad to get your user's perspective 
> on this.
> Anders
>
> On 09/11/2014 07:19 AM, Rahul Amaram wrote:
>> This doesn't help :(.
>>
>> <threshold>avg($$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[23],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[167],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[335])</threshold> 
>>
>>
>> - Rahul.
>>
>> On Thursday 11 September 2014 10:45 AM, Rahul Amaram wrote:
>>> Also, let us say that the current time is 2:30 and that I want the 
>>> average of all the values between 2:00 and 3:00 the previous day; 
>>> I'd probably have to use
>>>
>>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[23]
>>>
>>> rather than
>>>
>>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[24]
>>>
>>> Am I right ?
>>>
>>> Thanks,
>>> Rahul.
>>>
>>> On Thursday 11 September 2014 10:39 AM, Rahul Amaram wrote:
>>>> Ok. So would 
>>>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[24] refer to 
>>>> the average of all the values ONLY in the 24th hour before the 
>>>> current time?
>>>>
>>>> On Thursday 11 September 2014 10:30 AM, Anders Håål wrote:
>>>>> Hi Amaram,
>>>>> I think you just need to remove the minus sign when using the 
>>>>> aggregations. A minus sign is used for time, as in back in time, 
>>>>> and an integer without a minus sign and a time indicator is an 
>>>>> index. Check out 
>>>>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_configuration_guide.html#toc-Chapter-4. 
>>>>>
>>>>> You can also use redis-cli to explore the data in the cache. The 
>>>>> key in Redis is the same as the service definition.
>>>>> Anders
>>>>>
>>>>> On 09/11/2014 06:38 AM, Rahul Amaram wrote:
>>>>>> Ok. I am facing another issue. I have been running bischeck with 
>>>>>> the aggregate function for more than a day. I am using the below 
>>>>>> threshold function.
>>>>>>
>>>>>> <threshold>avg($$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[-24],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[-168],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[-336])</threshold> 
>>>>>>
>>>>>>
>>>>>> and it doesn't seem to work. I am expecting that the first 
>>>>>> aggregate value should be available.
>>>>>>
>>>>>> Instead, if I use the below threshold function (I know this is 
>>>>>> not related to aggregate)
>>>>>>
>>>>>> avg($$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-24H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-168H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-336H]) 
>>>>>>
>>>>>>
>>>>>> the threshold is calculated fine, which is just the first value, 
>>>>>> as the remaining two values are not in the cache.
>>>>>>
>>>>>> How can I debug why aggregate is not working?
>>>>>>
>>>>>> Thanks,
>>>>>> Rahul.
>>>>>>
>>>>>> On Wednesday 10 September 2014 04:53 PM, Anders Håål wrote:
>>>>>>> Thanks - got the ticket.
>>>>>>> I will update progress on the bug ticket, but it's good that the 
>>>>>>> workaround works.
>>>>>>> Anders
>>>>>>>
>>>>>>> On 09/10/2014 01:20 PM, Rahul Amaram wrote:
>>>>>>>> That indeed seems to be the problem. Using count rather than 
>>>>>>>> period seems to address the issue. Raised a ticket:
>>>>>>>> http://gforge.ingby.com/gf/project/bischeck/tracker/?action=TrackerItemEdit&tracker_item_id=259
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Rahul.
>>>>>>>>
>>>>>>>> On Wednesday 10 September 2014 04:02 PM, Anders Håål wrote:
>>>>>>>>> This looks like a bug. Could you please report it on
>>>>>>>>> http://gforge.ingby.com/gf/project/bischeck/tracker/ in the Bugs
>>>>>>>>> tracker. You need an account, but it's just a sign-up and you 
>>>>>>>>> get an email confirmation.
>>>>>>>>> Can you try to use maxcount for purging instead as a 
>>>>>>>>> workaround? Just calculate your maxcount based on the 
>>>>>>>>> scheduling interval you use.
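>>>>>>>>> As a sketch of that calculation (the 5-minute check interval 
>>>>>>>>> here is only an assumed example, not your actual schedule): 
>>>>>>>>> with one value every 5 minutes, 30 days of data is 
>>>>>>>>> 12 * 24 * 30 = 8640 values, so:
>>>>>>>>>
>>>>>>>>> ```xml
>>>>>>>>> <cache>
>>>>>>>>>   <purge>
>>>>>>>>>     <!-- 12 values/hour * 24 hours * 30 days = 8640 -->
>>>>>>>>>     <maxcount>8640</maxcount>
>>>>>>>>>   </purge>
>>>>>>>>> </cache>
>>>>>>>>> ```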
>>>>>>>>> Anders
>>>>>>>>>
>>>>>>>>> On 09/10/2014 12:17 PM, Rahul Amaram wrote:
>>>>>>>>>> Following up on the earlier topic, I am seeing the below 
>>>>>>>>>> errors related
>>>>>>>>>> to cache purge. Any idea on what might be causing this? I 
>>>>>>>>>> don't see any
>>>>>>>>>> other errors in log related to metrics.
>>>>>>>>>>
>>>>>>>>>> 2014-09-10 12:12:00.001 ; INFO ; DefaultQuartzScheduler_Worker-5 ; com.ingby.socbox.bischeck.configuration.CachePurgeJob ; CachePurge purging 180
>>>>>>>>>> 2014-09-10 12:12:00.003 ; INFO ; DefaultQuartzScheduler_Worker-5 ; com.ingby.socbox.bischeck.configuration.CachePurgeJob ; CachePurge executed in 1 ms
>>>>>>>>>> 2014-09-10 12:12:00.003 ; ERROR ; DefaultQuartzScheduler_Worker-5 ; org.quartz.core.JobRunShell ; Job DailyMaintenance.CachePurge threw an unhandled Exception: java.lang.NullPointerException: null
>>>>>>>>>>          at com.ingby.socbox.bischeck.cache.provider.redis.LastStatusCache.trim(LastStatusCache.java:1250)
>>>>>>>>>>          at com.ingby.socbox.bischeck.configuration.CachePurgeJob.execute(CachePurgeJob.java:140)
>>>>>>>>>> 2014-09-10 12:12:00.003 ; ERROR ; DefaultQuartzScheduler_Worker-5 ; org.quartz.core.ErrorLogger ; Job DailyMaintenance.CachePurge threw an exception. org.quartz.SchedulerException: Job threw an unhandled exception.
>>>>>>>>>>          at org.quartz.core.JobRunShell.run(JobRunShell.java:224)
>>>>>>>>>>          at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
>>>>>>>>>> Caused by: java.lang.NullPointerException: null
>>>>>>>>>>          at com.ingby.socbox.bischeck.cache.provider.redis.LastStatusCache.trim(LastStatusCache.java:1250)
>>>>>>>>>>          at com.ingby.socbox.bischeck.configuration.CachePurgeJob.execute(CachePurgeJob.java:140)
>>>>>>>>>>
>>>>>>>>>> Here is my cache configuration:
>>>>>>>>>>
>>>>>>>>>>      <cache>
>>>>>>>>>>        <aggregate>
>>>>>>>>>>          <method>avg</method>
>>>>>>>>>>          <useweekend>true</useweekend>
>>>>>>>>>>          <retention>
>>>>>>>>>>            <period>H</period>
>>>>>>>>>>            <offset>720</offset>
>>>>>>>>>>          </retention>
>>>>>>>>>>          <retention>
>>>>>>>>>>            <period>D</period>
>>>>>>>>>>            <offset>30</offset>
>>>>>>>>>>          </retention>
>>>>>>>>>>        </aggregate>
>>>>>>>>>>
>>>>>>>>>>        <purge>
>>>>>>>>>>          <offset>30</offset>
>>>>>>>>>>          <period>D</period>
>>>>>>>>>>        </purge>
>>>>>>>>>>      </cache>
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Rahul.
>>>>>>>>>> On Monday 08 September 2014 08:39 PM, Anders Håål wrote:
>>>>>>>>>>> Great if you can make a Debian package, and I understand that 
>>>>>>>>>>> you cannot commit to a timeline. The best thing would be to 
>>>>>>>>>>> integrate it into our build process, where we use ant.
>>>>>>>>>>>
>>>>>>>>>>> If the purging is based on time, then it could happen that 
>>>>>>>>>>> data is removed from the cache, since the logic is based on 
>>>>>>>>>>> time relative to now. To avoid it you should increase the 
>>>>>>>>>>> purge time before you start bischeck. And just a comment on 
>>>>>>>>>>> your last sentence: the Redis TTL is never used :)
>>>>>>>>>>> Anders
>>>>>>>>>>>
>>>>>>>>>>> On 09/08/2014 02:09 PM, Rahul Amaram wrote:
>>>>>>>>>>>> I would be more than happy to give you guys a testimonial. 
>>>>>>>>>>>> However, we have just taken this live and would like to see 
>>>>>>>>>>>> its performance before I give a testimonial.
>>>>>>>>>>>>
>>>>>>>>>>>> Also, if time permits, I'll try to bundle this for Debian 
>>>>>>>>>>>> (I'm a Debian maintainer). I can't commit to a timeline 
>>>>>>>>>>>> right away though :).
>>>>>>>>>>>>
>>>>>>>>>>>> Also, just to make things explicitly clear: I understand 
>>>>>>>>>>>> that the below serviceitem TTL has nothing to do with the 
>>>>>>>>>>>> Redis TTL. But if I stop my bischeck server for a day or 
>>>>>>>>>>>> two, would any of my metrics get lost? Or would I have to 
>>>>>>>>>>>> increase the Redis TTL for this?
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Rahul.
>>>>>>>>>>>>
>>>>>>>>>>>> On Monday 08 September 2014 04:09 PM, Anders Håål wrote:
>>>>>>>>>>>>> Glad that it clarified how to configure the cache section. 
>>>>>>>>>>>>> I will make a blog post on this in the meantime, until we 
>>>>>>>>>>>>> have updated documentation. I agree with you that the 
>>>>>>>>>>>>> structure of the configuration is a bit "heavy", so ideas 
>>>>>>>>>>>>> and input are appreciated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regarding the Redis TTL, this is a Redis feature we do not 
>>>>>>>>>>>>> use. The TTL mentioned in my mail is managed by Bischeck. A 
>>>>>>>>>>>>> Redis TTL on a linked list does not work on individual 
>>>>>>>>>>>>> nodes in the list.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Currently the bischeck installer should work for Ubuntu, 
>>>>>>>>>>>>> RedHat/CentOS and Debian. There are currently no plans to 
>>>>>>>>>>>>> make distribution packages like rpm or deb. I know op5 
>>>>>>>>>>>>> (www.op5.com), which bundles Bischeck, makes a bischeck 
>>>>>>>>>>>>> rpm. It would be super if there is anyone who would like 
>>>>>>>>>>>>> to do this for the project.
>>>>>>>>>>>>> When it comes to packaging we have done a bit of work to 
>>>>>>>>>>>>> create docker containers, but it's still experimental.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I also encourage you, if you think Bischeck supports your 
>>>>>>>>>>>>> monitoring effort, to write a small testimonial that we 
>>>>>>>>>>>>> can put on the site.
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> Anders
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 09/08/2014 11:30 AM, Rahul Amaram wrote:
>>>>>>>>>>>>>> Thanks, Anders. This explains precisely why my data was 
>>>>>>>>>>>>>> getting purged after 16 hours (30 values per hour * 16 
>>>>>>>>>>>>>> hours = 480). It would be great if you could update the 
>>>>>>>>>>>>>> documentation with this info. The entire setup and 
>>>>>>>>>>>>>> configuration itself takes time to get a hold of, and 
>>>>>>>>>>>>>> detailed documentation would be very helpful.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, another quick question: right now, I believe the 
>>>>>>>>>>>>>> Redis TTL is set to 2000 seconds. Does this mean that if 
>>>>>>>>>>>>>> I don't receive data for a particular serviceitem (or 
>>>>>>>>>>>>>> service or host) for 2000 seconds, the data related to 
>>>>>>>>>>>>>> it is lost?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, any plans for bundling this with distributions such 
>>>>>>>>>>>>>> as Debian?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Rahul.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Monday 08 September 2014 02:04 PM, Anders Håål wrote:
>>>>>>>>>>>>>>> Hi Rahul,
>>>>>>>>>>>>>>> Thanks for the question and feedback on the 
>>>>>>>>>>>>>>> documentation. Great to hear that you think Bischeck is 
>>>>>>>>>>>>>>> awesome. If you do not understand how it works by 
>>>>>>>>>>>>>>> reading the documentation you are probably not alone, 
>>>>>>>>>>>>>>> and we should consider it a documentation bug.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In 1.0.0 we introduced the concepts that you are asking 
>>>>>>>>>>>>>>> about, and they are really two different independent 
>>>>>>>>>>>>>>> features.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Let's start with cache purging.
>>>>>>>>>>>>>>> Collected monitoring data, metrics, are kept in the 
>>>>>>>>>>>>>>> cache (Redis from 1.0.0) as linked lists. There is one 
>>>>>>>>>>>>>>> linked list per service definition, like 
>>>>>>>>>>>>>>> host1-service1-serviceitem1. Prior to 1.0.0 all the 
>>>>>>>>>>>>>>> linked lists had the same size, defined with the 
>>>>>>>>>>>>>>> property lastStatusCacheSize. But in 1.0.0 we made that 
>>>>>>>>>>>>>>> configurable so it can be defined per service definition.
>>>>>>>>>>>>>>> To enable individual cache configurations we added a 
>>>>>>>>>>>>>>> section called <cache> in the serviceitem section of 
>>>>>>>>>>>>>>> bischeck.xml. Like many other configuration options in 
>>>>>>>>>>>>>>> 1.0.0, the cache section can have specific values or 
>>>>>>>>>>>>>>> point to a template that can be shared.
>>>>>>>>>>>>>>> To manage the size of the cache, or to be more specific 
>>>>>>>>>>>>>>> the linked-list size, we defined the <purge> section. 
>>>>>>>>>>>>>>> The purge section can have two different configurations. 
>>>>>>>>>>>>>>> The first is defining the max size of the cache linked 
>>>>>>>>>>>>>>> list.
>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>   <purge>
>>>>>>>>>>>>>>>     <maxcount>1000</maxcount>
>>>>>>>>>>>>>>>   </purge>
>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The second option is to define the “time to live” for 
>>>>>>>>>>>>>>> the metrics in the cache.
>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>   <purge>
>>>>>>>>>>>>>>>    <offset>10</offset>
>>>>>>>>>>>>>>>    <period>D</period>
>>>>>>>>>>>>>>>   </purge>
>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>> In the above example we set the time to live to 10 days, 
>>>>>>>>>>>>>>> so any metrics older than this period will be removed. 
>>>>>>>>>>>>>>> The period can have the following values:
>>>>>>>>>>>>>>> H - hours
>>>>>>>>>>>>>>> D - days
>>>>>>>>>>>>>>> W - weeks
>>>>>>>>>>>>>>> Y - year
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The two options are mutually exclusive. You have to 
>>>>>>>>>>>>>>> choose one for each serviceitem or cache template.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If no cache directive is defined for a serviceitem, the 
>>>>>>>>>>>>>>> property lastStatusCacheSize will be used. Its default 
>>>>>>>>>>>>>>> value is 500.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hopefully this explains the cache purging.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The next question was related to aggregations, which 
>>>>>>>>>>>>>>> have nothing to do with purging but are configured in 
>>>>>>>>>>>>>>> the same <cache> section. The idea with aggregations was 
>>>>>>>>>>>>>>> to create an automatic way to aggregate metrics on the 
>>>>>>>>>>>>>>> level of an hour, day, week and month. The aggregation 
>>>>>>>>>>>>>>> functions currently supported are average, max and min.
>>>>>>>>>>>>>>> Let's say you have a service definition of the format 
>>>>>>>>>>>>>>> host1-service1-serviceitem1. When you enable an average 
>>>>>>>>>>>>>>> (avg) aggregation you will automatically get the 
>>>>>>>>>>>>>>> following new service definitions:
>>>>>>>>>>>>>>> host1-service1/H/avg-serviceitem1
>>>>>>>>>>>>>>> host1-service1/D/avg-serviceitem1
>>>>>>>>>>>>>>> host1-service1/W/avg-serviceitem1
>>>>>>>>>>>>>>> host1-service1/M/avg-serviceitem1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The configuration you need to achieve the above average 
>>>>>>>>>>>>>>> aggregations is:
>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>     <method>avg</method>
>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If you would like to combine it with the above-described 
>>>>>>>>>>>>>>> purging, your configuration would look like:
>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>     <method>avg</method>
>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   <purge>
>>>>>>>>>>>>>>>    <offset>10</offset>
>>>>>>>>>>>>>>>    <period>D</period>
>>>>>>>>>>>>>>>   </purge>
>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The new aggregated service definitions,
>>>>>>>>>>>>>>> host1-service1/H/avg-serviceitem1, etc, will have their 
>>>>>>>>>>>>>>> own cache
>>>>>>>>>>>>>>> entries and can be used in threshold configurations and 
>>>>>>>>>>>>>>> virtual
>>>>>>>>>>>>>>> services like any other service definitions. For example 
>>>>>>>>>>>>>>> in a
>>>>>>>>>>>>>>> threshold hours section we could define
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> <hours hoursID="2">
>>>>>>>>>>>>>>>   <hourinterval>
>>>>>>>>>>>>>>>     <from>09:00</from>
>>>>>>>>>>>>>>>     <to>12:00</to>
>>>>>>>>>>>>>>>     <threshold>host1-service1/H/avg-serviceitem1[0]*0.8</threshold>
>>>>>>>>>>>>>>>   </hourinterval>
>>>>>>>>>>>>>>>   ...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This would mean that we use the average value for 
>>>>>>>>>>>>>>> host1-service1-serviceitem1 for the period of the last 
>>>>>>>>>>>>>>> hour. Aggregations are calculated hourly, daily, weekly 
>>>>>>>>>>>>>>> and monthly.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> By default, weekend metrics are not included in the 
>>>>>>>>>>>>>>> aggregation calculation. This can be enabled by setting 
>>>>>>>>>>>>>>> <useweekend>true</useweekend>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>     <method>avg</method>
>>>>>>>>>>>>>>>     <useweekend>true</useweekend>
>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>   ….
>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This will create aggregated service definitions with the 
>>>>>>>>>>>>>>> following
>>>>>>>>>>>>>>> name standard:
>>>>>>>>>>>>>>> host1-service1/H/avg/weekend-serviceitem1
>>>>>>>>>>>>>>> host1-service1/D/avg/weekend-serviceitem1
>>>>>>>>>>>>>>> host1-service1/W/avg/weekend-serviceitem1
>>>>>>>>>>>>>>> host1-service1/M/avg/weekend-serviceitem1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You can also have multiple entries like:
>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>     <method>avg</method>
>>>>>>>>>>>>>>>     <useweekend>true</useweekend>
>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>     <method>max</method>
>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>   ….
>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So how long will the aggregated values be kept in the 
>>>>>>>>>>>>>>> cache? By default we save:
>>>>>>>>>>>>>>> Hour aggregations for 25 hours
>>>>>>>>>>>>>>> Daily aggregations for 7 days
>>>>>>>>>>>>>>> Weekly aggregations for 5 weeks
>>>>>>>>>>>>>>> Monthly aggregations for 1 month
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> These values can be overridden, but they cannot be lower 
>>>>>>>>>>>>>>> than the default. Below is an example where we save the 
>>>>>>>>>>>>>>> aggregations for 168 hours, 60 days and 53 weeks.
>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>     <method>avg</method>
>>>>>>>>>>>>>>>     <useweekend>true</useweekend>
>>>>>>>>>>>>>>>     <retention>
>>>>>>>>>>>>>>>       <period>H</period>
>>>>>>>>>>>>>>>       <offset>168</offset>
>>>>>>>>>>>>>>>     </retention>
>>>>>>>>>>>>>>>     <retention>
>>>>>>>>>>>>>>>       <period>D</period>
>>>>>>>>>>>>>>>       <offset>60</offset>
>>>>>>>>>>>>>>>     </retention>
>>>>>>>>>>>>>>>     <retention>
>>>>>>>>>>>>>>>       <period>W</period>
>>>>>>>>>>>>>>>       <offset>53</offset>
>>>>>>>>>>>>>>>     </retention>
>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>   ….
>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I hope this makes it a bit less confusing. What is clear 
>>>>>>>>>>>>>>> to me is
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> we need to improve the documentation in this area.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>>>>>> Anders
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 09/08/2014 06:02 AM, Rahul Amaram wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> I am trying to set up the bischeck plugin for our 
>>>>>>>>>>>>>>>> organization. I have configured most of it except for 
>>>>>>>>>>>>>>>> the cache retention period. Here
>>>>>>>>>>>>>>>> is what I want - I want to store every value which has 
>>>>>>>>>>>>>>>> been
>>>>>>>>>>>>>>>> generated
>>>>>>>>>>>>>>>> during the past 1 month. The reason being my threshold is
>>>>>>>>>>>>>>>> currently
>>>>>>>>>>>>>>>> calculated as the average of the metric value during 
>>>>>>>>>>>>>>>> the past 4
>>>>>>>>>>>>>>>> weeks at
>>>>>>>>>>>>>>>> the same time of the day.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So, how do I define the cache template for this? If I 
>>>>>>>>>>>>>>>> don't
>>>>>>>>>>>>>>>> define any
>>>>>>>>>>>>>>>> cache template, for how many days is the data kept?
>>>>>>>>>>>>>>>> Also, how does the aggregate function work, and what 
>>>>>>>>>>>>>>>> does the purge Maxitems signify?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I've gone through the documentation but it wasn't 
>>>>>>>>>>>>>>>> clear. Looking forward to a response.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Bischeck is one awesome plugin. Keep up the great work.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> Rahul.
>>>>>>>>>>>>>>>>


More information about the Bischeck-users mailing list