Specifying the retention period
Rahul Amaram
rahul.amaram at vizury.com
Thu Sep 11 23:39:41 CEST 2014
Ok. I figured out the problem. It was a misunderstanding on my part. I have
useweekend set to true. So, instead of
$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[23], I should be
using $$HOSTNAME$$-$$SERVICENAME$$/H/avg/weekend-$$SERVICEITEMNAME$$[23],
and so on.
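For reference, applying that naming change to the full three-term expression quoted later in the thread would presumably give the following (a sketch; the offsets 23/167/335 are taken from the original expression):

```xml
<threshold>avg($$HOSTNAME$$-$$SERVICENAME$$/H/avg/weekend-$$SERVICEITEMNAME$$[23],$$HOSTNAME$$-$$SERVICENAME$$/H/avg/weekend-$$SERVICEITEMNAME$$[167],$$HOSTNAME$$-$$SERVICENAME$$/H/avg/weekend-$$SERVICEITEMNAME$$[335])</threshold>
```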
Thanks for the awesome support.
- Rahul.
On Thursday 11 September 2014 11:43 AM, Anders Håål wrote:
> Hi Rahul,
> Now I have a backlog of questions :)
> Okay, let's start with the last question.
> - First verify that you have data in the cache. Use redis-cli or the
> Bischeck CacheCli,
> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-4.4.
> - Then there could be an issue with null data. Let's say that one of the
> expressions you have returns null. Null is tricky, so in Bischeck you
> have to decide how to manage a null value. Look at
> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_configuration_guide.html#toc-Section-4.3.
> - You can also check the logs and increase the loglevel to debug
> to get more info. Check out
> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-3.2.
>
>
> The following two questions I will try to clarify better later (I must
> run to a meeting), but the index on the hour aggregation specifies a
> specific hour, i.e. the avg, max or min for that hour. Index 0 means the
> last calculated hour, so if the time is 2:30, index 0 means the avg, max
> or min for the period 1:00 to 2:00.
>
> These are good questions; we are glad to get your users' perspective
> on this.
> Anders
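The index arithmetic Anders describes can be sketched in a few lines (a minimal illustration; `hourly_index` is a hypothetical helper, not part of Bischeck):

```python
from datetime import datetime, timedelta

def hourly_index(now, target_start):
    """Cache index for the completed hour starting at target_start,
    where index 0 is the most recently completed hour."""
    # Start of the last completed hour relative to now (e.g. 1:00 when now is 2:30).
    last_completed = now.replace(minute=0, second=0, microsecond=0) - timedelta(hours=1)
    return (last_completed - target_start) // timedelta(hours=1)

# At 2:30, the hour 2:00-3:00 of the previous day is 23 steps back:
print(hourly_index(datetime(2014, 9, 11, 2, 30), datetime(2014, 9, 10, 2, 0)))  # → 23
```

This reproduces the example from the thread: at 2:30, index 0 covers 1:00 to 2:00 today, so index 23 covers 2:00 to 3:00 the previous day.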
>
> On 09/11/2014 07:19 AM, Rahul Amaram wrote:
>> This doesn't help :(.
>>
>> <threshold>avg($$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[23],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[167],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[335])</threshold>
>>
>>
>> - Rahul.
>>
>> On Thursday 11 September 2014 10:45 AM, Rahul Amaram wrote:
>>> Also, let us say the current time is 2.30 and I want the
>>> average of all the values between 2.00 and 3.00 the previous day.
>>> I'd probably have to use
>>>
>>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[23]
>>>
>>> rather than
>>>
>>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[24]
>>>
>>> Am I right ?
>>>
>>> Thanks,
>>> Rahul.
>>>
>>> On Thursday 11 September 2014 10:39 AM, Rahul Amaram wrote:
>>>> Ok. So would
>>>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[24] refer to
>>>> the average of all the values ONLY in the 24th hour before the
>>>> current time?
>>>>
>>>> On Thursday 11 September 2014 10:30 AM, Anders Håål wrote:
>>>>> Hi Amaram,
>>>>> I think you just need to remove the minus sign when using the
>>>>> aggregated values. Minus is used for time, as in back in time; an
>>>>> integer without a minus sign and a time indicator is an index. Check out
>>>>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_configuration_guide.html#toc-Chapter-4.
>>>>>
>>>>> You can also use redis-cli to explore the data in the cache. The
>>>>> key in redis is the same as the service definition.
>>>>> Anders
>>>>>
>>>>> On 09/11/2014 06:38 AM, Rahul Amaram wrote:
>>>>>> Ok. I am facing another issue. I have been running bischeck with
>>>>>> the aggregate function for more than a day. I am using the below
>>>>>> threshold function.
>>>>>>
>>>>>> <threshold>avg($$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[-24],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[-168],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[-336])</threshold>
>>>>>>
>>>>>>
>>>>>> and it doesn't seem to work. I am expecting that the first
>>>>>> aggregate value should be available.
>>>>>>
>>>>>> Instead if I use the below threshold function (I know this is not
>>>>>> related to aggregate)
>>>>>>
>>>>>> avg($$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-24H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-168H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-336H])
>>>>>>
>>>>>>
>>>>>> the threshold is calculated fine, which is just the first value, as
>>>>>> the remaining two values are not in the cache.
>>>>>>
>>>>>> How can I debug why aggregate is not working?
>>>>>>
>>>>>> Thanks,
>>>>>> Rahul.
>>>>>>
>>>>>> On Wednesday 10 September 2014 04:53 PM, Anders Håål wrote:
>>>>>>> Thanks - got the ticket.
>>>>>>> I will update progress on the bug ticket, but it's good that the
>>>>>>> workaround works.
>>>>>>> Anders
>>>>>>>
>>>>>>> On 09/10/2014 01:20 PM, Rahul Amaram wrote:
>>>>>>>> That indeed seems to be the problem. Using count rather than
>>>>>>>> period seems to address the issue. Raised a ticket -
>>>>>>>> http://gforge.ingby.com/gf/project/bischeck/tracker/?action=TrackerItemEdit&tracker_item_id=259
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Rahul.
>>>>>>>>
>>>>>>>> On Wednesday 10 September 2014 04:02 PM, Anders Håål wrote:
>>>>>>>>> This looks like a bug. Could you please report it on
>>>>>>>>> http://gforge.ingby.com/gf/project/bischeck/tracker/ in the Bugs
>>>>>>>>> tracker? You need an account, but it's just a sign-up and you get
>>>>>>>>> an email confirmation.
>>>>>>>>> Can you try to use maxcount for purging instead as a workaround?
>>>>>>>>> Just calculate your maxcount based on the scheduling interval you
>>>>>>>>> use.
>>>>>>>>> Anders
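The maxcount arithmetic is straightforward; a minimal sketch with hypothetical numbers (a 5-minute check interval and 30 days of history, neither figure is from the thread):

```python
# Hypothetical figures: one collected value every 5 minutes, 30 days kept.
interval_minutes = 5
retention_days = 30

values_per_hour = 60 // interval_minutes          # 12 values per hour
maxcount = values_per_hour * 24 * retention_days  # list size covering 30 days
print(maxcount)  # → 8640
```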
>>>>>>>>>
>>>>>>>>> On 09/10/2014 12:17 PM, Rahul Amaram wrote:
>>>>>>>>>> Following up on the earlier topic, I am seeing the below errors
>>>>>>>>>> related to cache purge. Any idea what might be causing this? I
>>>>>>>>>> don't see any other errors in the log related to metrics.
>>>>>>>>>>
>>>>>>>>>> 2014-09-10 12:12:00.001 ; INFO ; DefaultQuartzScheduler_Worker-5 ; com.ingby.socbox.bischeck.configuration.CachePurgeJob ; CachePurge purging 180
>>>>>>>>>> 2014-09-10 12:12:00.003 ; INFO ; DefaultQuartzScheduler_Worker-5 ; com.ingby.socbox.bischeck.configuration.CachePurgeJob ; CachePurge executed in 1 ms
>>>>>>>>>> 2014-09-10 12:12:00.003 ; ERROR ; DefaultQuartzScheduler_Worker-5 ; org.quartz.core.JobRunShell ; Job DailyMaintenance.CachePurge threw an unhandled Exception: java.lang.NullPointerException: null
>>>>>>>>>>     at com.ingby.socbox.bischeck.cache.provider.redis.LastStatusCache.trim(LastStatusCache.java:1250)
>>>>>>>>>>     at com.ingby.socbox.bischeck.configuration.CachePurgeJob.execute(CachePurgeJob.java:140)
>>>>>>>>>> 2014-09-10 12:12:00.003 ; ERROR ; DefaultQuartzScheduler_Worker-5 ; org.quartz.core.ErrorLogger ; Job (DailyMaintenance.CachePurge threw an exception.
>>>>>>>>>> org.quartz.SchedulerException: Job threw an unhandled exception.
>>>>>>>>>>     at org.quartz.core.JobRunShell.run(JobRunShell.java:224)
>>>>>>>>>>     at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
>>>>>>>>>> Caused by: java.lang.NullPointerException: null
>>>>>>>>>>     at com.ingby.socbox.bischeck.cache.provider.redis.LastStatusCache.trim(LastStatusCache.java:1250)
>>>>>>>>>>     at com.ingby.socbox.bischeck.configuration.CachePurgeJob.execute(CachePurgeJob.java:140)
>>>>>>>>>> Here is my cache configuration:
>>>>>>>>>>
>>>>>>>>>> <cache>
>>>>>>>>>> <aggregate>
>>>>>>>>>> <method>avg</method>
>>>>>>>>>> <useweekend>true</useweekend>
>>>>>>>>>> <retention>
>>>>>>>>>> <period>H</period>
>>>>>>>>>> <offset>720</offset>
>>>>>>>>>> </retention>
>>>>>>>>>> <retention>
>>>>>>>>>> <period>D</period>
>>>>>>>>>> <offset>30</offset>
>>>>>>>>>> </retention>
>>>>>>>>>> </aggregate>
>>>>>>>>>>
>>>>>>>>>> <purge>
>>>>>>>>>> <offset>30</offset>
>>>>>>>>>> <period>D</period>
>>>>>>>>>> </purge>
>>>>>>>>>> </cache>
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Rahul.
>>>>>>>>>> On Monday 08 September 2014 08:39 PM, Anders Håål wrote:
>>>>>>>>>>> Great if you can make a Debian package, and I understand that
>>>>>>>>>>> you cannot commit. The best thing would be to integrate it into
>>>>>>>>>>> our build process, where we use ant.
>>>>>>>>>>>
>>>>>>>>>>> If the purging is based on time, then it could happen that data
>>>>>>>>>>> is removed from the cache, since the logic is based on time
>>>>>>>>>>> relative to now. To avoid that, you should increase the purge
>>>>>>>>>>> time before you start bischeck. And just a comment on your last
>>>>>>>>>>> sentence: the Redis TTL is never used :)
>>>>>>>>>>> Anders
>>>>>>>>>>>
>>>>>>>>>>> On 09/08/2014 02:09 PM, Rahul Amaram wrote:
>>>>>>>>>>>> I would be more than happy to give you guys a testimonial.
>>>>>>>>>>>> However, we have just taken this live and would like to see
>>>>>>>>>>>> its performance before I give one.
>>>>>>>>>>>>
>>>>>>>>>>>> Also, if time permits, I'll try to bundle this for Debian
>>>>>>>>>>>> (I'm a Debian maintainer). I can't commit to a timeline right
>>>>>>>>>>>> away though :).
>>>>>>>>>>>>
>>>>>>>>>>>> Also, just to make things explicitly clear: I understand that
>>>>>>>>>>>> the below serviceitem TTL has nothing to do with the Redis
>>>>>>>>>>>> TTL. But if I stop my bischeck server for a day or two, would
>>>>>>>>>>>> any of my metrics get lost? Or would I have to increase the
>>>>>>>>>>>> Redis TTL for this?
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Rahul.
>>>>>>>>>>>>
>>>>>>>>>>>> On Monday 08 September 2014 04:09 PM, Anders Håål wrote:
>>>>>>>>>>>>> Glad that it clarified how to configure the cache section. I
>>>>>>>>>>>>> will make a blog post on this in the meantime, until we have
>>>>>>>>>>>>> updated documentation. I agree with you that the structure of
>>>>>>>>>>>>> the configuration is a bit "heavy", so ideas and input are
>>>>>>>>>>>>> appreciated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regarding the Redis TTL, this is a Redis feature we do not
>>>>>>>>>>>>> use. The TTL mentioned in my mail is managed by bischeck. A
>>>>>>>>>>>>> Redis TTL does not work on individual nodes in a Redis
>>>>>>>>>>>>> linked list.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Currently the bischeck installer should work for Ubuntu,
>>>>>>>>>>>>> RedHat/CentOS and Debian. There are currently no plans to
>>>>>>>>>>>>> make distribution packages like rpm or deb. I know op5
>>>>>>>>>>>>> (www.op5.com), which bundles Bischeck, makes a bischeck rpm.
>>>>>>>>>>>>> It would be super if there is anyone who would like to do
>>>>>>>>>>>>> this for the project.
>>>>>>>>>>>>> When it comes to packaging we have done a bit of work to
>>>>>>>>>>>>> create docker containers, but it's still experimental.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I also encourage you, if you think bischeck supports your
>>>>>>>>>>>>> monitoring effort, to write a small testimonial that we can
>>>>>>>>>>>>> put on the site.
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> Anders
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 09/08/2014 11:30 AM, Rahul Amaram wrote:
>>>>>>>>>>>>>> Thanks Anders. This explains precisely why my data was
>>>>>>>>>>>>>> getting purged after 16 hours (30 values per hour * 16 hours
>>>>>>>>>>>>>> = 480). It would be great if you could update the
>>>>>>>>>>>>>> documentation with this info. The entire setup and
>>>>>>>>>>>>>> configuration itself takes time to get a hold of, and
>>>>>>>>>>>>>> detailed documentation would be very helpful.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, another quick question: right now, I believe the
>>>>>>>>>>>>>> Redis TTL is set to 2000 seconds. Does this mean that if I
>>>>>>>>>>>>>> don't receive data for a particular serviceitem (or service
>>>>>>>>>>>>>> or host) for 2000 seconds, the data related to it is lost?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, any plans for bundling this with distributions such
>>>>>>>>>>>>>> as Debian?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Rahul.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Monday 08 September 2014 02:04 PM, Anders Håål wrote:
>>>>>>>>>>>>>>> Hi Rahul,
>>>>>>>>>>>>>>> Thanks for the question and feedback on the
>>>>>>>>>>>>>>> documentation. Great to
>>>>>>>>>>>>>>> hear that you think Bischeck is awesome. If you do not
>>>>>>>>>>>>>>> understand how
>>>>>>>>>>>>>>> it works by reading the documentation you are probably not
>>>>>>>>>>>>>>> alone, and
>>>>>>>>>>>>>>> we should consider it a documentation bug.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In 1.0.0 we introduced the concepts that you are asking
>>>>>>>>>>>>>>> about, and they are really two different, independent
>>>>>>>>>>>>>>> features.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Let's start with cache purging.
>>>>>>>>>>>>>>> Collected monitoring data, metrics, are kept in the cache
>>>>>>>>>>>>>>> (Redis from 1.0.0) as linked lists. There is one linked
>>>>>>>>>>>>>>> list per service definition, like
>>>>>>>>>>>>>>> host1-service1-serviceitem1. Prior to 1.0.0 all the linked
>>>>>>>>>>>>>>> lists had the same size, defined with the property
>>>>>>>>>>>>>>> lastStatusCacheSize. But in 1.0.0 we made that
>>>>>>>>>>>>>>> configurable, so it can be defined per service definition.
>>>>>>>>>>>>>>> To enable individual cache configurations we added a
>>>>>>>>>>>>>>> section called <cache> in the serviceitem section of
>>>>>>>>>>>>>>> bischeck.xml. Like many other configuration options in
>>>>>>>>>>>>>>> 1.0.0, the cache section can have the specific values or
>>>>>>>>>>>>>>> point to a template that can be shared.
>>>>>>>>>>>>>>> To manage the size of the cache, or to be more specific
>>>>>>>>>>>>>>> the linked list size, we defined the <purge> section. The
>>>>>>>>>>>>>>> purge section can have two different configurations. The
>>>>>>>>>>>>>>> first defines the max size of the cache linked list:
>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>> <purge>
>>>>>>>>>>>>>>> <maxcount>1000</maxcount>
>>>>>>>>>>>>>>> </purge>
>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The second option is to define the “time to live” for the
>>>>>>>>>>>>>>> metrics in the cache.
>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>> <purge>
>>>>>>>>>>>>>>> <offset>10</offset>
>>>>>>>>>>>>>>> <period>D</period>
>>>>>>>>>>>>>>> </purge>
>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>> In the above example we set the time to live to 10 days,
>>>>>>>>>>>>>>> so any metrics older than this period will be removed. The
>>>>>>>>>>>>>>> period can have the following values:
>>>>>>>>>>>>>>> H - hours
>>>>>>>>>>>>>>> D - days
>>>>>>>>>>>>>>> W - weeks
>>>>>>>>>>>>>>> Y - year
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The two options are mutually exclusive. You have to choose
>>>>>>>>>>>>>>> one for each serviceitem or cache template.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If no cache directive is defined for a serviceitem, the
>>>>>>>>>>>>>>> property lastStatusCacheSize will be used. Its default
>>>>>>>>>>>>>>> value is 500.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hopefully this explains the cache purging.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The next question was related to aggregations, which have
>>>>>>>>>>>>>>> nothing to do with purging, but they are configured in the
>>>>>>>>>>>>>>> same <cache> section. The idea with aggregations was to
>>>>>>>>>>>>>>> create an automatic way to aggregate metrics on the level
>>>>>>>>>>>>>>> of an hour, day, week and month. The aggregation functions
>>>>>>>>>>>>>>> currently supported are average, max and min.
>>>>>>>>>>>>>>> Lets say you have a service definition of the format
>>>>>>>>>>>>>>> host1-service1-serviceitem1. When you enable an average
>>>>>>>>>>>>>>> (avg)
>>>>>>>>>>>>>>> aggregation you will automatically get the following new
>>>>>>>>>>>>>>> service
>>>>>>>>>>>>>>> definitions
>>>>>>>>>>>>>>> host1-service1/H/avg-serviceitem1
>>>>>>>>>>>>>>> host1-service1/D/avg-serviceitem1
>>>>>>>>>>>>>>> host1-service1/W/avg-serviceitem1
>>>>>>>>>>>>>>> host1-service1/M/avg-serviceitem1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The configuration you need to achieve the above average
>>>>>>>>>>>>>>> aggregations is:
>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>> <aggregate>
>>>>>>>>>>>>>>> <method>avg</method>
>>>>>>>>>>>>>>> </aggregate>
>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If you would like to combine it with the above-described
>>>>>>>>>>>>>>> purging, your configuration would look like:
>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>> <aggregate>
>>>>>>>>>>>>>>> <method>avg</method>
>>>>>>>>>>>>>>> </aggregate>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> <purge>
>>>>>>>>>>>>>>> <offset>10</offset>
>>>>>>>>>>>>>>> <period>D</period>
>>>>>>>>>>>>>>> </purge>
>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The new aggregated service definitions,
>>>>>>>>>>>>>>> host1-service1/H/avg-serviceitem1, etc, will have their
>>>>>>>>>>>>>>> own cache
>>>>>>>>>>>>>>> entries and can be used in threshold configurations and
>>>>>>>>>>>>>>> virtual
>>>>>>>>>>>>>>> services like any other service definitions. For example
>>>>>>>>>>>>>>> in a
>>>>>>>>>>>>>>> threshold hours section we could define
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> <hours hoursID="2">
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> <hourinterval>
>>>>>>>>>>>>>>> <from>09:00</from>
>>>>>>>>>>>>>>> <to>12:00</to>
>>>>>>>>>>>>>>> <threshold>host1-service1/H/avg-serviceitem1[0]*0.8</threshold>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> </hourinterval>
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This would mean that we use the average value for
>>>>>>>>>>>>>>> host1-service1-serviceitem1 for the period of the last
>>>>>>>>>>>>>>> hour.
>>>>>>>>>>>>>>> Aggregations are calculated hourly, daily, weekly and
>>>>>>>>>>>>>>> monthly.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> By default, weekend metrics are not included in the
>>>>>>>>>>>>>>> aggregation calculation. This can be enabled by setting
>>>>>>>>>>>>>>> <useweekend>true</useweekend>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>> <aggregate>
>>>>>>>>>>>>>>> <method>avg</method>
>>>>>>>>>>>>>>> <useweekend>true</useweekend>
>>>>>>>>>>>>>>> </aggregate>
>>>>>>>>>>>>>>> ….
>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This will create aggregated service definitions with the
>>>>>>>>>>>>>>> following
>>>>>>>>>>>>>>> name standard:
>>>>>>>>>>>>>>> host1-service1/H/avg/weekend-serviceitem1
>>>>>>>>>>>>>>> host1-service1/D/avg/weekend-serviceitem1
>>>>>>>>>>>>>>> host1-service1/W/avg/weekend-serviceitem1
>>>>>>>>>>>>>>> host1-service1/M/avg/weekend-serviceitem1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You can also have multiple entries like:
>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>> <aggregate>
>>>>>>>>>>>>>>> <method>avg</method>
>>>>>>>>>>>>>>> <useweekend>true</useweekend>
>>>>>>>>>>>>>>> </aggregate>
>>>>>>>>>>>>>>> <aggregate>
>>>>>>>>>>>>>>> <method>max</method>
>>>>>>>>>>>>>>> </aggregate>
>>>>>>>>>>>>>>> ….
>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So how long will the aggregated values be kept in the
>>>>>>>>>>>>>>> cache? By default we save:
>>>>>>>>>>>>>>> Hour aggregations for 25 hours
>>>>>>>>>>>>>>> Daily aggregations for 7 days
>>>>>>>>>>>>>>> Weekly aggregations for 5 weeks
>>>>>>>>>>>>>>> Monthly aggregations for 1 month
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> These values can be overridden, but they cannot be lower
>>>>>>>>>>>>>>> than the defaults. Below is an example where we save the
>>>>>>>>>>>>>>> aggregations for 168 hours, 60 days and 53 weeks:
>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>> <aggregate>
>>>>>>>>>>>>>>> <method>avg</method>
>>>>>>>>>>>>>>> <useweekend>true</useweekend>
>>>>>>>>>>>>>>> <retention>
>>>>>>>>>>>>>>> <period>H</period>
>>>>>>>>>>>>>>> <offset>168</offset>
>>>>>>>>>>>>>>> </retention>
>>>>>>>>>>>>>>> <retention>
>>>>>>>>>>>>>>> <period>D</period>
>>>>>>>>>>>>>>> <offset>60</offset>
>>>>>>>>>>>>>>> </retention>
>>>>>>>>>>>>>>> <retention>
>>>>>>>>>>>>>>> <period>W</period>
>>>>>>>>>>>>>>> <offset>53</offset>
>>>>>>>>>>>>>>> </retention>
>>>>>>>>>>>>>>> </aggregate>
>>>>>>>>>>>>>>> ….
>>>>>>>>>>>>>>> </cache>
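The "cannot be lower than the default" rule above can be captured in a tiny check (a hypothetical sketch, not Bischeck code; the minimums are the defaults listed in the mail):

```python
# Documented default minimum retention per aggregation period:
# H >= 25 hours, D >= 7 days, W >= 5 weeks, M >= 1 month.
MINIMUM_OFFSET = {"H": 25, "D": 7, "W": 5, "M": 1}

def retention_ok(period, offset):
    """True if the configured offset meets the documented minimum."""
    return offset >= MINIMUM_OFFSET[period]

# The example configuration above: 168 hours, 60 days, 53 weeks.
for period, offset in [("H", 168), ("D", 60), ("W", 53)]:
    print(period, retention_ok(period, offset))  # all True
```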
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I hope this makes it a bit less confusing. What is clear
>>>>>>>>>>>>>>> to me is
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> we need to improve the documentation in this area.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>>>>>> Anders
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 09/08/2014 06:02 AM, Rahul Amaram wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> I am trying to set up the bischeck plugin for our
>>>>>>>>>>>>>>>> organization. I have configured most of it except for the
>>>>>>>>>>>>>>>> cache retention period. Here is what I want: I want to
>>>>>>>>>>>>>>>> store every value which has been generated during the
>>>>>>>>>>>>>>>> past 1 month. The reason being my threshold is currently
>>>>>>>>>>>>>>>> calculated as the average of the metric value during the
>>>>>>>>>>>>>>>> past 4 weeks at the same time of the day.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So, how do I define the cache template for this? If I
>>>>>>>>>>>>>>>> don't define any cache template, for how many days is the
>>>>>>>>>>>>>>>> data kept?
>>>>>>>>>>>>>>>> Also, how does the aggregate function work, and what does
>>>>>>>>>>>>>>>> the purge Maxitems signify?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I've gone through the documentation but it wasn't clear.
>>>>>>>>>>>>>>>> Looking forward to a response.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Bischeck is one awesome plugin. Keep up the great work.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> Rahul.
>>>>>>>>>>>>>>>>
--
More information about the Bischeck-users mailing list