Re: Specifying the retention period
anders.haal@ingby.com
Fri Sep 12 15:01:06 CEST 2014
Good input and good luck with your testing.
----- Reply message -----
From: "Rahul Amaram" <rahul.amaram at vizury.com>
To: <anders.haal at ingby.com>, <bischeck-users at monitoring-lists.org>
Subject: Specifying the retention period
Date: Fri, Sep 12, 2014 13:12
Yup, that's a useful tool. I think the documentation could have a
Troubleshooting section that covers some of these tools separately,
along with some common troubleshooting scenarios.
- Rahul.
On Friday 12 September 2014 02:41 PM, Anders Håål wrote:
> Glad that it worked out. What is clear to me is that this topic is not
> that simple to understand with the current documentation, so this
> feedback from you is very valuable. I will add some additional blog
> posts on the topic and then get it into the documentation for the next
> major release. We will also need to figure out if this can be simplified.
>
> Did you try the CacheCli?
>
> Keep the feedback coming.
> Anders
>
> On 09/11/2014 11:39 PM, Rahul Amaram wrote:
>> Ok. I figured out the problem. It was with my understanding. I have
>> useweekend set to true. So, instead of
>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[23], I should
>> be using
>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg/weekend-$$SERVICEITEMNAME$$[23]
>> and so on.
>>
>> Thanks for the awesome support.
>>
>> - Rahul.
>>
>> On Thursday 11 September 2014 11:43 AM, Anders Håål wrote:
>>> Hi Rahul,
>>> Now I have a backlog of questions :)
>>> Okay, let's start with the last question.
>>> - First verify that you have data in the cache. Use redis-cli or the
>>> Bischeck CacheCli,
>>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-4.4.
>>> - Then there is an issue with null data. Let's say that one of your
>>> expressions returns null. Null is tricky, so in Bischeck you
>>> have to decide how to manage a null value. Look at
>>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_configuration_guide.html#toc-Section-4.3.
>>>
>>> - You can also check the logs and increase the log level to
>>> debug to get more info. Check out
>>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-3.2.
>>>
>>>
>>> I will try to clarify the two following questions better later, as I
>>> must run to a meeting. In short, the index on an hourly aggregate
>>> specifies a specific hour, that is, the avg, max or min for that hour.
>>> Index 0 means the last calculated hour, so if the time is 2:30, index 0
>>> means the avg, max or min for the period 1:00 to 2:00.
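The index rule described above can be sketched in a few lines of Python. This is only an illustration, not Bischeck code: the function name `hourly_window` is hypothetical, and it assumes that one hourly aggregate exists for every hour with no gaps, so that index 0 is the last completed hour and each higher index steps back exactly one hour.

```python
from datetime import datetime, timedelta
from typing import Tuple


def hourly_window(now: datetime, index: int) -> Tuple[datetime, datetime]:
    """Return (start, end) of the hour an hourly aggregate index refers to.

    Assumption: index 0 is the last completed hour and there is exactly
    one aggregate per hour (no missing entries).
    """
    if index < 0:
        raise ValueError("an aggregate index is a non-negative integer")
    # Truncate to the top of the current hour; the current hour itself
    # has not been aggregated yet.
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    end = top_of_hour - timedelta(hours=index)
    return end - timedelta(hours=1), end


now = datetime(2014, 9, 11, 2, 30)
print(hourly_window(now, 0))   # 01:00 to 02:00 on the same day
print(hourly_window(now, 23))  # 02:00 to 03:00 on the previous day
```

Under these assumptions, index 23 rather than 24 is the one that covers 2:00 to 3:00 on the previous day when the current time is 2:30.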
>>>
>>> These are good questions; we are glad to get your user's perspective
>>> on this.
>>> Anders
>>>
>>> On 09/11/2014 07:19 AM, Rahul Amaram wrote:
>>>> This doesn't help :(.
>>>>
>>>> <threshold>avg($$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[23],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[167],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[335])</threshold>
>>>>
>>>>
>>>> - Rahul.
>>>>
>>>> On Thursday 11 September 2014 10:45 AM, Rahul Amaram wrote:
>>>>> Also, let us say, that the current time is 2.30 and that I want
>>>>> the average of all the values between 2.00 and 3.00 the previous
>>>>> day, I'd probably have to use
>>>>>
>>>>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[23]
>>>>>
>>>>> rather than
>>>>>
>>>>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[24]
>>>>>
>>>>> Am I right ?
>>>>>
>>>>> Thanks,
>>>>> Rahul.
>>>>>
>>>>> On Thursday 11 September 2014 10:39 AM, Rahul Amaram wrote:
>>>>>> Ok. So would
>>>>>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[24] refer
>>>>>> to the average of all the values ONLY in the 24th hour before
>>>>>> the current time?
>>>>>>
>>>>>> On Thursday 11 September 2014 10:30 AM, Anders Håål wrote:
>>>>>>> Hi Amaram,
>>>>>>> I think you just need to remove the minus sign when using the
>>>>>>> aggregated values. A minus sign is used for time, as in back in
>>>>>>> time, while a plain integer without a minus sign or a time
>>>>>>> indicator is an index. Check out
>>>>>>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_configuration_guide.html#toc-Chapter-4.
>>>>>>>
>>>>>>> You can also use redis-cli to explore the data in the cache. The
>>>>>>> key in Redis is the same as the service definition.
>>>>>>> Anders
>>>>>>>
>>>>>>> On 09/11/2014 06:38 AM, Rahul Amaram wrote:
>>>>>>>> Ok. I am facing another issue. I have been running bischeck
>>>>>>>> with the aggregate function for more than a day. I am using the
>>>>>>>> below threshold function.
>>>>>>>>
>>>>>>>> <threshold>avg($$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[-24],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[-168],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[-336])</threshold>
>>>>>>>>
>>>>>>>>
>>>>>>>> and it doesn't seem to work. I am expecting that the first
>>>>>>>> aggregate value should be available.
>>>>>>>>
>>>>>>>> Instead if I use the below threshold function (I know this is
>>>>>>>> not related to aggregate)
>>>>>>>>
>>>>>>>> avg($$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-24H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-168H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-336H])
>>>>>>>>
>>>>>>>>
>>>>>>>> the threshold is calculated fine, which is just the first value
>>>>>>>> as the remaining two values are not in the cache.
>>>>>>>>
>>>>>>>> How can I debug why aggregate is not working?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Rahul.
>>>>>>>>
>>>>>>>> On Wednesday 10 September 2014 04:53 PM, Anders Håål wrote:
>>>>>>>>> Thanks - got the ticket.
>>>>>>>>> I will update progress on the bug ticket, but it's good that
>>>>>>>>> the workaround works.
>>>>>>>>> Anders
>>>>>>>>>
>>>>>>>>> On 09/10/2014 01:20 PM, Rahul Amaram wrote:
>>>>>>>>>> That indeed seems to be the problem. Using count rather than period
>>>>>>>>>> seems to address the issue. Raised a ticket:
>>>>>>>>>> http://gforge.ingby.com/gf/project/bischeck/tracker/?action=TrackerItemEdit&tracker_item_id=259
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Rahul.
>>>>>>>>>>
>>>>>>>>>> On Wednesday 10 September 2014 04:02 PM, Anders Håål wrote:
>>>>>>>>>>> This looks like a bug. Could you please report it on
>>>>>>>>>>> http://gforge.ingby.com/gf/project/bischeck/tracker/ in the Bugs
>>>>>>>>>>> tracker? You need an account, but it's just a sign-up and you get
>>>>>>>>>>> an email confirmation.
>>>>>>>>>>> Can you try to use maxcount for purging instead as a workaround?
>>>>>>>>>>> Just calculate your maxcount based on the scheduling interval you
>>>>>>>>>>> use.
>>>>>>>>>>> Anders
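The maxcount calculation suggested above is simple arithmetic, sketched here in Python. The helper name `maxcount_for` is hypothetical, and the example values (30 days of retention, one sample every 120 seconds, i.e. the 30 values per hour mentioned elsewhere in this thread) are assumptions for illustration only.

```python
import math


def maxcount_for(retention_days: float, interval_seconds: float) -> int:
    """Number of list entries needed to cover retention_days of samples
    when one sample is collected every interval_seconds."""
    samples_per_day = 86400 / interval_seconds
    # Round up so the list never holds less than the wanted period.
    return math.ceil(retention_days * samples_per_day)


# 30 values per hour (one every 120 s) kept for 30 days.
print(maxcount_for(30, 120))  # 21600
```

The result would then go into `<maxcount>` in the `<purge>` section in place of the `<offset>`/`<period>` pair.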
>>>>>>>>>>>
>>>>>>>>>>> On 09/10/2014 12:17 PM, Rahul Amaram wrote:
>>>>>>>>>>>> Following up on the earlier topic, I am seeing the below errors
>>>>>>>>>>>> related to cache purge. Any idea what might be causing this? I
>>>>>>>>>>>> don't see any other errors in the log related to metrics.
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-09-10 12:12:00.001 ; INFO ; DefaultQuartzScheduler_Worker-5 ; com.ingby.socbox.bischeck.configuration.CachePurgeJob ; CachePurge purging 180
>>>>>>>>>>>> 2014-09-10 12:12:00.003 ; INFO ; DefaultQuartzScheduler_Worker-5 ; com.ingby.socbox.bischeck.configuration.CachePurgeJob ; CachePurge executed in 1 ms
>>>>>>>>>>>> 2014-09-10 12:12:00.003 ; ERROR ; DefaultQuartzScheduler_Worker-5 ; org.quartz.core.JobRunShell ; Job DailyMaintenance.CachePurge threw an unhandled Exception: java.lang.NullPointerException: null
>>>>>>>>>>>>     at com.ingby.socbox.bischeck.cache.provider.redis.LastStatusCache.trim(LastStatusCache.java:1250)
>>>>>>>>>>>>     at com.ingby.socbox.bischeck.configuration.CachePurgeJob.execute(CachePurgeJob.java:140)
>>>>>>>>>>>> 2014-09-10 12:12:00.003 ; ERROR ; DefaultQuartzScheduler_Worker-5 ; org.quartz.core.ErrorLogger ; Job (DailyMaintenance.CachePurge threw an exception.
>>>>>>>>>>>> org.quartz.SchedulerException: Job threw an unhandled exception.
>>>>>>>>>>>>     at org.quartz.core.JobRunShell.run(JobRunShell.java:224)
>>>>>>>>>>>>     at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
>>>>>>>>>>>> Caused by: java.lang.NullPointerException: null
>>>>>>>>>>>>     at com.ingby.socbox.bischeck.cache.provider.redis.LastStatusCache.trim(LastStatusCache.java:1250)
>>>>>>>>>>>>     at com.ingby.socbox.bischeck.configuration.CachePurgeJob.execute(CachePurgeJob.java:140)
>>>>>>>>>>>>
>>>>>>>>>>>> Here is my cache configuration:
>>>>>>>>>>>>
>>>>>>>>>>>> <cache>
>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>     <method>avg</method>
>>>>>>>>>>>>     <useweekend>true</useweekend>
>>>>>>>>>>>>     <retention>
>>>>>>>>>>>>       <period>H</period>
>>>>>>>>>>>>       <offset>720</offset>
>>>>>>>>>>>>     </retention>
>>>>>>>>>>>>     <retention>
>>>>>>>>>>>>       <period>D</period>
>>>>>>>>>>>>       <offset>30</offset>
>>>>>>>>>>>>     </retention>
>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>
>>>>>>>>>>>>   <purge>
>>>>>>>>>>>>     <offset>30</offset>
>>>>>>>>>>>>     <period>D</period>
>>>>>>>>>>>>   </purge>
>>>>>>>>>>>> </cache>
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Rahul.
>>>>>>>>>>>> On Monday 08 September 2014 08:39 PM, Anders Håål wrote:
>>>>>>>>>>>>> It would be great if you could make a Debian package, and I
>>>>>>>>>>>>> understand that you cannot commit to a timeline. The best thing
>>>>>>>>>>>>> would be to integrate it into our build process, where we use ant.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If the purging is based on time, then data could be removed from
>>>>>>>>>>>>> the cache, since the logic is based on time relative to now. To
>>>>>>>>>>>>> avoid it, you should increase the purge time before you start
>>>>>>>>>>>>> bischeck. And just a comment on your last sentence: Redis TTL is
>>>>>>>>>>>>> never used :)
>>>>>>>>>>>>> Anders
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 09/08/2014 02:09 PM, Rahul Amaram wrote:
>>>>>>>>>>>>>> I would be more than happy to give you guys a testimonial.
>>>>>>>>>>>>>> However, we have just taken this live and would like to see its
>>>>>>>>>>>>>> performance before I give a testimonial.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, if time permits, I'll try to bundle this for Debian (I'm a
>>>>>>>>>>>>>> Debian maintainer). I can't commit to a timeline right away
>>>>>>>>>>>>>> though :).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, just to make things explicitly clear: I understand that the
>>>>>>>>>>>>>> service item ttl below has nothing to do with Redis TTL. But if I
>>>>>>>>>>>>>> stop my bischeck server for a day or two, would any of my
>>>>>>>>>>>>>> metrics get lost? Or would I have to increase the Redis TTL for
>>>>>>>>>>>>>> this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Rahul.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Monday 08 September 2014 04:09 PM, Anders Håål wrote:
>>>>>>>>>>>>>>> Glad that it clarified how to configure the cache section. I
>>>>>>>>>>>>>>> will write a blog post on this in the meantime, until we have
>>>>>>>>>>>>>>> updated documentation. I agree with you that the structure of
>>>>>>>>>>>>>>> the configuration is a bit "heavy", so ideas and input are
>>>>>>>>>>>>>>> appreciated.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regarding redis ttl, this is a redis feature we do not use. The
>>>>>>>>>>>>>>> ttl mentioned in my mail is managed by bischeck. Redis ttl does
>>>>>>>>>>>>>>> not work on individual nodes in a redis linked list.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Currently the bischeck installer should work for ubuntu,
>>>>>>>>>>>>>>> redhat/centos and debian. There are currently no plans to make
>>>>>>>>>>>>>>> distribution packages like rpm or deb. I know that op5
>>>>>>>>>>>>>>> (www.op5.com), which bundles Bischeck, makes a bischeck rpm. It
>>>>>>>>>>>>>>> would be super if anyone would like to do this for the project.
>>>>>>>>>>>>>>> When it comes to packaging, we have done a bit of work to create
>>>>>>>>>>>>>>> docker containers, but it's still experimental.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I also encourage you, if you think bischeck supports your
>>>>>>>>>>>>>>> monitoring effort, to write a short testimonial that we can put
>>>>>>>>>>>>>>> on the site.
>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>> Anders
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 09/08/2014 11:30 AM, Rahul Amaram wrote:
>>>>>>>>>>>>>>>> Thanks Anders. This explains precisely why my data was getting
>>>>>>>>>>>>>>>> purged after 16 hours (30 values per hour * 16 hours = 480). It
>>>>>>>>>>>>>>>> would be great if you could update the documentation with this
>>>>>>>>>>>>>>>> info. The entire setup and configuration itself takes time to
>>>>>>>>>>>>>>>> get a hold of, and detailed documentation would be very helpful.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also, another quick question. Right now, I believe the Redis TTL
>>>>>>>>>>>>>>>> is set to 2000 seconds. Does this mean that if I don't receive
>>>>>>>>>>>>>>>> data for a particular serviceitem (or service or host) for 2000
>>>>>>>>>>>>>>>> seconds, the data related to it is lost?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also, any plans for bundling this with distributions such as
>>>>>>>>>>>>>>>> Debian?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> Rahul.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Monday 08 September 2014 02:04 PM, Anders Håål wrote:
>>>>>>>>>>>>>>>>> Hi Rahul,
>>>>>>>>>>>>>>>>> Thanks for the question and feedback on the documentation.
>>>>>>>>>>>>>>>>> Great to hear that you think Bischeck is awesome. If you do not
>>>>>>>>>>>>>>>>> understand how it works by reading the documentation, you are
>>>>>>>>>>>>>>>>> probably not alone, and we should consider it a documentation
>>>>>>>>>>>>>>>>> bug.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In 1.0.0 we introduced the concepts that you are asking about,
>>>>>>>>>>>>>>>>> and they are really two different, independent features.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Let's start with cache purging.
>>>>>>>>>>>>>>>>> Collected monitoring data, metrics, are kept in the cache (redis
>>>>>>>>>>>>>>>>> from 1.0.0) as linked lists. There is one linked list per
>>>>>>>>>>>>>>>>> service definition, like host1-service1-serviceitem1. Prior to
>>>>>>>>>>>>>>>>> 1.0.0 all the linked lists had the same size, which was defined
>>>>>>>>>>>>>>>>> with the property lastStatusCacheSize. But in 1.0.0 we made that
>>>>>>>>>>>>>>>>> configurable so it can be defined per service definition.
>>>>>>>>>>>>>>>>> To enable individual cache configurations we added a section
>>>>>>>>>>>>>>>>> called <cache> in the serviceitem section of bischeck.xml. Like
>>>>>>>>>>>>>>>>> many other configuration options in 1.0.0, the cache section can
>>>>>>>>>>>>>>>>> have specific values or point to a template that can be shared.
>>>>>>>>>>>>>>>>> To manage the size of the cache, or to be more specific the
>>>>>>>>>>>>>>>>> linked list size, we defined the <purge> section. The purge
>>>>>>>>>>>>>>>>> section can have two different configurations. The first defines
>>>>>>>>>>>>>>>>> the max size of the cache linked list.
>>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>>   <purge>
>>>>>>>>>>>>>>>>>     <maxcount>1000</maxcount>
>>>>>>>>>>>>>>>>>   </purge>
>>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The second option is to define the “time to live” for the
>>>>>>>>>>>>>>>>> metrics in the cache.
>>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>>   <purge>
>>>>>>>>>>>>>>>>>     <offset>10</offset>
>>>>>>>>>>>>>>>>>     <period>D</period>
>>>>>>>>>>>>>>>>>   </purge>
>>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>> In the above example we set the time to live to 10 days, so any
>>>>>>>>>>>>>>>>> metrics older than this period will be removed. The period can
>>>>>>>>>>>>>>>>> have the following values:
>>>>>>>>>>>>>>>>> H - hours
>>>>>>>>>>>>>>>>> D - days
>>>>>>>>>>>>>>>>> W - weeks
>>>>>>>>>>>>>>>>> Y - year
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The two options are mutually exclusive. You have to choose one
>>>>>>>>>>>>>>>>> for each serviceitem or cache template.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If no cache directive is defined for a serviceitem, the property
>>>>>>>>>>>>>>>>> lastStatusCacheSize will be used. Its default value is 500.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hopefully this explains the cache purging.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The next question was related to aggregations, which have
>>>>>>>>>>>>>>>>> nothing to do with purging but are configured in the same
>>>>>>>>>>>>>>>>> <cache> section. The idea with aggregations was to create an
>>>>>>>>>>>>>>>>> automatic way to aggregate metrics on the level of an hour, day,
>>>>>>>>>>>>>>>>> week and month. The aggregation functions currently supported
>>>>>>>>>>>>>>>>> are average, max and min.
>>>>>>>>>>>>>>>>> Let's say you have a service definition of the format
>>>>>>>>>>>>>>>>> host1-service1-serviceitem1. When you enable an average (avg)
>>>>>>>>>>>>>>>>> aggregation you will automatically get the following new service
>>>>>>>>>>>>>>>>> definitions:
>>>>>>>>>>>>>>>>> host1-service1/H/avg-serviceitem1
>>>>>>>>>>>>>>>>> host1-service1/D/avg-serviceitem1
>>>>>>>>>>>>>>>>> host1-service1/W/avg-serviceitem1
>>>>>>>>>>>>>>>>> host1-service1/M/avg-serviceitem1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The configuration you need to achieve the above average
>>>>>>>>>>>>>>>>> aggregations is:
>>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>>>     <method>avg</method>
>>>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If you would like to combine it with the purging described
>>>>>>>>>>>>>>>>> above, your configuration would look like:
>>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>>>     <method>avg</method>
>>>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   <purge>
>>>>>>>>>>>>>>>>>     <offset>10</offset>
>>>>>>>>>>>>>>>>>     <period>D</period>
>>>>>>>>>>>>>>>>>   </purge>
>>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The new aggregated service definitions,
>>>>>>>>>>>>>>>>> host1-service1/H/avg-serviceitem1, etc., will have their own
>>>>>>>>>>>>>>>>> cache entries and can be used in threshold configurations and
>>>>>>>>>>>>>>>>> virtual services like any other service definitions. For
>>>>>>>>>>>>>>>>> example, in a threshold hours section we could define
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> <hours hoursID="2">
>>>>>>>>>>>>>>>>>   <hourinterval>
>>>>>>>>>>>>>>>>>     <from>09:00</from>
>>>>>>>>>>>>>>>>>     <to>12:00</to>
>>>>>>>>>>>>>>>>>     <threshold>host1-service1/H/avg-serviceitem1[0]*0.8</threshold>
>>>>>>>>>>>>>>>>>   </hourinterval>
>>>>>>>>>>>>>>>>>   ...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This would mean that the threshold is 0.8 times the average
>>>>>>>>>>>>>>>>> value of host1-service1-serviceitem1 for the last hour.
>>>>>>>>>>>>>>>>> Aggregations are calculated hourly, daily, weekly and monthly.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> By default, weekend metrics are not included in the aggregation
>>>>>>>>>>>>>>>>> calculation. This can be enabled by setting
>>>>>>>>>>>>>>>>> <useweekend>true</useweekend>:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>>>     <method>avg</method>
>>>>>>>>>>>>>>>>>     <useweekend>true</useweekend>
>>>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>>>   ...
>>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This will create aggregated service definitions with the
>>>>>>>>>>>>>>>>> following naming convention:
>>>>>>>>>>>>>>>>> host1-service1/H/avg/weekend-serviceitem1
>>>>>>>>>>>>>>>>> host1-service1/D/avg/weekend-serviceitem1
>>>>>>>>>>>>>>>>> host1-service1/W/avg/weekend-serviceitem1
>>>>>>>>>>>>>>>>> host1-service1/M/avg/weekend-serviceitem1
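The naming pattern listed above can be reproduced with a small Python helper, which may be handy when building threshold expressions or looking up keys with redis-cli. The function `aggregate_keys` is hypothetical and simply mirrors the names shown in this message; it is not part of Bischeck.

```python
def aggregate_keys(host, service, serviceitem, method="avg", useweekend=False):
    """Build the aggregated service definition names for one serviceitem.

    Mirrors the pattern host-service/PERIOD/method[/weekend]-serviceitem
    for the hourly, daily, weekly and monthly aggregates.
    """
    suffix = "/weekend" if useweekend else ""
    return [
        f"{host}-{service}/{period}/{method}{suffix}-{serviceitem}"
        for period in ("H", "D", "W", "M")
    ]


# With <useweekend>true</useweekend> the names carry the /weekend part.
for key in aggregate_keys("host1", "service1", "serviceitem1", useweekend=True):
    print(key)
```

A threshold expression has to use the name that matches the useweekend setting, for example `host1-service1/H/avg/weekend-serviceitem1[0]` when useweekend is true.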
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You can also have multiple entries like:
>>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>>>     <method>avg</method>
>>>>>>>>>>>>>>>>>     <useweekend>true</useweekend>
>>>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>>>     <method>max</method>
>>>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>>>   ...
>>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So how long will the aggregated values be kept in the cache? By
>>>>>>>>>>>>>>>>> default we save
>>>>>>>>>>>>>>>>> Hourly aggregations for 25 hours
>>>>>>>>>>>>>>>>> Daily aggregations for 7 days
>>>>>>>>>>>>>>>>> Weekly aggregations for 5 weeks
>>>>>>>>>>>>>>>>> Monthly aggregations for 1 month
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> These values can be overridden, but they cannot be lower than
>>>>>>>>>>>>>>>>> the default. Below you have an example where we save the
>>>>>>>>>>>>>>>>> aggregations for 168 hours, 60 days and 53 weeks.
>>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>>>     <method>avg</method>
>>>>>>>>>>>>>>>>>     <useweekend>true</useweekend>
>>>>>>>>>>>>>>>>>     <retention>
>>>>>>>>>>>>>>>>>       <period>H</period>
>>>>>>>>>>>>>>>>>       <offset>168</offset>
>>>>>>>>>>>>>>>>>     </retention>
>>>>>>>>>>>>>>>>>     <retention>
>>>>>>>>>>>>>>>>>       <period>D</period>
>>>>>>>>>>>>>>>>>       <offset>60</offset>
>>>>>>>>>>>>>>>>>     </retention>
>>>>>>>>>>>>>>>>>     <retention>
>>>>>>>>>>>>>>>>>       <period>W</period>
>>>>>>>>>>>>>>>>>       <offset>53</offset>
>>>>>>>>>>>>>>>>>     </retention>
>>>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>>>   ...
>>>>>>>>>>>>>>>>> </cache>
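The retention rule above (overrides allowed, but never below the default) can be written down as a short Python check. This is a sketch under assumptions: the names `DEFAULT_RETENTION`, `effective_retention` and `highest_index` are hypothetical, and how Bischeck itself reacts to a too-low value is not stated in this thread, so the sketch simply raises an error.

```python
# Default retention per aggregation period, as listed in this message.
DEFAULT_RETENTION = {"H": 25, "D": 7, "W": 5, "M": 1}


def effective_retention(overrides):
    """Merge <retention> overrides with the defaults.

    Assumption: a value below the default is treated as a configuration
    error here; the real behaviour is not described in the thread.
    """
    result = dict(DEFAULT_RETENTION)
    for period, offset in overrides.items():
        if period not in DEFAULT_RETENTION:
            raise ValueError(f"unknown period {period!r}")
        if offset < DEFAULT_RETENTION[period]:
            raise ValueError(f"retention for {period} cannot be below the default")
        result[period] = offset
    return result


def highest_index(retained_entries):
    """Highest usable index, assuming index 0 is the newest entry."""
    return retained_entries - 1


retention = effective_retention({"H": 168, "D": 60, "W": 53})
print(retention)
print(highest_index(retention["H"]))  # 167
```

With indexes starting at 0, keeping 168 hourly values allows indexes up to 167, so an expression that reaches further back, such as index 335, would need an hourly retention of at least 336.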
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I hope this makes it a bit less confusing. What is clear to me
>>>>>>>>>>>>>>>>> is that we need to improve the documentation in this area.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>>>>>>>> Anders
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 09/08/2014 06:02 AM, Rahul Amaram wrote:
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>> I am trying to set up the bischeck plugin for our organization.
>>>>>>>>>>>>>>>>>> I have configured most of it except for the cache retention
>>>>>>>>>>>>>>>>>> period. Here is what I want: I want to store every value which
>>>>>>>>>>>>>>>>>> has been generated during the past 1 month. The reason is that
>>>>>>>>>>>>>>>>>> my threshold is currently calculated as the average of the
>>>>>>>>>>>>>>>>>> metric value during the past 4 weeks at the same time of the
>>>>>>>>>>>>>>>>>> day.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So, how do I define the cache template for this? If I don't
>>>>>>>>>>>>>>>>>> define any cache template, for how many days is the data kept?
>>>>>>>>>>>>>>>>>> Also, how does the aggregate function work, and what does the
>>>>>>>>>>>>>>>>>> purge Maxitems signify?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've gone through the documentation but it wasn't clear.
>>>>>>>>>>>>>>>>>> Looking forward to a response.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Bischeck is one awesome plugin. Keep up the great work.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>> Rahul.
>>>>>>>>>>>>>>>>>>