<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html><head></head><body><div style="font-size: 12pt; font-family: Calibri,sans-serif;"><div>Good input and good luck with your testing.</div><br><div id="htc_header">----- Reply message -----<br>From: "Rahul Amaram" <rahul.amaram@vizury.com><br>To: <anders.haal@ingby.com>, <bischeck-users@monitoring-lists.org><br>Subject: Specifying the retention period<br>Date: Fri, Sep 12, 2014 13:12</div></div><br><pre style="word-wrap: break-word; white-space: pre-wrap;">Yup, that's a useful tool. I think the documentation could have a
Troubleshooting section where you cover some of these tools separately
and some common scenarios for how to troubleshoot.
- Rahul.
On Friday 12 September 2014 02:41 PM, Anders Håål wrote:
> Glad that it worked out. What is clear to me is that this topic is not
> that simple to understand with the current documentation, so this
> feedback from you is very valuable. I will add some additional blog
> posts on the topic and then get it into the next major release
> documentation. We will also need to figure out if this can be simplified.
>
> Did you try the CacheCli?
>
> Keep the feedback coming.
> Anders
>
> On 09/11/2014 11:39 PM, Rahul Amaram wrote:
>> Ok. I figured out the problem. It was with my understanding. I have
>> useweekend set to true. So, instead of
>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[23], I should
>> be using
>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg/weekend-$$SERVICEITEMNAME$$[23]
>> and so on.
>>
>> Thanks for the awesome support.
>>
>> - Rahul.
>>
>> On Thursday 11 September 2014 11:43 AM, Anders Håål wrote:
>>> Hi Rahul,
>>> Now I have a backlog of questions :)
>>> Okay, let's start with the last question.
>>> - First verify that you have data in the cache. Use redis-cli or the
>>> Bischeck CacheCli,
>>> <a href="http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-4.4.">http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-4.4.</a>
>>> - Then there is the issue of null data. Let's say that one of the
>>> expressions you have returns null. Null is tricky, so in Bischeck you
>>> have to decide how to manage a null value. Look at
>>> <a href="http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_configuration_guide.html#toc-Section-4.3.">http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_configuration_guide.html#toc-Section-4.3.</a>
>>>
>>> - You can also check the logs and increase the log level to
>>> debug to get more info. Check out
>>> <a href="http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-3.2.">http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-3.2.</a>
>>>
>>>
>>> The two following questions I will try to clarify better later (I
>>> must run to a meeting), but the index on hours specifies a specific
>>> hour, like the avg, max or min for that hour. Index 0 means the last
>>> calculated hour, so if the time is 2:30, index 0 means the avg, max
>>> or min for the period 1:00 to 2:00.
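>>>
>>> To give a rough worked example (double-check it against your own
>>> cache): at 2:30, index 0 covers 1:00-2:00 today, index 1 covers
>>> 0:00-1:00, and in general index k covers the hour starting k+1
>>> hours before the last full hour. So index 23 (24-1) is 2:00-3:00
>>> yesterday, index 167 (7*24-1) is the same hour a week back, and
>>> index 335 (14*24-1) is two weeks back.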
>>>
>>> These are good questions; we are glad to get your user's perspective
>>> on this.
>>> Anders
>>>
>>> On 09/11/2014 07:19 AM, Rahul Amaram wrote:
>>>> This doesn't help :(.
>>>>
>>>> <threshold>avg($$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[23],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[167],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[335])</threshold>
>>>>
>>>>
>>>> - Rahul.
>>>>
>>>> On Thursday 11 September 2014 10:45 AM, Rahul Amaram wrote:
>>>>> Also, let us say, that the current time is 2.30 and that I want
>>>>> the average of all the values between 2.00 and 3.00 the previous
>>>>> day, I'd probably have to use
>>>>>
>>>>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[23]
>>>>>
>>>>> rather than
>>>>>
>>>>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[24]
>>>>>
>>>>> Am I right ?
>>>>>
>>>>> Thanks,
>>>>> Rahul.
>>>>>
>>>>> On Thursday 11 September 2014 10:39 AM, Rahul Amaram wrote:
>>>>>> Ok. So would
>>>>>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[24] refer
>>>>>> to the average of the all the values ONLY in the 24th hour before
>>>>>> the current time?
>>>>>>
>>>>>> On Thursday 11 September 2014 10:30 AM, Anders Håål wrote:
>>>>>>> Hi Amaram,
>>>>>>> I think you just need to remove the minus sign when using the
>>>>>>> aggregated values. Minus is used for time, as in back in time,
>>>>>>> and an integer without a minus and a time indicator is an index.
>>>>>>> Check out
>>>>>>> <a href="http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_configuration_guide.html#toc-Chapter-4.">http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_configuration_guide.html#toc-Chapter-4.</a>
>>>>>>>
>>>>>>> You can also use redis-cli to explore the data in the cache. The
>>>>>>> key in redis is the same as the service definition.
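>>>>>>>
>>>>>>> For example, assuming a hypothetical service definition named
>>>>>>> host1-service1-serviceitem1 as the key, something like this
>>>>>>> should show the list length and the first few entries:
>>>>>>>
>>>>>>>     redis-cli LLEN host1-service1-serviceitem1
>>>>>>>     redis-cli LRANGE host1-service1-serviceitem1 0 4
>>>>>>>
>>>>>>> LLEN gives the number of entries and LRANGE lists the first five.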
>>>>>>> Anders
>>>>>>>
>>>>>>> On 09/11/2014 06:38 AM, Rahul Amaram wrote:
>>>>>>>> Ok. I am facing another issue. I have been running bischeck
>>>>>>>> with the aggregate function for more than a day. I am using the
>>>>>>>> below threshold function.
>>>>>>>>
>>>>>>>> <threshold>avg($$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[-24],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[-168],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[-336])</threshold>
>>>>>>>>
>>>>>>>>
>>>>>>>> and it doesn't seem to work. I am expecting that the first
>>>>>>>> aggregate value should be available.
>>>>>>>>
>>>>>>>> Instead, if I use the below threshold function (I know this is
>>>>>>>> not related to aggregate)
>>>>>>>>
>>>>>>>> avg($$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-24H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-168H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-336H])
>>>>>>>>
>>>>>>>>
>>>>>>>> the threshold is calculated fine, which is just the first value
>>>>>>>> as the remaining two values are not in the cache.
>>>>>>>>
>>>>>>>> How can I debug why aggregate is not working?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Rahul.
>>>>>>>>
>>>>>>>> On Wednesday 10 September 2014 04:53 PM, Anders Håål wrote:
>>>>>>>>> Thanks - got the ticket.
>>>>>>>>> I will update progress on the bug ticket, but it's good that
>>>>>>>>> the workaround works.
>>>>>>>>> Anders
>>>>>>>>>
>>>>>>>>> On 09/10/2014 01:20 PM, Rahul Amaram wrote:
>>>>>>>>>> That indeed seems to be the problem. Using count rather than
>>>>>>>>>> period seems to address the issue. Raised a ticket:
>>>>>>>>>> <a href="http://gforge.ingby.com/gf/project/bischeck/tracker/?action=TrackerItemEdit&tracker_item_id=259">http://gforge.ingby.com/gf/project/bischeck/tracker/?action=TrackerItemEdit&tracker_item_id=259</a>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Rahul.
>>>>>>>>>>
>>>>>>>>>> On Wednesday 10 September 2014 04:02 PM, Anders Håål wrote:
>>>>>>>>>>> This looks like a bug. Could you please report it on
>>>>>>>>>>> <a href="http://gforge.ingby.com/gf/project/bischeck/tracker/">http://gforge.ingby.com/gf/project/bischeck/tracker/</a> in the
>>>>>>>>>>> Bugs tracker. You need an account, but it's just a sign-up
>>>>>>>>>>> and you get an email confirmation.
>>>>>>>>>>> Can you try to use maxcount for purging instead as a
>>>>>>>>>>> workaround? Just calculate your maxcount based on the
>>>>>>>>>>> scheduling interval you use.
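>>>>>>>>>>>
>>>>>>>>>>> As a rough sketch of that calculation (assuming a 5-minute
>>>>>>>>>>> scheduling interval and that you want about 30 days of
>>>>>>>>>>> history): 12 values/hour * 24 * 30 = 8640, so:
>>>>>>>>>>>
>>>>>>>>>>> <cache>
>>>>>>>>>>>     <purge>
>>>>>>>>>>>         <maxcount>8640</maxcount>
>>>>>>>>>>>     </purge>
>>>>>>>>>>> </cache>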
>>>>>>>>>>> Anders
>>>>>>>>>>>
>>>>>>>>>>> On 09/10/2014 12:17 PM, Rahul Amaram wrote:
>>>>>>>>>>>> Following up on the earlier topic, I am seeing the below
>>>>>>>>>>>> errors related
>>>>>>>>>>>> to cache purge. Any idea on what might be causing this? I
>>>>>>>>>>>> don't see any
>>>>>>>>>>>> other errors in the log related to metrics.
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-09-10 12:12:00.001 ; INFO ;
>>>>>>>>>>>> DefaultQuartzScheduler_Worker-5 ;
>>>>>>>>>>>> com.ingby.socbox.bischeck.configuration.CachePurgeJob ;
>>>>>>>>>>>> CachePurge
>>>>>>>>>>>> purging 180
>>>>>>>>>>>> 2014-09-10 12:12:00.003 ; INFO ;
>>>>>>>>>>>> DefaultQuartzScheduler_Worker-5 ;
>>>>>>>>>>>> com.ingby.socbox.bischeck.configuration.CachePurgeJob ;
>>>>>>>>>>>> CachePurge
>>>>>>>>>>>> executed in 1 ms
>>>>>>>>>>>> 2014-09-10 12:12:00.003 ; ERROR ;
>>>>>>>>>>>> DefaultQuartzScheduler_Worker-5 ;
>>>>>>>>>>>> org.quartz.core.JobRunShell ; Job
>>>>>>>>>>>> DailyMaintenance.CachePurge threw an
>>>>>>>>>>>> unhandled Exception: java.lang.NullPointerException: null
>>>>>>>>>>>> at
>>>>>>>>>>>> com.ingby.socbox.bischeck.cache.provider.redis.LastStatusCache.trim(LastStatusCache.java:1250)
>>>>>>>>>>>> at
>>>>>>>>>>>> com.ingby.socbox.bischeck.configuration.CachePurgeJob.execute(CachePurgeJob.java:140)
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-09-10 12:12:00.003 ; ERROR ;
>>>>>>>>>>>> DefaultQuartzScheduler_Worker-5 ;
>>>>>>>>>>>> org.quartz.core.ErrorLogger ; Job
>>>>>>>>>>>> (DailyMaintenance.CachePurge threw an
>>>>>>>>>>>> exception.org.quartz.SchedulerException: Job threw an
>>>>>>>>>>>> unhandled
>>>>>>>>>>>> exception.
>>>>>>>>>>>> at
>>>>>>>>>>>> org.quartz.core.JobRunShell.run(JobRunShell.java:224)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
>>>>>>>>>>>>
>>>>>>>>>>>> Caused by: java.lang.NullPointerException: null
>>>>>>>>>>>> at
>>>>>>>>>>>> com.ingby.socbox.bischeck.cache.provider.redis.LastStatusCache.trim(LastStatusCache.java:1250)
>>>>>>>>>>>>
>>>>>>>>>>>> at
>>>>>>>>>>>> com.ingby.socbox.bischeck.configuration.CachePurgeJob.execute(CachePurgeJob.java:140)
>>>>>>>>>>>>
>>>>>>>>>>>> Here is my cache configuration:
>>>>>>>>>>>>
>>>>>>>>>>>> <cache>
>>>>>>>>>>>> <aggregate>
>>>>>>>>>>>> <method>avg</method>
>>>>>>>>>>>> <useweekend>true</useweekend>
>>>>>>>>>>>> <retention>
>>>>>>>>>>>> <period>H</period>
>>>>>>>>>>>> <offset>720</offset>
>>>>>>>>>>>> </retention>
>>>>>>>>>>>> <retention>
>>>>>>>>>>>> <period>D</period>
>>>>>>>>>>>> <offset>30</offset>
>>>>>>>>>>>> </retention>
>>>>>>>>>>>> </aggregate>
>>>>>>>>>>>>
>>>>>>>>>>>> <purge>
>>>>>>>>>>>> <offset>30</offset>
>>>>>>>>>>>> <period>D</period>
>>>>>>>>>>>> </purge>
>>>>>>>>>>>> </cache>
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Rahul.
>>>>>>>>>>>> On Monday 08 September 2014 08:39 PM, Anders Håål wrote:
>>>>>>>>>>>>> Great if you can make a Debian package, and I understand
>>>>>>>>>>>>> that you cannot commit to a timeline. The best thing would
>>>>>>>>>>>>> be to integrate it into our build process, where we use
>>>>>>>>>>>>> ant.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If the purging is based on time, then it could happen that
>>>>>>>>>>>>> data is removed from the cache, since the logic is based on
>>>>>>>>>>>>> time relative to now. To avoid that, you should increase
>>>>>>>>>>>>> the purge time before you start bischeck. And just a
>>>>>>>>>>>>> comment on your last sentence: the Redis TTL is never
>>>>>>>>>>>>> used :)
>>>>>>>>>>>>> Anders
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 09/08/2014 02:09 PM, Rahul Amaram wrote:
>>>>>>>>>>>>>> I would be more than happy to give you guys a
>>>>>>>>>>>>>> testimonial. However, we
>>>>>>>>>>>>>> have just taken this live and would like to see its
>>>>>>>>>>>>>> performance
>>>>>>>>>>>>>> before I
>>>>>>>>>>>>>> give a testimonial.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, if time permits, I'll try to bundle this for Debian
>>>>>>>>>>>>>> (I'm a
>>>>>>>>>>>>>> Debian
>>>>>>>>>>>>>> maintainer). I can't commit on a timeline right away
>>>>>>>>>>>>>> though :).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, just to make things explicitly clear: I understand
>>>>>>>>>>>>>> that the below serviceitem TTL has nothing to do with the
>>>>>>>>>>>>>> Redis TTL. But if I stop my bischeck server for a day or
>>>>>>>>>>>>>> two, would any of my metrics get lost? Or would I have to
>>>>>>>>>>>>>> increase the Redis TTL for this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Rahul.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Monday 08 September 2014 04:09 PM, Anders Håål wrote:
>>>>>>>>>>>>>>> Glad that it clarified how to configure the cache
>>>>>>>>>>>>>>> section. I will make a blog post on this in the meantime,
>>>>>>>>>>>>>>> until we have updated documentation. I agree with you
>>>>>>>>>>>>>>> that the structure of the configuration is a bit "heavy",
>>>>>>>>>>>>>>> so ideas and input are appreciated.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regarding the Redis TTL, this is a redis feature we do
>>>>>>>>>>>>>>> not use. The TTL mentioned in my mail is managed by
>>>>>>>>>>>>>>> bischeck. A Redis TTL does not work on individual nodes
>>>>>>>>>>>>>>> in a redis linked list.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Currently the bischeck installer should work for Ubuntu,
>>>>>>>>>>>>>>> RedHat/CentOS and Debian. There are currently no plans to
>>>>>>>>>>>>>>> make distribution packages like rpm or deb. I know op5
>>>>>>>>>>>>>>> (<a href="http://www.op5.com">www.op5.com</a>), which bundles Bischeck, makes a
>>>>>>>>>>>>>>> bischeck rpm. It would be super if there is anyone who
>>>>>>>>>>>>>>> would like to do this for the project.
>>>>>>>>>>>>>>> When it comes to packaging we have done a bit of work to
>>>>>>>>>>>>>>> create docker containers, but it's still experimental.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I also encourage you, if you think bischeck supports your
>>>>>>>>>>>>>>> monitoring effort, to write a small testimonial that we
>>>>>>>>>>>>>>> can put on the site.
>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>> Anders
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 09/08/2014 11:30 AM, Rahul Amaram wrote:
>>>>>>>>>>>>>>>> Thanks Anders. This explains precisely why my data was
>>>>>>>>>>>>>>>> getting purged after 16 hours (30 values per hour * 16
>>>>>>>>>>>>>>>> hours = 480). It would be great if you could update the
>>>>>>>>>>>>>>>> documentation with this info. The entire setup and
>>>>>>>>>>>>>>>> configuration itself takes time to get a hold of, and
>>>>>>>>>>>>>>>> detailed documentation would be very helpful.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also, another quick question: right now, I believe the
>>>>>>>>>>>>>>>> Redis TTL is set to 2000 seconds. Does this mean that if
>>>>>>>>>>>>>>>> I don't receive data for a particular serviceitem (or
>>>>>>>>>>>>>>>> service or host) for 2000 seconds, the data related to
>>>>>>>>>>>>>>>> it is lost?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also, any plans for bundling this with distributions
>>>>>>>>>>>>>>>> such as Debian?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> Rahul.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Monday 08 September 2014 02:04 PM, Anders Håål wrote:
>>>>>>>>>>>>>>>>> Hi Rahul,
>>>>>>>>>>>>>>>>> Thanks for the question and the feedback on the
>>>>>>>>>>>>>>>>> documentation. Great to hear that you think Bischeck is
>>>>>>>>>>>>>>>>> awesome. If you do not understand how it works by
>>>>>>>>>>>>>>>>> reading the documentation you are probably not alone,
>>>>>>>>>>>>>>>>> and we should consider it a documentation bug.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In 1.0.0 we introduced the concepts that you are asking
>>>>>>>>>>>>>>>>> about, and they are really two different independent
>>>>>>>>>>>>>>>>> features.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Let's start with cache purging.
>>>>>>>>>>>>>>>>> Collected monitoring data, metrics, are kept in the
>>>>>>>>>>>>>>>>> cache (redis from 1.0.0) as linked lists. There is one
>>>>>>>>>>>>>>>>> linked list per service definition, like
>>>>>>>>>>>>>>>>> host1-service1-serviceitem1. Prior to 1.0.0 all the
>>>>>>>>>>>>>>>>> linked lists had the same size, defined with the
>>>>>>>>>>>>>>>>> property lastStatusCacheSize. But in 1.0.0 we made that
>>>>>>>>>>>>>>>>> configurable so it can be defined per service
>>>>>>>>>>>>>>>>> definition.
>>>>>>>>>>>>>>>>> To enable individual cache configurations we added a
>>>>>>>>>>>>>>>>> section called <cache> in the serviceitem section of
>>>>>>>>>>>>>>>>> bischeck.xml. Like many other configuration options in
>>>>>>>>>>>>>>>>> 1.0.0, the cache section can have specific values or
>>>>>>>>>>>>>>>>> point to a template that can be shared.
>>>>>>>>>>>>>>>>> To manage the size of the cache, or to be more specific
>>>>>>>>>>>>>>>>> the linked list size, we defined the <purge> section.
>>>>>>>>>>>>>>>>> The purge section can have two different
>>>>>>>>>>>>>>>>> configurations. The first defines the max size of the
>>>>>>>>>>>>>>>>> cache linked list:
>>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>> <purge>
>>>>>>>>>>>>>>>>> <maxcount>1000</maxcount>
>>>>>>>>>>>>>>>>> </purge>
>>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The second option is to define the “time to live” for
>>>>>>>>>>>>>>>>> the metrics in the cache:
>>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>> <purge>
>>>>>>>>>>>>>>>>> <offset>10</offset>
>>>>>>>>>>>>>>>>> <period>D</period>
>>>>>>>>>>>>>>>>> </purge>
>>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>> In the above example we set the time to live to 10
>>>>>>>>>>>>>>>>> days, so any metrics older than this period will be
>>>>>>>>>>>>>>>>> removed. The period can have the following values:
>>>>>>>>>>>>>>>>> H - hours
>>>>>>>>>>>>>>>>> D - days
>>>>>>>>>>>>>>>>> W - weeks
>>>>>>>>>>>>>>>>> Y - years
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The two options are mutually exclusive. You have to
>>>>>>>>>>>>>>>>> choose one for each serviceitem or cache template.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If no cache directive is defined for a serviceitem, the
>>>>>>>>>>>>>>>>> property lastStatusCacheSize will be used. Its default
>>>>>>>>>>>>>>>>> value is 500.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hopefully this explains the cache purging.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The next question was related to aggregations, which
>>>>>>>>>>>>>>>>> have nothing to do with purging but are configured in
>>>>>>>>>>>>>>>>> the same <cache> section. The idea with aggregations
>>>>>>>>>>>>>>>>> was to create an automatic way to aggregate metrics on
>>>>>>>>>>>>>>>>> the level of an hour, day, week and month. The
>>>>>>>>>>>>>>>>> aggregation functions currently supported are average,
>>>>>>>>>>>>>>>>> max and min.
>>>>>>>>>>>>>>>>> Let's say you have a service definition of the format
>>>>>>>>>>>>>>>>> host1-service1-serviceitem1. When you enable an
>>>>>>>>>>>>>>>>> average (avg) aggregation you will automatically get
>>>>>>>>>>>>>>>>> the following new service definitions:
>>>>>>>>>>>>>>>>> host1-service1/H/avg-serviceitem1
>>>>>>>>>>>>>>>>> host1-service1/D/avg-serviceitem1
>>>>>>>>>>>>>>>>> host1-service1/W/avg-serviceitem1
>>>>>>>>>>>>>>>>> host1-service1/M/avg-serviceitem1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The configuration you need to achieve the above average
>>>>>>>>>>>>>>>>> aggregations is:
>>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>> <aggregate>
>>>>>>>>>>>>>>>>> <method>avg</method>
>>>>>>>>>>>>>>>>> </aggregate>
>>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If you'd like to combine it with the above-described
>>>>>>>>>>>>>>>>> purging, your configuration would look like:
>>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>> <aggregate>
>>>>>>>>>>>>>>>>> <method>avg</method>
>>>>>>>>>>>>>>>>> </aggregate>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> <purge>
>>>>>>>>>>>>>>>>> <offset>10</offset>
>>>>>>>>>>>>>>>>> <period>D</period>
>>>>>>>>>>>>>>>>> </purge>
>>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The new aggregated service definitions,
>>>>>>>>>>>>>>>>> host1-service1/H/avg-serviceitem1, etc, will have
>>>>>>>>>>>>>>>>> their own cache
>>>>>>>>>>>>>>>>> entries and can be used in threshold configurations
>>>>>>>>>>>>>>>>> and virtual
>>>>>>>>>>>>>>>>> services like any other service definitions. For
>>>>>>>>>>>>>>>>> example in a
>>>>>>>>>>>>>>>>> threshold hours section we could define
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> <hours hoursID="2">
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> <hourinterval>
>>>>>>>>>>>>>>>>> <from>09:00</from>
>>>>>>>>>>>>>>>>> <to>12:00</to>
>>>>>>>>>>>>>>>>> <threshold>host1-service1/H/avg-serviceitem1[0]*0.8</threshold>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> </hourinterval>
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This would mean that we use the average value for
>>>>>>>>>>>>>>>>> host1-service1-serviceitem1 for the period of the
>>>>>>>>>>>>>>>>> last hour.
>>>>>>>>>>>>>>>>> Aggregations are calculated hourly, daily, weekly and
>>>>>>>>>>>>>>>>> monthly.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> By default, weekend metrics are not included in the
>>>>>>>>>>>>>>>>> aggregation calculation. This can be enabled by setting
>>>>>>>>>>>>>>>>> <useweekend>true</useweekend>:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>> <aggregate>
>>>>>>>>>>>>>>>>> <method>avg</method>
>>>>>>>>>>>>>>>>> <useweekend>true</useweekend>
>>>>>>>>>>>>>>>>> </aggregate>
>>>>>>>>>>>>>>>>> ….
>>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This will create aggregated service definitions with
>>>>>>>>>>>>>>>>> the following
>>>>>>>>>>>>>>>>> name standard:
>>>>>>>>>>>>>>>>> host1-service1/H/avg/weekend-serviceitem1
>>>>>>>>>>>>>>>>> host1-service1/D/avg/weekend-serviceitem1
>>>>>>>>>>>>>>>>> host1-service1/W/avg/weekend-serviceitem1
>>>>>>>>>>>>>>>>> host1-service1/M/avg/weekend-serviceitem1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You can also have multiple entries like:
>>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>> <aggregate>
>>>>>>>>>>>>>>>>> <method>avg</method>
>>>>>>>>>>>>>>>>> <useweekend>true</useweekend>
>>>>>>>>>>>>>>>>> </aggregate>
>>>>>>>>>>>>>>>>> <aggregate>
>>>>>>>>>>>>>>>>> <method>max</method>
>>>>>>>>>>>>>>>>> </aggregate>
>>>>>>>>>>>>>>>>> ….
>>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So how long will the aggregated values be kept in the
>>>>>>>>>>>>>>>>> cache? By default we save:
>>>>>>>>>>>>>>>>> Hour aggregations for 25 hours
>>>>>>>>>>>>>>>>> Daily aggregations for 7 days
>>>>>>>>>>>>>>>>> Weekly aggregations for 5 weeks
>>>>>>>>>>>>>>>>> Monthly aggregations for 1 month
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> These values can be overridden, but they cannot be
>>>>>>>>>>>>>>>>> lower than the defaults. Below is an example where we
>>>>>>>>>>>>>>>>> save the aggregations for 168 hours, 60 days and 53
>>>>>>>>>>>>>>>>> weeks:
>>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>> <aggregate>
>>>>>>>>>>>>>>>>> <method>avg</method>
>>>>>>>>>>>>>>>>> <useweekend>true</useweekend>
>>>>>>>>>>>>>>>>> <retention>
>>>>>>>>>>>>>>>>> <period>H</period>
>>>>>>>>>>>>>>>>> <offset>168</offset>
>>>>>>>>>>>>>>>>> </retention>
>>>>>>>>>>>>>>>>> <retention>
>>>>>>>>>>>>>>>>> <period>D</period>
>>>>>>>>>>>>>>>>> <offset>60</offset>
>>>>>>>>>>>>>>>>> </retention>
>>>>>>>>>>>>>>>>> <retention>
>>>>>>>>>>>>>>>>> <period>W</period>
>>>>>>>>>>>>>>>>> <offset>53</offset>
>>>>>>>>>>>>>>>>> </retention>
>>>>>>>>>>>>>>>>> </aggregate>
>>>>>>>>>>>>>>>>> ….
>>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I hope this makes it a bit less confusing. What is
>>>>>>>>>>>>>>>>> clear to me is
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> we need to improve the documentation in this area.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>>>>>>>> Anders
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 09/08/2014 06:02 AM, Rahul Amaram wrote:
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>> I am trying to set up the bischeck plugin for our
>>>>>>>>>>>>>>>>>> organization. I have configured most of it except for
>>>>>>>>>>>>>>>>>> the cache retention period. Here is what I want: I
>>>>>>>>>>>>>>>>>> want to store every value generated during the past
>>>>>>>>>>>>>>>>>> month, because my threshold is currently calculated
>>>>>>>>>>>>>>>>>> as the average of the metric value during the past 4
>>>>>>>>>>>>>>>>>> weeks at the same time of the day.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So, how do I define the cache template for this? If I
>>>>>>>>>>>>>>>>>> don't
>>>>>>>>>>>>>>>>>> define any
>>>>>>>>>>>>>>>>>> cache template, for how many days is the data kept?
>>>>>>>>>>>>>>>>>> Also, how does the aggregate function work, and what
>>>>>>>>>>>>>>>>>> does the purge Maxitems signify?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've gone through the documentation but it wasn't
>>>>>>>>>>>>>>>>>> clear. Looking
>>>>>>>>>>>>>>>>>> forward
>>>>>>>>>>>>>>>>>> to a response.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Bischeck is one awesome plugin. Keep up the great work.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>> Rahul.
>>>>>>>>>>>>>>>>>>
--
</pre></body></html>