Specifying the retention period
Anders Håål
anders.haal at ingby.com
Mon Sep 8 10:34:10 CEST 2014
Hi Rahul,
Thanks for the question and feedback on the documentation. Great to hear
that you think Bischeck is awesome. If you do not understand how it
works by reading the documentation you are probably not alone, and we
should consider it a documentation bug.
In 1.0.0 we introduced the concepts that you are asking about, and they
are really two independent features.
Let's start with cache purging.
Collected monitoring data, metrics, are kept in the cache (Redis from
1.0.0) as linked lists. There is one linked list per service
definition, like host1-service1-serviceitem1. Prior to 1.0.0 all the
linked lists had the same size, defined by the property
lastStatusCacheSize. In 1.0.0 we made the size configurable so it can
be defined per service definition.
To enable individual cache configurations we added a section called
<cache> in the serviceitem section of bischeck.xml. Like many other
configuration options in 1.0.0, the cache section can either hold the
specific values or point to a template that can be shared.
To manage the size of the cache, or to be more specific the linked list
size, we defined the <purge> section. The purge section can have two
different configurations. The first defines the max size of the
cache linked list:
<cache>
<purge>
<maxcount>1000</maxcount>
</purge>
</cache>
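To illustrate the effect of <maxcount>, here is a small Python sketch
(not Bischeck code). It assumes that the newest metric is placed at the
head of the list and that the oldest entry is dropped once the limit is
reached. The names make_cache and cache are made up for the example.

```python
from collections import deque


def make_cache(maxcount):
    # A bounded list: when it is full, adding at one end drops the
    # entry at the opposite end.
    return deque(maxlen=maxcount)


cache = make_cache(3)
for value in [10, 11, 12, 13, 14]:
    # Assumption: the newest metric is stored first in the list.
    cache.appendleft(value)

print(list(cache))  # [14, 13, 12]
```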
The second option is to define the “time to live” for the metrics in
the cache:
<cache>
<purge>
<offset>10</offset>
<period>D</period>
</purge>
</cache>
In the above example we set the time to live to 10 days, so any metrics
older than this period will be removed. The period can have the
following values:
H - hours
D - days
W - weeks
Y - years
The two options are mutually exclusive. You have to choose one for each
serviceitem or cache template.
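As a rough sketch of how an offset and a period translate into a time
to live, the following Python example (not Bischeck code) filters out
metrics older than the cutoff. The lengths used for W and Y (7 and 365
days) and the names time_to_live and purge_expired are assumptions made
for the illustration.

```python
from datetime import datetime, timedelta

# Assumption: W is 7 days and Y is 365 days. The exact lengths used by
# Bischeck are not stated in this mail.
PERIOD_HOURS = {"H": 1, "D": 24, "W": 24 * 7, "Y": 24 * 365}


def time_to_live(offset, period):
    """Translate an <offset>/<period> pair into a timedelta."""
    return timedelta(hours=offset * PERIOD_HOURS[period])


def purge_expired(metrics, offset, period, now):
    """Keep only (timestamp, value) pairs within the time to live."""
    cutoff = now - time_to_live(offset, period)
    return [(ts, value) for ts, value in metrics if ts >= cutoff]


now = datetime(2014, 9, 8, 10, 0)
metrics = [
    (now - timedelta(days=1), 42.0),
    (now - timedelta(days=9), 40.0),
    (now - timedelta(days=11), 38.0),
]
kept = purge_expired(metrics, offset=10, period="D", now=now)
print([value for _, value in kept])  # [42.0, 40.0]
```

Read this way, keeping a bit more than four weeks of raw metrics would
correspond to <offset>5</offset> with <period>W</period>.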
If no cache directive is defined for a serviceitem, the property
lastStatusCacheSize will be used. Its default value is 500.
Hopefully this explains the cache purging.
The next question was related to aggregations, which have nothing to do
with purging but are configured in the same <cache> section. The idea
with aggregations is to automatically aggregate metrics on
the level of an hour, day, week and month. The aggregation functions
currently supported are average, max and min.
Let's say you have a service definition of the format
host1-service1-serviceitem1. When you enable an average (avg)
aggregation you will automatically get the following new service
definitions:
host1-service1/H/avg-serviceitem1
host1-service1/D/avg-serviceitem1
host1-service1/W/avg-serviceitem1
host1-service1/M/avg-serviceitem1
The configuration you need to achieve the above average aggregations is:
<cache>
<aggregate>
<method>avg</method>
</aggregate>
</cache>
If you would like to combine it with the purging described above, your
configuration would look like:
<cache>
<aggregate>
<method>avg</method>
</aggregate>
<purge>
<offset>10</offset>
<period>D</period>
</purge>
</cache>
The new aggregated service definitions,
host1-service1/H/avg-serviceitem1, etc, will have their own cache
entries and can be used in threshold configurations and virtual services
like any other service definitions. For example in a threshold hours
section we could define
<hours hoursID="2">
<hourinterval>
<from>09:00</from>
<to>12:00</to>
<threshold>host1-service1/H/avg-serviceitem1[0]*0.8</threshold>
</hourinterval>
...
This would mean that we use the average value for
host1-service1-serviceitem1 for the period of the last hour.
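The threshold expression above can be read as "take the most recent
hourly average and multiply it by 0.8". A minimal Python sketch of that
reading follows. The cache content is invented, and the assumption that
index 0 is the most recent aggregated value is based only on the
description in this mail.

```python
# Hypothetical cache content: the most recent hourly average first.
cache = {
    "host1-service1/H/avg-serviceitem1": [100.0, 110.0, 95.0],
}


def threshold(cache, service_def, index, factor):
    # Assumption: index 0 addresses the latest value in the list.
    return cache[service_def][index] * factor


value = threshold(cache, "host1-service1/H/avg-serviceitem1", 0, 0.8)
print(value)
```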
Aggregations are calculated hourly, daily, weekly and monthly.
By default, weekend metrics are not included in the aggregation
calculation. They can be included by setting
<useweekend>true</useweekend>:
<cache>
<aggregate>
<method>avg</method>
<useweekend>true</useweekend>
</aggregate>
….
</cache>
This will create aggregated service definitions with the following
naming convention:
host1-service1/H/avg/weekend-serviceitem1
host1-service1/D/avg/weekend-serviceitem1
host1-service1/W/avg/weekend-serviceitem1
host1-service1/M/avg/weekend-serviceitem1
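The naming of the aggregated service definitions, with and without
weekends, can be sketched in Python as below. This is only an
illustration of the pattern shown in this mail, not Bischeck code. The
function name aggregated_names is made up, and the sketch assumes that
the host, service and serviceitem names contain no hyphens themselves.

```python
def aggregated_names(service_def, method, use_weekend=False):
    """Build the H/D/W/M aggregated names for one service definition."""
    # Assumption: the three parts contain no "-" of their own.
    host, service, serviceitem = service_def.split("-")
    suffix = "/weekend" if use_weekend else ""
    return [
        f"{host}-{service}/{period}/{method}{suffix}-{serviceitem}"
        for period in ("H", "D", "W", "M")
    ]


for name in aggregated_names("host1-service1-serviceitem1", "avg"):
    print(name)
for name in aggregated_names("host1-service1-serviceitem1", "avg", True):
    print(name)
```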
You can also have multiple entries like:
<cache>
<aggregate>
<method>avg</method>
<useweekend>true</useweekend>
</aggregate>
<aggregate>
<method>max</method>
</aggregate>
….
</cache>
So how long will the aggregated values be kept in the cache? By
default we save:
Hourly aggregations for 25 hours
Daily aggregations for 7 days
Weekly aggregations for 5 weeks
Monthly aggregations for 1 month
These values can be overridden, but they cannot be lower than the
defaults. Below you have an example where we save the aggregations for
168 hours, 60 days and 53 weeks.
<cache>
<aggregate>
<method>avg</method>
<useweekend>true</useweekend>
<retention>
<period>H</period>
<offset>168</offset>
</retention>
<retention>
<period>D</period>
<offset>60</offset>
</retention>
<retention>
<period>W</period>
<offset>53</offset>
</retention>
</aggregate>
….
</cache>
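The relation between the default retention and the overrides can be
sketched in Python as follows. Note the assumption: the sketch falls
back to the default when an override is lower than the default, while
this mail only says that lower values are not allowed, not how Bischeck
reacts to them. The names DEFAULT_RETENTION and effective_retention are
made up for the example.

```python
# Defaults as listed above: 25 hours, 7 days, 5 weeks, 1 month.
DEFAULT_RETENTION = {"H": 25, "D": 7, "W": 5, "M": 1}


def effective_retention(overrides):
    """Merge <retention> overrides with the default retention."""
    effective = dict(DEFAULT_RETENTION)
    for period, offset in overrides.items():
        # Assumption: a value below the default falls back to the
        # default instead of being rejected.
        effective[period] = max(offset, DEFAULT_RETENTION[period])
    return effective


print(effective_retention({"H": 168, "D": 60, "W": 53}))
# {'H': 168, 'D': 60, 'W': 53, 'M': 1}
```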
I hope this makes it a bit less confusing. What is clear to me is that
we need to improve the documentation in this area.
Looking forward to your feedback.
Anders
On 09/08/2014 06:02 AM, Rahul Amaram wrote:
> Hi,
> I am trying to setup the bischeck plugin for our organization. I have
> configured most part of it except for the cache retention period. Here
> is what I want - I want to store every value which has been generated
> during the past 1 month. The reason being my threshold is currently
> calculated as the average of the metric value during the past 4 weeks at
> the same time of the day.
>
> So, how do I define the cache template for this? If I don't define any
> cache template, for how many days is the data kept?
> Also, how does the aggregate function work and what does the purge
> Maxitems signify?
>
> I've gone through the documentation but it wasn't clear. Looking forward
> to a response.
>
> Bischeck is one awesome plugin. Keep up the great work.
>
> Regards,
> Rahul.
>