Possibility to avoid certain values which are way too deviant while calculating threshold
anders.haal at ingby.com
Sun Dec 28 07:14:05 CET 2014
I have looked into the topic a little more, and I think the capability
to detect outliers, which Rahul pointed out, is an important feature.
I think we should try to get functionality such as the MAD approach
into the next version.
@Rahul - please make a feature request on this topic.
Anders
On 12/17/2014 09:57 PM, Anders Håål wrote:
> Sorry, I forgot the link:
> http://stats.stackexchange.com/questions/38001/detecting-outliers-using-standard-deviations
>
>
> The problem is not writing the code; the problem is finding the logic
> that determines which numbers to remove from the data set. What counts
> as a deviation from the normal variation in the set?
>
> Googling a bit more, I found these definitions, which may be applicable
> to your use case, including one based on stdev:
>
> *Mean and Standard Deviation Method*
>
> For this outlier detection method, the mean and standard deviation of
> the residuals are calculated and compared. If a value is a certain
> number of standard deviations away from the mean, that data point is
> identified as an outlier. The specified number of standard deviations
> is called the threshold. The default value is 3.
>
> This method can fail to detect outliers because the outliers increase
> the standard deviation. The more extreme the outlier, the more the
> standard deviation is affected.
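A minimal Python sketch of the mean and standard deviation rule above, assuming the history is a plain list of numbers and using the population standard deviation (`statistics.pstdev`); `stdev_outliers` is a hypothetical helper name, not a Bischeck function:

```python
from statistics import mean, pstdev


def stdev_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (divides by n)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical history: six daily values, one of them far off.
history = [100, 102, 98, 101, 99, 500]
print(stdev_outliers(history))                 # []
print(stdev_outliers(history, threshold=2.0))  # [500]
```

This shows the masking problem described above: the 500 inflates the standard deviation so much that it lies only about 2.24 standard deviations from the mean. In fact, by Samuelson's inequality no value in a sample of six can lie more than √5 ≈ 2.24 population standard deviations from the mean, so with the default threshold of 3 this method can never flag anything in a six-day window.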
>
> *Median and Median Absolute Deviation Method (MAD)*
>
> For this outlier detection method, the median of the residuals is
> calculated. Then, the difference is calculated between each historical
> value and this median. These differences are expressed as their
> absolute values, and a new median is calculated and multiplied by an
> empirically derived constant to yield the median absolute deviation
> (MAD). If a value is a certain number of MADs away from the median of
> the residuals, that value is classified as an outlier. The default
> threshold is 3 MADs.
>
> This method is generally more effective than the mean and standard
> deviation method for detecting outliers, but it can be too aggressive
> in classifying values that are not really extremely different. Also,
> if more than 50% of the data points have the same value, MAD is
> computed to be 0, so any value different from the residual median is
> classified as an outlier.
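A minimal sketch of the MAD rule, assuming the constant 1.4826 (the usual factor that makes the MAD estimate the standard deviation for normally distributed data; the text above only calls it "empirically derived") and the default threshold of 3; `mad_outliers` is a hypothetical name:

```python
from statistics import median

# Assumed constant: 1 / 0.6745 ≈ 1.4826 makes the MAD estimate the standard
# deviation when the data are normally distributed.
MAD_SCALE = 1.4826


def mad_outliers(values, threshold=3.0, scale=MAD_SCALE):
    """Return the values lying more than `threshold` scaled MADs from the median."""
    med = median(values)
    mad = scale * median([abs(v - med) for v in values])
    if mad == 0:
        # More than half of the values equal the median: every other value
        # is flagged, the over-aggressive case described above.
        return [v for v in values if v != med]
    return [v for v in values if abs(v - med) / mad > threshold]


print(mad_outliers([100, 102, 98, 101, 99, 500]))  # [500]
print(mad_outliers([5, 5, 5, 5, 6]))               # [6]
```

For the six values above the median is 100.5 and the scaled MAD is about 2.22, so the 500 stands out clearly even though the mean and standard deviation rule misses it.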
>
> *Median and Interquartile Deviation Method (IQD)*
>
> For this outlier detection method, the median of the residuals is
> calculated, along with the 25th percentile and the 75th percentile.
> The difference between the 25th and 75th percentile is the
> interquartile deviation (IQD). Then, the difference is calculated
> between each historical value and the residual median. If the
> historical value is a certain number of IQDs away from the median of
> the residuals, that value is classified as an outlier. The default
> threshold is 2.22, which for normally distributed data is equivalent
> to 3 standard deviations or MADs.
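Where the 2.22 comes from, assuming normally distributed data with standard deviation σ: the quartiles lie about 0.6745σ on either side of the median, so

$$
\mathrm{IQD} = Q_3 - Q_1 \approx 2 \times 0.6745\,\sigma \approx 1.349\,\sigma,
\qquad
\frac{3\sigma}{\mathrm{IQD}} \approx \frac{3}{1.349} \approx 2.22 .
$$

The same 0.6745 is behind the usual MAD scaling constant, 1/0.6745 ≈ 1.4826.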
>
> This method is somewhat susceptible to influence from extreme
> outliers, but less so than the mean and standard deviation method. Box
> plots are based on this approach. The median and interquartile
> deviation method can be used for both symmetric and asymmetric data.
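A minimal sketch of the IQD rule, assuming the quartiles come from `statistics.quantiles` (Python 3.8+, default "exclusive" method; other percentile definitions give somewhat different quartiles on small samples) and the default threshold of 2.22; `iqd_outliers` is a hypothetical name:

```python
from statistics import median, quantiles


def iqd_outliers(values, threshold=2.22):
    """Return the values lying more than `threshold` IQDs from the median."""
    med = median(values)
    q1, _, q3 = quantiles(values, n=4)  # 25th, 50th and 75th percentiles
    iqd = q3 - q1
    if iqd == 0:
        # Degenerate spread: flag anything that differs from the median.
        return [v for v in values if v != med]
    return [v for v in values if abs(v - med) / iqd > threshold]


print(iqd_outliers([100, 102, 98, 101, 99, 500]))  # [500]
```

The 500 is still flagged, but with only six values the outlier itself pulls the upper quartile up to 201.5 (IQD 102.75), an example of the susceptibility to extreme values mentioned above.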
>
> If you find a method that you think could work, we could implement it
> together and you can verify it with your data. Can you say anything
> about the data collected?
> Anders
>
> On 12/17/2014 09:25 PM, Rahul Amaram wrote:
>> Hi Anders,
>>
>> So, I would like to remove the outlier and calculate the mean of the
>> remaining elements. Any suggestions other than writing my own custom
>> math function? Also, I don't think you included the link.
>>
>> Thanks,
>> Rahul.
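A minimal sketch of what Rahul describes, assuming the MAD rule (constant 1.4826, threshold 3) is used to drop outliers before averaging; `robust_mean` is a hypothetical helper, not something Bischeck provides out of the box:

```python
from statistics import mean, median


def robust_mean(values, threshold=3.0, scale=1.4826):
    """Mean of `values` after dropping points more than `threshold` scaled MADs from the median."""
    med = median(values)
    mad = scale * median([abs(v - med) for v in values])
    if mad == 0:
        # More than half of the values equal the median: keep only those.
        kept = [v for v in values if v == med]
    else:
        kept = [v for v in values if abs(v - med) / mad <= threshold]
    return mean(kept)


# Hypothetical six daily values, one per day, with one deviant day.
daily = [100, 102, 98, 101, 99, 500]
print(mean(daily))         # plain mean, pulled up by the 500
print(robust_mean(daily))  # 100
```

Here the deviant 500 is dropped and the remaining five days average to 100, whereas the plain mean is about 166.7.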
>>
>> On Thursday 18 December 2014 12:55 AM, Anders Håål wrote:
>>> Hi Rahul,
>>> It's possible, but the question is which algorithm to use. A second
>>> question is what you would do with the remaining set: calculate a
>>> mean?
>>> Excluding a deviant value sounds close to what is called an outlier,
>>> http://en.wikipedia.org/wiki/Outlier. There are a number of
>>> mathematical solutions to this problem, but I am not sure which would
>>> be applicable or correct. Check this link for a discussion of the
>>> topic, where one approach uses the standard deviation, but from the
>>> discussion it does not sound like a statistically correct approach.
>>>
>>> If you or anyone else on this list finds a good approach, I am more
>>> than happy to try it. In Bischeck it's possible to plug in your own
>>> functions, as described in
>>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-6.2
>>> so you can easily do your own testing. Using the cache browser CLI,
>>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-4.4
>>> you can easily test your function.
>>>
>>> Anders
>>>
>>>
>>> On 12/17/2014 03:40 PM, Rahul Amaram wrote:
>>>> Hi,
>>>>
>>>> I had a quick question. Let's say we calculate the threshold based
>>>> on the values of the past six days, one value per day, and that one
>>>> of these six values is way too deviant. Is it then possible to
>>>> exclude this deviant value when calculating the threshold?
>>>>
>>>> Thanks,
>>>> Rahul.
>>>
>>>
>>
>
>
> --
>
> Ingby<http://www.ingby.com>
>
> IngbyForge<http://gforge.ingby.com>
>
> bischeck - dynamic and adaptive thresholds for Nagios<http://www.bischeck.org>
>
> anders.haal at ingby.com<mailto:anders.haal at ingby.com>
>
> Software through engineering creativity and competence
>
> Ingenjörsbyn
> Box 531
> 101 30 Stockholm
> Sweden
> www.ingby.com <http://www.ingby.com/>
> Mobile: +46 70 575 35 46
> Phone: +46 75 75 75 090
> Fax: +46 75 75 75 091
>
More information about the Bischeck-users mailing list