High CPU consumption by java and redis-server
Rahul Amaram
rahul.amaram at vizury.com
Thu Sep 18 05:47:02 CEST 2014
Using indices has got down CPU usage of the java process considerably.
But won't indices create problem if data is missing in-between? For ex.
if data for 2-3 hours is missing, then won't indices be offset by that
time? Also, does bischeck support using multiple redis-instances so that
multi-core be exploited?
Regards,
Rahul.
On Thursday 18 September 2014 01:58 AM, Anders Håål wrote:
> Okay. 4-6 to seconds is absolutely to much and it may be related how
> the query by time is implement. The current search is pretty brute
> force finding the "right" time. Its not searching linearly but there
> are no "index" on time. Searching by index is much quicker and query
> by time will be related to the size. With your 6 values the search has
> to be done 6 times over a list that are 5000 items. The future idea I
> mentioned will be a sort of index for the timestamp by using a sorted
> set.
> What I would recommend you to do is to use index instead and see how
> that effect the performance. Since you use a interval of 120 sec, the
> -24H will be the same as index 720, -96H will be the same as index
> 2880, etc.
> I will try to get the time to set up an equivalent test environment.
> Keep me updated of your investigation
> Anders
>
> On 09/17/2014 09:18 PM, Rahul Amaram wrote:
>> /When it comes to your last finding I have no explanation. Just to
>> understand you compare using -24H with -10080M (-168H). Would it not
>> be better to compare -24H and -1440M. I have to get back to you on
>> this but I would need to get the result when running in cacheCli
>> since you get the time it takes,
>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-4.4.//
>> ///
>> This was a typo. I was talking about -168H and -10080M. Also, I used
>> "bischeck cli.CacheCli" to check this. And I re-ran this now, but not
>> finding much difference between both of them (it takes about 4-6
>> seconds to retrieve the value).
>>
>> Reg. other points, I have to get back to you. On a side note, I have
>> upgraded from redis-server 2.6 to 2.8, just to rule out any version
>> performance issues.
>>
>> Thanks,
>> Rahul.
>>
>>
>> On Thursday 18 September 2014 12:19 AM, Anders Håål wrote:
>>> Hi Rahul,
>>> Looking at your threshold this means that you will retrieve max 6
>>> values, which should not be that "hard" even if its a time based
>>> query - using index is faster and is something we will look into in
>>> the future.
>>> Since you run the query every 120 sec it means that you currently
>>> have at lest 5040 items in the cache for this each service, which
>>> does not sound to bad. 10 services at least 50000 in total.
>>> What I like you to check is the following:
>>> - If you connect with some JMX client against bischeck you can see
>>> all the different timers
>>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Chapter-5.
>>> The once that are related to threshold are inserting to start with
>>> but check all the different timers if some one have long execution
>>> time.
>>> - Since the its the redis-server that are consume a high level of
>>> CPU its interesting to see the configuration for redis - like the
>>> amount of memory allocated. If redis need to swap its not good.
>>> - Please check the redis log files.
>>> - You can also connect to redis with redis-cli and run command
>>> "monitor" to get a real time listing on the commands executed
>>> against redis.
>>> - Also check with top the percentage of %wa, waiting for io. How
>>> much memory do you have on the server? Only running bischeck and redis?
>>> - How much cpu is bischeck consuming? Do you see any peaks?
>>> - Also check the bischeck log to see any ERROR or WARN.
>>> - And finally - has this been the behavior from the beginning or has
>>> it increased over time? What happen if you restart bischeck (not
>>> reload)?
>>>
>>> Try to collect some more info so we can try to determine where the
>>> issue is related.
>>>
>>> When it comes to your last finding I have no explanation. Just to
>>> understand you compare using -24H with -10080M (-168H). Would it not
>>> be better to compare -24H and -1440M. I have to get back to you on
>>> this but I would need to get the result when running in cacheCli
>>> since you get the time it takes,
>>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-4.4.
>>>
>>>
>>> Regards
>>> Anders
>>>
>>>
>>>
>>>
>>>
>>> On 09/17/2014 07:13 PM, Rahul Amaram wrote:
>>>> Hi,
>>>> I am observing very high CPU consumption by the java process and
>>>> redis-server. redis-server being single threaded it self is taking
>>>> 100% CPU. I have about 10 hosts, with about 10 services each (with
>>>> one service item per service). The time interval for generation of
>>>> value is 120s. The threshold that I have defined is:
>>>>
>>>> avg($$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-24H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-96H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-168H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-336H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-504H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-672H])
>>>>
>>>>
>>>> However, currently, not more than 3 values, are available.
>>>>
>>>> I am already running this on a c3.xlarge machine (4 cores) and the
>>>> load average is quite often > 4 resulting in delay of generation of
>>>> values. Any pointers in what could be causing the high load would
>>>> be much appreciated.
>>>>
>>>> On a slightly different note, while using cli.CacheCli, retrieving
>>>> the value of a service item one week back using hours (-24H) is
>>>> considerably faster than retrieving it using minutes (-10080M).
>>>> Again, why does bischeck behave this way?
>>>>
>>>> Thanks,
>>>> Rahul.
>>>>
>>>
>>>
>>
>>
>
>
> --
>
> Ingby<http://www.ingby.com>
>
> IngbyForge<http://gforge.ingby.com>
>
> bischeck - dynamic and adaptive thresholds for Nagios<http://www.bischeck.org>
>
> anders.haal at ingby.com<mailto:anders.haal at ingby.com>
>
> Mjukvara genom ingenjörsmässig kreativitet och kompetens
>
> Ingenjörsbyn
> Box 531
> 101 30 Stockholm
> Sweden
> www.ingby.com <http://www.ingby.com/>
> Mobil: +46 70 575 35 46
> Tele: +46 75 75 75 090
> Fax: +46 75 75 75 091
>
--
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://www.monitoring-lists.org/archive/bischeck-users/attachments/20140918/c2a19d80/attachment.html>
More information about the Bischeck-users
mailing list