Average Check latency and execution time growth - 3.2.3

Stuart Browne stuart.browne at ausregistry.com.au
Mon Oct 3 05:36:06 CEST 2011


Hi,

I know this topic has been covered many times, but I've tried those tweaks and I still have the issue.

After a few days, check latency explodes.  It runs along quite happily with small values, then after about 3 days the values rise sharply.  I've recently started graphing performance statistics (nagiostats, MRTG), and as the two attachments (day, week) show, the growth is rather surprising.

We restart Nagios every few days (for other reasons) so thankfully the issue never gets completely out of control, but as you can see, it gets a bit crazy.

I can't think of any combination of settings that would cause such growth after such a long period of time.  Does anybody have any knowledge as to why it would suddenly increase after running for days without issue?

Basic Nagios system stats:
	2 x dual-core Xeon 5160 (3 GHz)
	6 GB memory
	4 x SAS, RAID1 (hardware, BBU, LVM over RAID1)
	RHEL5, fully patched
	Load average between 0.5 and 3.2

'nagios -s /etc/nagios/nagios.cfg' output (trimmed):

HOST SCHEDULING INFORMATION
---------------------------
Total hosts:                     252
Total scheduled hosts:           252
Host inter-check delay method:   SMART
Average host check interval:     300.00 sec
Host inter-check delay:          1.19 sec
Max host check spread:           30 min
First scheduled check:           Mon Oct  3 14:31:17 2011
Last scheduled check:            Mon Oct  3 14:36:15 2011


SERVICE SCHEDULING INFORMATION
-------------------------------
Total services:                     1575
Total scheduled services:           1386
Service inter-check delay method:   SMART
Average service check interval:     878.40 sec
Inter-check delay:                  0.63 sec
Interleave factor method:           SMART
Average services per host:          6.25
Service interleave factor:          6
Max service check spread:           30 min
First scheduled check:              Mon Oct  3 14:33:43 2011
Last scheduled check:               Mon Oct  3 14:48:21 2011

CHECK PROCESSING INFORMATION
----------------------------
Check result reaper interval:       5 sec
Max concurrent service checks:      Unlimited


PERFORMANCE SUGGESTIONS
-----------------------
I have no suggestions - things look okay.
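
As a sanity check on the figures above, the SMART delays can be reproduced by hand. This is only a sketch: it assumes the SMART method divides the average check interval by the number of scheduled checks and caps the result at (max check spread / scheduled checks), and that the interleave factor is the ceiling of scheduled services per host. The function name is made up for illustration; the numbers are taken from the output above.

```python
import math


def smart_inter_check_delay(avg_interval_sec, scheduled, max_spread_min):
    """Assumed SMART rule: spread checks evenly over the average interval,
    but never wider than the configured max check spread."""
    delay = avg_interval_sec / scheduled
    cap = (max_spread_min * 60.0) / scheduled
    return min(delay, cap)


# Values copied from the 'nagios -s' output above.
host_delay = smart_inter_check_delay(300.00, 252, 30)
service_delay = smart_inter_check_delay(878.40, 1386, 30)

# Assumption: interleave factor = ceil(scheduled services / hosts).
interleave = math.ceil(1386 / 252)

print(f"host inter-check delay:    {host_delay:.2f} sec")
print(f"service inter-check delay: {service_delay:.2f} sec")
print(f"service interleave factor: {interleave}")
```

Under those assumptions the results match the reported 1.19 sec, 0.63 sec and 6, so the initial schedule itself looks consistent with the configuration.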

Stuart J. Browne
Senior Linux Administrator
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nagios-a-day[1].png
Type: image/png
Size: 2551 bytes
Desc: nagios-a-day[1].png
URL: <https://www.monitoring-lists.org/archive/users/attachments/20111003/5b9b67b7/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nagios-a-week[1].png
Type: image/png
Size: 2368 bytes
Desc: nagios-a-week[1].png
URL: <https://www.monitoring-lists.org/archive/users/attachments/20111003/5b9b67b7/attachment-0001.png>