Determining what is causing a high load reported by check_load plugin
Kaplan, Andrew H.
AHKAPLAN at PARTNERS.ORG
Tue Dec 7 16:25:49 CET 2010
Hi there --
The output below shows the top processes on the server:
439 processes: 438 sleeping, 1 running, 0 zombie, 0 stopped
CPU0 states: 19.0% user, 9.4% system, 0.0% nice, 71.0% idle
CPU1 states: 20.1% user, 13.0% system, 0.0% nice, 66.3% idle
CPU2 states: 27.1% user, 17.3% system, 0.0% nice, 55.0% idle
Mem: 2064324K av, 2013820K used, 50504K free, 0K shrd, 487764K buff
Swap: 2096472K av, 12436K used, 2084036K free 976244K cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
2398 root 15 0 1280 1280 824 R 1.9 0.0 0:00 top
5648 root 22 0 1196 1196 1104 S 1.3 0.0 0:00 ASMProServer
1 root 15 0 488 484 448 S 0.0 0.0 2:28 init
2 root 0K 0 0 0 0 SW 0.0 0.0 0:00 migration_CPU0
3 root 0K 0 0 0 0 SW 0.0 0.0 0:00 migration_CPU1
4 root 0K 0 0 0 0 SW 0.0 0.0 0:00 migration_CPU2
5 root 15 0 0 0 0 SW 0.0 0.0 0:03 keventd
6 root 34 19 0 0 0 SWN 0.0 0.0 17:52 ksoftirqd_CPU0
7 root 34 19 0 0 0 SWN 0.0 0.0 16:39 ksoftirqd_CPU1
8 root 34 19 0 0 0 SWN 0.0 0.0 17:33 ksoftirqd_CPU2
9 root 15 0 0 0 0 SW 0.0 0.0 28:22 kswapd
10 root 15 0 0 0 0 SW 0.0 0.0 42:39 bdflush
11 root 15 0 0 0 0 SW 0.0 0.0 3:08 kupdated
12 root 25 0 0 0 0 SW 0.0 0.0 0:00 mdrecoveryd
18 root 16 0 0 0 0 SW 0.0 0.0 0:00 scsi_eh_0
21 root 15 0 0 0 0 SW 0.0 0.0 4:38 kjournald
101 root 15 0 0 0 0 SW 0.0 0.0 0:00 khubd
265 root 15 0 0 0 0 SW 0.0 0.0 0:03 kjournald
266 root 15 0 0 0 0 SW 0.0 0.0 3:43 kjournald
267 root 15 0 0 0 0 SW 0.0 0.0 0:04 kjournald
268 root 15 0 0 0 0 SW 0.0 0.0 0:01 kjournald
269 root 15 0 0 0 0 SW 0.0 0.0 0:11 kjournald
270 root 15 0 0 0 0 SW 0.0 0.0 4:34 kjournald
271 root 15 0 0 0 0 SW 0.0 0.0 4:28 kjournald
272 root 15 0 0 0 0 SW 0.0 0.0 0:08 kjournald
273 root 15 0 0 0 0 SW 0.0 0.0 0:14 kjournald
274 root 15 0 0 0 0 SW 0.0 0.0 0:07 kjournald
275 root 15 0 0 0 0 SW 0.0 0.0 1:14 kjournald
805 root 15 0 588 576 532 S 0.0 0.0 1:39 syslogd
810 root 15 0 448 432 432 S 0.0 0.0 0:00 klogd
830 rpc 15 0 596 572 508 S 0.0 0.0 0:04 portmap
858 rpcuser 19 0 708 608 608 S 0.0 0.0 0:00 rpc.statd
970 root 15 0 0 0 0 SW 0.0 0.0 0:21 rpciod
971 root 15 0 0 0 0 SW 0.0 0.0 0:00 lockd
999 ntp 15 0 1812 1812 1732 S 0.0 0.0 5:04 ntpd
1022 root 15 0 772 720 632 S 0.0 0.0 0:00 ypbind
1024 root 15 0 772 720 632 S 0.0 0.0 1:16 ypbind
What caught my eye was the number of processes along with the number of sleeping
processes.
I tried running the kill command on the kjournald instances, but that did not
appear to stop them.
Aside from rebooting the server, which can be done if necessary, what other
approach can I try?
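As an aside: kjournald runs as a kernel thread, so kill has no effect on it, and since it is sleeping it does not count toward load anyway. A quicker way to see which processes actually contribute to the load average is to list only those in R (running) or, on Linux, D (uninterruptible sleep) state; a sketch with standard procps syntax (options may differ on very old distributions):

```shell
# The load average counts processes in R (running) or D (uninterruptible
# sleep) state; list just those instead of everything top shows.
ps -eo state,pid,user,comm | awk 'NR > 1 && $1 ~ /^[RD]/'

# Or simply count them -- on a healthy box this number is small.
ps -eo state | awk 'NR > 1 && $1 ~ /^[RD]/ {c++} END {print c+0}'
```

With a load of ~71, that count should be around 71 while the problem is happening, and the command names will point at the culprit.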
________________________________
From: Daniel Wittenberg [mailto:daniel.wittenberg.r0ko at statefarm.com]
Sent: Tuesday, December 07, 2010 9:11 AM
To: Nagios Users List
Subject: Re: [Nagios-users] Determining what is causing a high load reported by
check_load plugin
So what are the first few processes listed in top? That should be what is
causing your load then.
Dan
From: Kaplan, Andrew H. [mailto:AHKAPLAN at PARTNERS.ORG]
Sent: Tuesday, December 07, 2010 7:49 AM
To: Nagios Users List
Subject: Re: [Nagios-users] Determining what is causing a high load reported by
check_load plugin
Hi there --
The load values displayed in top match those reported by the check_load plugin.
This is the case whether the plugin is run automatically or interactively. The
output of the uptime command is shown below:
8:48am up 153 days, 23:21, 1 user, load average: 73.36, 73.29, 73.21
________________________________
From: Daniel Wittenberg [mailto:daniel.wittenberg.r0ko at statefarm.com]
Sent: Monday, December 06, 2010 4:40 PM
To: Nagios Users List
Subject: Re: [Nagios-users] Determining what is causing a high load reported by
check_load plugin
In top, does it show the same load values? The status of your memory shouldn't
cause the nagios plugin to report high cpu. What does the uptime command say?
Try running the check_load script by hand on that host and verify it returns the
same results.
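For reference, a by-hand run looks like the following; the plugin path here is an assumption (it varies by distro), and check_load ultimately reads the same three numbers uptime does, from /proc/loadavg:

```shell
# Plugin path is distro-dependent; common locations are
# /usr/local/nagios/libexec or /usr/lib/nagios/plugins -- adjust to yours:
#   /usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20

# The underlying source of the three load averages the plugin checks:
cut -d' ' -f1-3 /proc/loadavg
```

If the by-hand run and uptime agree, the problem is real load on the host rather than an NRPE or plugin issue.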
Dan
From: Marc Powell [mailto:lists at xodus.org]
Sent: Monday, December 06, 2010 3:26 PM
To: Nagios Users List
Subject: Re: [Nagios-users] Determining what is causing a high load reported by
check_load plugin
On Mon, Dec 6, 2010 at 1:50 PM, Kaplan, Andrew H. <AHKAPLAN at partners.org> wrote:
Hi there --
We are running a Nagios 3.1.2 server, and the client that is the subject of this
e-mail is running version 2.6 of the NRPE client.
The check_load plugin, version 1.4, is indicating the past three readings are
the following:
load average: 71.00, 71.00, 70.95 CRITICAL
The critical thresholds for the plugin are set to 30, 25, 20.
When I checked the client in question, the first thing I did was to run the top
command. The results are shown below:
CPU0 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle
CPU1 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle
CPU2 states: 1.0% user, 4.0% system, 0.0% nice, 93.0% idle
Mem: 2064324K av, 2032308K used, 32016K free, 0K shrd, 509924K buff
Swap: 2096472K av, 21432K used, 2075040K free 1035592K cached
The one thing I noticed was that the amount of free memory was down to
thirty-two megabytes. I wanted to know whether that was causing the critical
status, or if there is something else I should investigate.
Memory is not a factor in the load calculation, only the number of processes
running or waiting to run. For at least 15 minutes you had approximately 71
processes either running or ready to run and waiting on CPU resources. Running
top/ps was the right thing to do but you really need to do it when the problem
is occurring to see what's actually using all the CPU resources. There are far
too many possible reasons why load could be high, but it should be easy for
someone familiar with your system to figure it out (at least generally) while
it is happening.
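One way to catch it in the act, sketched with standard GNU ps options, is to snapshot the top CPU consumers the moment the alert fires, and separately check for processes stuck in uninterruptible sleep, which on Linux inflate the load average without using any CPU:

```shell
# Top CPU consumers at this instant, highest first:
ps -eo pcpu,state,pid,user,comm --sort=-pcpu | head -15

# Processes in D state (often blocked on disk or NFS I/O) count toward
# load on Linux even though they burn no CPU -- worth checking separately:
ps -eo state,pid,comm | awk 'NR > 1 && $1 ~ /^D/'
```

A load of ~71 with CPUs mostly idle, as in the top output above, is the classic signature of many D-state processes rather than a CPU hog.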
--
Marc
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null