<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2900.6036" name=GENERATOR></HEAD>
<BODY>
<DIV dir=ltr align=left><SPAN class=561542919-07122010><FONT face=Arial
color=#0000ff size=2>Hi there --</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=561542919-07122010><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=561542919-07122010><FONT face=Arial
color=#0000ff size=2>I ran the command you suggested and sent the output to
a file. When I checked the file, I noticed there</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=561542919-07122010><FONT face=Arial
color=#0000ff size=2>were a large number of updatedb and slocate instances running,
going back to August of this year. </FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=561542919-07122010><FONT face=Arial
color=#0000ff size=2>When I tried to kill those processes, I ran into the same
problem that I encountered with the kjournald instances.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=561542919-07122010><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=561542919-07122010><FONT face=Arial
color=#0000ff size=2>I did some further investigating, and it turns out that a high
number of updatedb and slocate processes can be</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=561542919-07122010><FONT face=Arial
color=#0000ff size=2>symptomatic of a corrupted filesystem. Accordingly, I
rebooted the server and had it run fsck on all filesystems.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=561542919-07122010><FONT face=Arial
color=#0000ff size=2>The server is now up, and I will monitor it for the next
week to see if the problem returns.</FONT></SPAN></DIV>
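<DIV dir=ltr align=left><FONT face=Arial size=2>A minimal sketch of one way to
force that full fsck at reboot on an ext3/sysvinit-era system like this one;
the /forcefsck flag file and the -F option depend on the distribution's init
scripts, so treat this as illustrative rather than exact:</FONT></DIV>
<PRE>
# Create the flag file the init scripts check, then reboot; every
# filesystem gets a full fsck on the way back up:
touch /forcefsck
shutdown -r now

# Or, where the sysvinit shutdown supports it, -F does the same thing:
shutdown -rF now
</PRE>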
<DIV dir=ltr align=left><SPAN class=561542919-07122010><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=561542919-07122010><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV><BR>
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> Rick Mangus
[mailto:rick.mangus+nagios@gmail.com] <BR><B>Sent:</B> Tuesday, December 07,
2010 10:49 AM<BR><B>To:</B> Nagios Users List<BR><B>Subject:</B> Re:
[Nagios-users] Determining what is causing a high load reported by check_load
plugin<BR></FONT><BR></DIV>
<DIV></DIV>Kjournald is needed for journalling on ext3 filesystems. Be
glad you didn't manage to kill them.<BR><BR>To find something that is running
many, many instances, try this: "ps -ax -o cmd | sort | uniq -c | sort
-n"<BR><BR>The output will look like this:<BR>
<PRE>
      3 [kjournald]
      3 [sh] &lt;defunct&gt;
      5 -bash
      7 crond
</PRE>
The column on the left is the number of processes with that command line. I
occasionally have 10,000 instances of nsca that simply need to be killed. Do
let us know what you find!<BR><BR>--Rick<BR><BR>
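<DIV><FONT face=Arial size=2>A possible follow-up once the offending command
line is known; the "nsca" name below is just the example mentioned above, and
pkill -f matches against the full command line:</FONT></DIV>
<PRE>
# Ask the runaway instances to exit cleanly, then force any stragglers:
pkill -f nsca
sleep 5
pkill -9 -f nsca
</PRE>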
<DIV class=gmail_quote>On Tue, Dec 7, 2010 at 9:25 AM, Kaplan, Andrew H. <SPAN
dir=ltr><<A
href="mailto:AHKAPLAN@partners.org">AHKAPLAN@partners.org</A>></SPAN>
wrote:<BR>
<BLOCKQUOTE class=gmail_quote
style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid">
<DIV lang=EN-US link="blue" vlink="purple">
<DIV dir=ltr align=left><SPAN><FONT face=Arial color=#0000ff size=2>Hi there
--</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN><FONT face=Arial color=#0000ff size=2>The output
below shows the top processes on the server:</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<PRE>
439 processes: 438 sleeping, 1 running, 0 zombie, 0 stopped
CPU0 states: 19.0% user,  9.4% system,  0.0% nice, 71.0% idle
CPU1 states: 20.1% user, 13.0% system,  0.0% nice, 66.3% idle
CPU2 states: 27.1% user, 17.3% system,  0.0% nice, 55.0% idle
Mem:  2064324K av, 2013820K used,   50504K free,       0K shrd,  487764K buff
Swap: 2096472K av,   12436K used, 2084036K free                  976244K cached
</PRE>
<DIV> </DIV>
<PRE>
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 2398 root      15   0  1280 1280   824 R     1.9  0.0   0:00 top
 5648 root      22   0  1196 1196  1104 S     1.3  0.0   0:00 ASMProServer
    1 root      15   0   488  484   448 S     0.0  0.0   2:28 init
    2 root      0K   0     0    0     0 SW    0.0  0.0   0:00 migration_CPU0
    3 root      0K   0     0    0     0 SW    0.0  0.0   0:00 migration_CPU1
    4 root      0K   0     0    0     0 SW    0.0  0.0   0:00 migration_CPU2
    5 root      15   0     0    0     0 SW    0.0  0.0   0:03 keventd
    6 root      34  19     0    0     0 SWN   0.0  0.0  17:52 ksoftirqd_CPU0
    7 root      34  19     0    0     0 SWN   0.0  0.0  16:39 ksoftirqd_CPU1
    8 root      34  19     0    0     0 SWN   0.0  0.0  17:33 ksoftirqd_CPU2
    9 root      15   0     0    0     0 SW    0.0  0.0  28:22 kswapd
   10 root      15   0     0    0     0 SW    0.0  0.0  42:39 bdflush
   11 root      15   0     0    0     0 SW    0.0  0.0   3:08 kupdated
   12 root      25   0     0    0     0 SW    0.0  0.0   0:00 mdrecoveryd
   18 root      16   0     0    0     0 SW    0.0  0.0   0:00 scsi_eh_0
   21 root      15   0     0    0     0 SW    0.0  0.0   4:38 kjournald
  101 root      15   0     0    0     0 SW    0.0  0.0   0:00 khubd
  265 root      15   0     0    0     0 SW    0.0  0.0   0:03 kjournald
  266 root      15   0     0    0     0 SW    0.0  0.0   3:43 kjournald
  267 root      15   0     0    0     0 SW    0.0  0.0   0:04 kjournald
  268 root      15   0     0    0     0 SW    0.0  0.0   0:01 kjournald
  269 root      15   0     0    0     0 SW    0.0  0.0   0:11 kjournald
  270 root      15   0     0    0     0 SW    0.0  0.0   4:34 kjournald
  271 root      15   0     0    0     0 SW    0.0  0.0   4:28 kjournald
  272 root      15   0     0    0     0 SW    0.0  0.0   0:08 kjournald
  273 root      15   0     0    0     0 SW    0.0  0.0   0:14 kjournald
  274 root      15   0     0    0     0 SW    0.0  0.0   0:07 kjournald
  275 root      15   0     0    0     0 SW    0.0  0.0   1:14 kjournald
  805 root      15   0   588  576   532 S     0.0  0.0   1:39 syslogd
  810 root      15   0   448  432   432 S     0.0  0.0   0:00 klogd
  830 rpc       15   0   596  572   508 S     0.0  0.0   0:04 portmap
  858 rpcuser   19   0   708  608   608 S     0.0  0.0   0:00 rpc.statd
  970 root      15   0     0    0     0 SW    0.0  0.0   0:21 rpciod
  971 root      15   0     0    0     0 SW    0.0  0.0   0:00 lockd
  999 ntp       15   0  1812 1812  1732 S     0.0  0.0   5:04 ntpd
 1022 root      15   0   772  720   632 S     0.0  0.0   0:00 ypbind
 1024 root      15   0   772  720   632 S     0.0  0.0   1:16 ypbind
</PRE>
<DIV><SPAN><FONT face=Arial color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV><SPAN><FONT face=Arial color=#0000ff size=2>What caught my eye was the
total number of processes, along with how many of them were
sleeping.</FONT></SPAN></DIV>
<DIV><SPAN><FONT face=Arial color=#0000ff size=2>I tried running the kill
command on the kjournald instances, but that did not appear to stop
them.</FONT></SPAN></DIV>
<DIV><SPAN><FONT face=Arial color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV><SPAN><FONT face=Arial color=#0000ff size=2>Aside from rebooting the
server, which can be done if necessary, what other approach can I
try?</FONT></SPAN></DIV>
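<DIV dir=ltr align=left><FONT face=Arial size=2>One note that may explain the
failed kill: kjournald runs as a kernel thread, and kernel threads (shown in
brackets by ps, with an empty command line in /proc) ignore signals sent from
user space. A quick way to confirm this, using PID 265 from the top listing
above:</FONT></DIV>
<PRE>
# Kernel threads have no user-space command line and cannot be signalled away:
ps -p 265 -o pid,stat,comm,cmd
cat /proc/265/cmdline | wc -c    # prints 0 for a kernel thread
</PRE>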
<DIV><SPAN><FONT face=Arial color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV><SPAN><FONT face=Arial color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><BR></DIV><BR>
<DIV lang=en-us dir=ltr align=left>
<HR>
<FONT face=Tahoma size=2>
<DIV class=im><B>From:</B> Daniel Wittenberg [mailto:<A
href="mailto:daniel.wittenberg.r0ko@statefarm.com"
target=_blank>daniel.wittenberg.r0ko@statefarm.com</A>] <BR></DIV><B>Sent:</B>
Tuesday, December 07, 2010 9:11 AM
<DIV>
<DIV></DIV>
<DIV class=h5><BR><B>To:</B> Nagios Users List<BR><B>Subject:</B> Re:
[Nagios-users] Determining what is causing a high load reported by check_load
plugin<BR></DIV></DIV></FONT><BR></DIV>
<DIV>
<DIV></DIV>
<DIV class=h5>
<DIV></DIV>
<DIV>
<P class=MsoNormal><SPAN style="FONT-SIZE: 11pt; COLOR: rgb(31,73,125)">So
what are the first few processes listed in top? Those should be what is
causing your load.</SPAN></P>
<P class=MsoNormal><SPAN
style="FONT-SIZE: 11pt; COLOR: rgb(31,73,125)"></SPAN> </P>
<P class=MsoNormal><SPAN
style="FONT-SIZE: 11pt; COLOR: rgb(31,73,125)">Dan</SPAN></P>
<P class=MsoNormal><SPAN
style="FONT-SIZE: 11pt; COLOR: rgb(31,73,125)"></SPAN> </P>
<P class=MsoNormal><SPAN
style="FONT-SIZE: 11pt; COLOR: rgb(31,73,125)"></SPAN> </P>
<P class=MsoNormal><SPAN
style="FONT-SIZE: 11pt; COLOR: rgb(31,73,125)"></SPAN> </P>
<DIV>
<DIV
style="BORDER-RIGHT: medium none; PADDING-RIGHT: 0in; BORDER-TOP: rgb(181,196,223) 1pt solid; PADDING-LEFT: 0in; PADDING-BOTTOM: 0in; BORDER-LEFT: medium none; PADDING-TOP: 3pt; BORDER-BOTTOM: medium none">
<P class=MsoNormal><B><SPAN style="FONT-SIZE: 10pt">From:</SPAN></B><SPAN
style="FONT-SIZE: 10pt"> Kaplan, Andrew H. [mailto:<A
href="mailto:AHKAPLAN@PARTNERS.ORG" target=_blank>AHKAPLAN@PARTNERS.ORG</A>]
<BR><B>Sent:</B> Tuesday, December 07, 2010 7:49 AM<BR><B>To:</B> Nagios Users
List<BR><B>Subject:</B> Re: [Nagios-users] Determining what is causing a high
load reported by check_load plugin</SPAN></P></DIV></DIV>
<P class=MsoNormal> </P>
<P class=MsoNormal><SPAN style="FONT-SIZE: 10pt; COLOR: blue">Hi there
--</SPAN></P>
<P class=MsoNormal> </P>
<P class=MsoNormal><SPAN style="FONT-SIZE: 10pt; COLOR: blue">The load values
that are displayed in top match those for the check_load plugin. This is the
case whether the plugin</SPAN></P>
<P class=MsoNormal><SPAN style="FONT-SIZE: 10pt; COLOR: blue">is run
automatically or interactively. The output of the uptime command is shown
below:</SPAN></P>
<P class=MsoNormal> </P>
<DIV>
<P class=MsoNormal><SPAN style="FONT-SIZE: 10pt; COLOR: blue">8:48am up 153 days, 23:21, 1 user, load average: 73.36, 73.29, 73.21</SPAN></P></DIV>
<DIV>
<P class=MsoNormal> </P></DIV>
<DIV>
<P class=MsoNormal> </P></DIV>
<P class=MsoNormal> </P>
<P class=MsoNormal> </P>
<DIV class=MsoNormal style="TEXT-ALIGN: center" align=center>
<HR align=center width="100%" SIZE=2>
</DIV>
<P class=MsoNormal style="MARGIN-BOTTOM: 12pt"><B><SPAN
style="FONT-SIZE: 10pt">From:</SPAN></B><SPAN style="FONT-SIZE: 10pt"> Daniel
Wittenberg [mailto:<A href="mailto:daniel.wittenberg.r0ko@statefarm.com"
target=_blank>daniel.wittenberg.r0ko@statefarm.com</A>] <BR><B>Sent:</B>
Monday, December 06, 2010 4:40 PM<BR><B>To:</B> Nagios Users
List<BR><B>Subject:</B> Re: [Nagios-users] Determining what is causing a high
load reported by check_load plugin</SPAN></P>
<P class=MsoNormal><SPAN style="FONT-SIZE: 11pt; COLOR: rgb(31,73,125)">In
top, does it show the same load values? The state of your memory
shouldn't cause the Nagios plugin to report a high load. What does the
uptime command say? Try running the check_load script by hand on that
host and verify it returns the same results.</SPAN></P>
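<P class=MsoNormal><SPAN style="FONT-SIZE: 10pt">As a rough illustration of
that manual check, assuming the usual plugin install path (which may differ on
this host) and placeholder warning values alongside the critical thresholds
quoted later in the thread:</SPAN></P>
<PRE>
# Run the plugin directly on the client; its figures should match uptime/top:
/usr/local/nagios/libexec/check_load -w 20,15,10 -c 30,25,20
uptime
</PRE>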
<P class=MsoNormal><SPAN
style="FONT-SIZE: 11pt; COLOR: rgb(31,73,125)"><BR>Dan</SPAN></P>
<P class=MsoNormal><SPAN
style="FONT-SIZE: 11pt; COLOR: rgb(31,73,125)"></SPAN> </P>
<P class=MsoNormal><SPAN
style="FONT-SIZE: 11pt; COLOR: rgb(31,73,125)"></SPAN> </P>
<DIV
style="BORDER-RIGHT: medium none; PADDING-RIGHT: 0in; BORDER-TOP: rgb(181,196,223) 1pt solid; PADDING-LEFT: 0in; PADDING-BOTTOM: 0in; BORDER-LEFT: medium none; PADDING-TOP: 3pt; BORDER-BOTTOM: medium none">
<P class=MsoNormal><B><SPAN style="FONT-SIZE: 10pt">From:</SPAN></B><SPAN
style="FONT-SIZE: 10pt"> Marc Powell [mailto:<A href="mailto:lists@xodus.org"
target=_blank>lists@xodus.org</A>] <BR><B>Sent:</B> Monday, December 06, 2010
3:26 PM<BR><B>To:</B> Nagios Users List<BR><B>Subject:</B> Re: [Nagios-users]
Determining what is causing a high load reported by check_load
plugin</SPAN></P></DIV>
<P class=MsoNormal> </P>
<P class=MsoNormal style="MARGIN-BOTTOM: 12pt"> </P>
<DIV>
<P class=MsoNormal>On Mon, Dec 6, 2010 at 1:50 PM, Kaplan, Andrew H. <<A
href="mailto:AHKAPLAN@partners.org"
target=_blank>AHKAPLAN@partners.org</A>> wrote:</P>
<DIV>
<P><SPAN style="FONT-SIZE: 10pt">Hi there --</SPAN> </P>
<P><SPAN style="FONT-SIZE: 10pt">We are running a Nagios 3.1.2 server, and the
client that is the subject of this e-mail is running version 2.6 of the NRPE
client.</SPAN></P>
<P><SPAN style="FONT-SIZE: 10pt">The check_load plugin, version 1.4, is
reporting the following 1-, 5-, and 15-minute load averages:</SPAN> </P>
<P><SPAN style="FONT-SIZE: 10pt">load average: 71.00, 71.00, 70.95
CRITICAL</SPAN> </P>
<P><SPAN style="FONT-SIZE: 10pt">The critical thresholds of the plugin have been
set to 30, 25, 20.</SPAN> </P>
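<P><SPAN style="FONT-SIZE: 10pt">For illustration only, an nrpe.cfg command
definition consistent with those critical thresholds might look like the line
below; the plugin path and the warning values are assumptions, not taken from
this host:</SPAN></P>
<PRE>
# Hypothetical NRPE command definition (warning values are placeholders):
command[check_load]=/usr/local/nagios/libexec/check_load -w 20,15,10 -c 30,25,20
</PRE>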
<P><SPAN style="FONT-SIZE: 10pt">When I checked the client in question, the
first thing I did was to run the top command. The results are shown
below:</SPAN> </P>
<PRE>
CPU0 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
CPU1 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
CPU2 states:  1.0% user,  4.0% system,  0.0% nice,  93.0% idle
Mem:  2064324K av, 2032308K used,   32016K free,       0K shrd,  509924K buff
Swap: 2096472K av,   21432K used, 2075040K free                 1035592K cached
</PRE>
<P><SPAN style="FONT-SIZE: 10pt">The one thing I noticed was that the amount
of free memory was down to thirty-two megabytes. I wanted to know if that
was</SPAN> <BR><SPAN style="FONT-SIZE: 10pt">what was causing the critical
status, or if there is something else that I should
investigate.</SPAN></P></DIV>
<DIV>
<P class=MsoNormal style="MARGIN-BOTTOM: 12pt"><BR>Memory is not a factor in
the load calculation, only the number of processes running or waiting to run.
For at least 15 minutes you had approximately 71 processes either running or
ready to run and waiting on CPU resources. Running top/ps was the right thing
to do, but you really need to do it while the problem is occurring to see what's
actually using all the CPU resources. There are far too many possible reasons why
load could be high to guess, but it should be easy for someone familiar with your
system to figure it out (at least generally) while it is
happening.<BR><BR>--<BR>Marc</P></DIV></DIV>
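<P class=MsoNormal><SPAN style="FONT-SIZE: 10pt">One quick way, while the load
is high, to list the processes actually being counted: on Linux the load
average includes tasks that are runnable (state R) or in uninterruptible sleep
(state D, usually stuck waiting on disk or NFS I/O). Field output varies a
little between ps versions:</SPAN></P>
<PRE>
# Show only processes in state R or D -- the ones feeding the load average:
ps -eo stat,pid,user,cmd | awk '$1 ~ /^(R|D)/'
</PRE>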
</DIV></DIV></DIV></DIV><BR></BLOCKQUOTE></DIV><BR></BODY></HTML>