<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.2900.5626" name=GENERATOR></HEAD>
<BODY>
<DIV dir=ltr align=left><SPAN class=950580717-02092008><FONT face=Arial
size=2>Hi Sascha,</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=950580717-02092008><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=950580717-02092008><FONT face=Arial
size=2>It seems that for every host, three processes are launched to do the host
ping check: sh, ping, and nagios. I currently have ~57 hosts that are in
a down state and have been acknowledged as out of service. I would assume
that these 57 * 3 = 171 processes, each of which can stay alive for up to the
30-second timeout, could account for this many processes running at the same
time.</FONT></SPAN></DIV>
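<DIV dir=ltr align=left><SPAN class=950580717-02092008><FONT face=Arial size=2>As a rough sanity check, the arithmetic above can be sketched in Python. This is a minimal sketch, assuming each host check runs exactly three processes (sh, ping and a forked nagios child), that every down host is being checked at the same moment because each check only ends at the 30-second timeout, and that check_procs treats a plain-number threshold such as 250 as "alert when the count is above 250". The baseline count of all other processes on the box is a hypothetical input, not something measured here.</FONT></SPAN></DIV>
<PRE>
```python
# Rough estimate of the process count while down hosts are being checked.
# Assumptions (not measured): each host check runs 3 processes (sh, ping and
# a forked nagios child) and every down host is checked at the same moment,
# since each check only ends when the 30-second timeout expires.

PROCS_PER_HOST_CHECK = 3   # sh + ping + nagios, as seen in the process list
WARN, CRIT = 250, 400      # default thresholds of the localhost.cfg process check


def check_processes(down_hosts, baseline):
    """Processes expected while all down hosts are checked at once.

    `baseline` is a hypothetical count of everything else running on the box.
    """
    return baseline + down_hosts * PROCS_PER_HOST_CHECK


def procs_state(count, warn=WARN, crit=CRIT):
    """Classify a process count, assuming plain-number check_procs thresholds
    alert only when the count is above the limit."""
    if count > crit:
        return "CRITICAL"
    if count > warn:
        return "WARNING"
    return "OK"


print(check_processes(57, 0))  # processes added by 57 down hosts alone: 171
```
</PRE>
<DIV dir=ltr align=left><SPAN class=950580717-02092008><FONT face=Arial size=2>For example, with a hypothetical baseline of 200 other processes, check_processes(57, 200) gives 371, which procs_state() classifies as WARNING, in line with the 370 processes reported in the alert.</FONT></SPAN></DIV>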
<DIV dir=ltr align=left><SPAN class=950580717-02092008><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=950580717-02092008><FONT face=Arial size=2>I
guess that brings me to my next question. I could disable active host
checks for these out-of-service machines, which would most likely stop the
warnings about the number of processes, but would I then have to re-enable the
checks once the machines are brought back up? Currently I just acknowledge the
problem and leave a comment when a machine is put out of service, since it will
be back at some point. When it does come back, the acknowledgement is gone and
the regular checks are still running. Does anyone know of a better way to do
this?</FONT></SPAN></DIV>
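<DIV dir=ltr align=left><SPAN class=950580717-02092008><FONT face=Arial size=2>Active checks disabled with DISABLE_HOST_CHECK stay disabled until an ENABLE_HOST_CHECK command is sent, so re-enabling is a separate step. A minimal sketch of submitting both external commands from Python follows. It assumes check_external_commands=1 in nagios.cfg and that the command file is /usr/local/nagios/var/rw/nagios.cmd (a common location for a source install; adjust it to your command_file setting). The helper names and the host name web01 are hypothetical.</FONT></SPAN></DIV>
<PRE>
```python
# Hypothetical helpers for the Nagios external command file.
import time

# Assumed path of the external command file (command_file in nagios.cfg);
# adjust to match your installation.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def external_command(name, host, now=None):
    """Format one external command line: [timestamp] NAME;host_name"""
    ts = int(time.time() if now is None else now)
    return "[%d] %s;%s\n" % (ts, name, host)


def set_host_checks(hosts, enabled, path=COMMAND_FILE):
    """Disable or re-enable active checks for each host in `hosts`."""
    name = "ENABLE_HOST_CHECK" if enabled else "DISABLE_HOST_CHECK"
    # The command file is a named pipe that Nagios reads from.
    with open(path, "w") as cmd_file:
        for host in hosts:
            cmd_file.write(external_command(name, host))


# Usage (hypothetical host name):
#   set_host_checks(["web01"], enabled=False)  # machine taken out of service
#   set_host_checks(["web01"], enabled=True)   # machine back in service
```
</PRE>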
<DIV dir=ltr align=left><SPAN class=950580717-02092008><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=950580717-02092008><FONT face=Arial
size=2>Thanks so much,</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=950580717-02092008><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=950580717-02092008><FONT face=Arial
size=2>Ryan Gravlin</FONT></SPAN></DIV><BR>
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> nagios-users-bounces@lists.sourceforge.net
[mailto:nagios-users-bounces@lists.sourceforge.net] <B>On Behalf Of
</B>Sascha.Runschke@gfkl.com<BR><B>Sent:</B> Tuesday, September 02, 2008 11:01
AM<BR><B>To:</B> nagios-users@lists.sourceforge.net<BR><B>Subject:</B>
[Nagios-users] Re: Default Nagios process self-check<BR></FONT><BR></DIV>
<DIV></DIV><BR><TT><FONT size=2>nagios-users-bounces@lists.sourceforge.net
wrote on 02.09.2008 15:37:47:<BR><BR>> # of Hosts Monitored: 322<BR>> #
of Services Monitored: 35<BR>> <BR>> The localhost.cfg comes with a
default process check with thresholds of<BR>> 250+ for warning and 400+ for
critical. Checking the event log, I get this message about twice <BR>> an
hour:<BR>> <BR>> [09-02-2008 07:02:48] SERVICE ALERT:
NAGIOS;Total Processes;WARNING;<BR>> SOFT;1;PROCS WARNING: 370 processes with
STATE = RSZDT<BR>> <BR>> It seems to me the machine itself is powerful
enough to execute this<BR>> many checks without even breaking a sweat.
Were these defaults <BR>> chosen on the assumption that there should
never be that many processes?<BR>> <BR>> I'm by no means a Linux or Nagios
expert, and I was hoping someone <BR>> could explain the reasoning
behind this check in more depth than I <BR>> can work out myself. I can obviously just bump the
numbers up, but I want to make <BR>> sure that I'm not ignoring something
obvious that may have unwanted <BR>> results after the fact. Should I
use these numbers I see here as <BR>> the basis for my new
thresholds?<BR></FONT></TT><BR><TT><FONT size=2>These thresholds were never
meant as a hard upper limit; the maximum number</FONT></TT> <BR><TT><FONT
size=2>of concurrent checks your box can handle depends solely on your
hardware.</FONT></TT> <BR><TT><FONT size=2>Think of it more as: "if you have a
Nagios installation that produces</FONT></TT> <BR><TT><FONT size=2>that many
concurrent checks, then you should know by now how to</FONT></TT> <BR><TT><FONT
size=2>change this behaviour" ;-)</FONT></TT> <BR><BR><TT><FONT size=2>But then,
I fail to see how your setup with 322 hosts and only 35(?) service</FONT></TT>
<BR><TT><FONT size=2>checks could produce that many processes. It might be a
good idea to</FONT></TT> <BR><TT><FONT size=2>double-check what's going on
there.</FONT></TT> <BR><BR><TT><FONT size=2>S</FONT></TT> <BR><BR><BR><SPAN
style="FONT-SIZE: 10pt; COLOR: #000000; FONT-FAMILY: sans-serif,helvetica">GFKL
Financial Services AG</SPAN><BR><SPAN
style="FONT-SIZE: 10pt; COLOR: #000000; FONT-FAMILY: sans-serif,helvetica">Executive Board:
Dr. Peter Jänsch (Chairman), Jürgen Baltes, Dr. Till Ergenzinger, Dr. Tom
Haverkamp</SPAN><BR><SPAN
style="FONT-SIZE: 10pt; COLOR: #000000; FONT-FAMILY: sans-serif,helvetica">Chairman
of the Supervisory Board: Dr. Georg F. Thoma</SPAN><BR><SPAN
style="FONT-SIZE: 10pt; COLOR: #000000; FONT-FAMILY: sans-serif,helvetica">Registered office:
Limbecker Platz 1, 45127 Essen; Commercial Register: Essen Local Court (Amtsgericht Essen), HRB
13522</SPAN></BODY></HTML>