<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <HTML><HEAD> <META http-equiv=Content-Type content="text/html; charset=iso-8859-1"> <META content="MSHTML 6.00.2900.5626" name=GENERATOR></HEAD> <BODY> <DIV dir=ltr align=left>Hi Sascha,</DIV> <DIV dir=ltr align=left> </DIV> <DIV dir=ltr align=left>It seems that every host check launches three processes to run the ping: sh, ping, and nagios. I currently have ~57 hosts that are in a down state and have been acknowledged as out of service. I would assume that 57 * 3 = 171 processes, each of which can stay alive for up to the 30-second timeout, could account for this many processes running at the same time.</DIV> <DIV dir=ltr align=left> </DIV> <DIV dir=ltr align=left>I guess that brings me to my next question. I could disable active host checks for these out-of-service machines, which would most likely stop the warnings about the number of processes, but would I then have to re-enable them manually once the machines are brought back up? I currently just acknowledge the problem and leave a comment when a machine is put out of service, since it will be back at some point. When it does come back, the acknowledgement goes away and the regular checks are still running. 
Does anyone know of a better way to do this?</DIV> <DIV dir=ltr align=left> </DIV> <DIV dir=ltr align=left>Thanks so much,</DIV> <DIV dir=ltr align=left> </DIV> <DIV dir=ltr align=left>Ryan Gravlin</DIV> <DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left> <HR tabIndex=-1> From: nagios-users-bounces@lists.sourceforge.net [mailto:nagios-users-bounces@lists.sourceforge.net] On Behalf Of Sascha.Runschke@gfkl.com<BR> Sent: Tuesday, September 02, 2008 11:01 AM<BR> To: nagios-users@lists.sourceforge.net<BR> Subject: [Nagios-users] Reply: Default Nagios process self-check </DIV> <DIV></DIV> <TT>nagios-users-bounces@lists.sourceforge.net wrote on 02.09.2008 15:37:47:<BR>
> # of Hosts Monitored: 322<BR>
> # of Services Monitored: 35<BR>
><BR>
> The localhost.cfg comes with a default process check with the values<BR>
> 250+ for warnings and 400+ for critical. Usually about twice an<BR>
> hour from checking the event log I get this message:<BR>
><BR>
> [09-02-2008 07:02:48] SERVICE ALERT: NAGIOS;Total Processes;WARNING;<BR>
> SOFT;1;PROCS WARNING: 370 processes with STATE = RSZDT<BR>
><BR>
> It seems to me the machine itself is powerful enough to execute this<BR>
> many checks without even breaking a sweat. Were these defaults<BR>
> configured in the thinking that there should never be that many processes?<BR>
><BR>
> I'm by no means a Linux or Nagios expert and I was hoping someone<BR>
> could explain more of the thinking behind this check than what I<BR>
> see. I can obviously just bump the numbers up but I want to make<BR>
> sure that I'm not ignoring something obvious that may have unwanted<BR>
> results after the fact. Should I use these numbers I see here as<BR>
> the basis for my new thresholds? 
</TT><BR><BR> <TT>These thresholds were never meant as an upper limit; the maximum number</TT><BR> <TT>of concurrent checks your box can handle depends solely on your hardware.</TT><BR> <TT>Think of it more as: "if you have a Nagios installation that produces</TT><BR> <TT>that many concurrent checks, you should know by now how to</TT><BR> <TT>change this behaviour" ;-)</TT><BR><BR> <TT>That said, I fail to see how your setup with 322 hosts and only 35(?) service</TT><BR> <TT>checks could produce that many processes. It might be a good idea to</TT><BR> <TT>double-check what's going on there.</TT><BR><BR> <TT>S</TT><BR> GFKL Financial Services AG, Management Board: Dr. Peter Jänsch (Chairman), Jürgen Baltes, Dr. Till Ergenzinger, Dr. Tom Haverkamp. Chairman of the Supervisory Board: Dr. Georg F. Thoma. Registered office: Limbecker Platz 1, 45127 Essen; Essen Local Court (Amtsgericht Essen), HRB 13522</BODY></HTML>
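<DIV dir=ltr align=left>The process arithmetic in Ryan's message can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not how Nagios or check_procs is implemented: it assumes that every acknowledged down host's ping check is running at the same moment, and that check_procs treats a plain threshold N as "alert when the count is above N". The function and constant names are made up for the example; only the 250/400 defaults, the three-processes-per-check observation, the 57 down hosts, and the logged count of 370 come from the messages.</DIV>
<PRE>
```python
# Hypothetical sketch: how many extra processes could stuck host checks add,
# and how would a total process count be classified against the default
# "Total Processes" thresholds from the sample localhost.cfg?

WARNING_THRESHOLD = 250   # default warning value mentioned in the thread
CRITICAL_THRESHOLD = 400  # default critical value mentioned in the thread
PROCS_PER_HOST_CHECK = 3  # sh + ping + nagios, as observed in the thread


def stuck_check_processes(down_hosts: int, procs_per_check: int = PROCS_PER_HOST_CHECK) -> int:
    """Processes alive if every down host's ping check is running at once (assumption)."""
    return down_hosts * procs_per_check


def classify(total_procs: int, warn: int = WARNING_THRESHOLD, crit: int = CRITICAL_THRESHOLD) -> str:
    """Assumes a plain threshold N means 'alert when the count is above N'."""
    if total_procs > crit:
        return "CRITICAL"
    if total_procs > warn:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    extra = stuck_check_processes(57)
    print(f"57 down hosts x {PROCS_PER_HOST_CHECK} processes = {extra}")
    print(f"370 processes -> {classify(370)}")
```
</PRE>
<DIV dir=ltr align=left>Under these assumptions, the script prints an estimate of 171 processes for the 57 down hosts and classifies the logged count of 370 as WARNING, which is consistent with the SERVICE ALERT quoted in the thread.</DIV>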