[Apan-users] Re: nagios and apan cause server to crash...
Matthew Wilson
matthewwilson at dsl.pipex.com
Tue Oct 14 16:15:20 CEST 2003
I am seeing this on RH9, bash 2.05b.0(1)-release
(i386-redhat-linux-gnu), kernel 2.4.20-20.9
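
For anyone who wants to try the watcher idea jeff describes further down the thread, here is a rough sketch in Python. It is only an illustration, not something taken from apan or nagios: the 300-second threshold, the function names, and the assumption that a hung check can be recognised purely by its age are all mine. It reads `ps -eo pid,etime,args` (the POSIX `etime` column, formatted [[dd-]hh:]mm:ss) and picks out apan.sh processes that have been alive longer than the threshold. It defaults to a dry run, so nothing is killed until you have checked what it would select.

```python
import os
import signal
import subprocess

# Assumed threshold: the thread does not say how long a healthy apan.sh runs.
MAX_AGE_SECONDS = 300


def parse_etime(etime):
    """Convert a ps 'etime' value ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_stale(ps_output, name="apan.sh", max_age=MAX_AGE_SECONDS):
    """Return PIDs whose command line mentions `name` and whose age exceeds max_age."""
    stale = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip the header and any malformed line
        pid, etime, args = fields
        if name in args and parse_etime(etime) > max_age:
            stale.append(int(pid))
    return stale


def kill_stale(dry_run=True):
    """List (or, with dry_run=False, SIGTERM) the stale processes found by ps."""
    out = subprocess.run(
        ["ps", "-eo", "pid,etime,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid in find_stale(out):
        if dry_run:
            print("would kill", pid)
        else:
            os.kill(pid, signal.SIGTERM)
```

Calling `kill_stale()` from cron every minute or so would only report candidates; `kill_stale(dry_run=False)` sends SIGTERM. Matching on age alone can still hit a legitimately slow check, which is exactly the risk jeff mentions, so the threshold needs to sit well above the longest normal run time.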
Matthew W
DC-Sat.net
On Tue, 2003-10-14 at 15:03, Fredrik Wänglund wrote:
> Is this a RH9 problem? Or is it related to a specific version of the
> kernel, bash, ... ?
> What other OS-versions have this problem?
>
> /FredrikW
>
>
>
> Igor Kurtovic wrote:
>
> > step back to RH 8.0 ..
> >
> > i had similar probs, the only difference was a daily crash :P
> >
> > even with changed reaper-frequency there was no improvement to see.
> > after getting it back on RH 8.0 all is fine again.
> >
> > 300 hosts
> > 1500 services
> > 400 apan's
> > 150 mrtg-hosts
> >
> > all on this box:
> >
> > Dual Xeon III 1 Ghz
> > 2 GB RAM
> >
> > never had any performance issues or stability probs b4 going onto RH 9.0
> >
> > Regards, Igor
> >
> >
> >
> > On Tue, 2003-10-14 at 09:25, Fredrik Wänglund wrote:
> >
> >>I have service_reaper_frequency=3, and I remember that before I changed
> >>it from the default, my load used to be 8-10.
> >>
> >>/FredrikW
> >>
> >>Evan Weston wrote:
> >>
> >>>I was having a similar problem under Redhat 9 on a pIII 900 512 meg ram.
> >>>
> >>>I set 'service_reaper_frequency=4' instead of the default 'service_reaper_frequency=10' in the 'nagios.cfg' file and it's completely stable now.
> >>>
> >>>Evan Weston
> >>>
> >>>
> >>>-----Original Message-----
> >>>From: Fredrik Wänglund [mailto:fredrik.wanglund at datavis.se]
> >>>Sent: Tuesday, 14 October 2003 4:21 PM
> >>>To: jeff vier
> >>>Cc: Matthew Wilson; nagios-users; Apan-users List
> >>>Subject: Re: [Apan-users] Re: [Nagios-users] nagios and apan cause server to crash...
> >>>
> >>>What platform/version are you running on?
> >>>
> >>>I'm running without any problem under RedHat 8.0 on a PIII 1400MHz with
> >>>170 hosts, 200 apan-services and 300 'normal' services.
> >>>My system-load stays between 1 and 2, CPU is mainly >80% idle
> >>>
> >>>jeff vier wrote:
> >>>
> >>>
> >>>
> >>>>I'm having the same problem here.
> >>>>
> >>>>I have been capturing dumps of the top command, pulling only active
> >>>>processes. It looks like something causes an instance of apan.sh to
> >>>>hang, and then they just start piling up (fast).
> >>>>
> >>>>The load is usually under 1.0 (sometimes jumping up to 1.xx - no big
> >>>>deal). When it died, my load was over 80 (yes eighty) with 46 (maybe
> >>>>more) *active* apan processes (not sure of the actual count, top dump
> >>>>only shows 62 lines of processes. It said 73 running, though, so likely
> >>>>more were apan.sh - also, unknown count of inactive apan.sh processes
> >>>>sitting and waiting), 17 zombies (unknown parent, alas). 99% CPU usage
> >>>>on CPU0, 100% on CPU1. Yikes. This jump happened over 16 minutes, at
> >>>>which point my crons no longer ran, so who knows how badly it kept
> >>>>piling up.
> >>>>
> >>>>apan.debug log file doesn't show anything abnormal (whee.)
> >>>>
> >>>>I'm going to have to write a watcher to manually kill the hanging
> >>>>apan.sh procs, which I don't want to do for fear of inadvertently
> >>>>killing valid processes, but I am quite sick of having to go over to the
> >>>>colo to poke the power button once a week (only been in production 3
> >>>>weeks - 4 crashes so far).
> >>>>
> >>>>I'm going to increase my level of manual debugging, too, of processes,
> >>>>etc. I'll post any new insight.
> >>>>
> >>>>--jeff
> >>>>
> >>>>On Wed, 2003-10-08 at 10:31, Matthew Wilson wrote:
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>UPDATE: I have checked and my nagios installation does not have ePN compiled
> >>>>>in. So this is not the cause. I would greatly appreciate any suggestions
> >>>>>on how to prevent/cure this problem.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>>Thanks
> >>>>>>Matthew Wilson.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>Matthew Wilson wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>Hi guys,
> >>>>>>>>I have read in the list archives in the last couple of months a few
> >>>>>>>>threads about nagios and apan chewing up memory. I have tried a few
> >>>>>>>>of the solutions posted but still have no joy.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>-------------------------------------------------------
> >>>>>This SF.net email is sponsored by: SF.net Giveback Program.
> >>>>>SourceForge.net hosts over 70,000 Open Source Projects.
> >>>>>See the people who have HELPED US provide better services:
> >>>>>Click here: http://sourceforge.net/supporters.php
> >>>>>_______________________________________________
> >>>>>Nagios-users mailing list
> >>>>>Nagios-users at lists.sourceforge.net
> >>>>>https://lists.sourceforge.net/lists/listinfo/nagios-users
> >>>>>::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> >>>>>::: Messages without supporting info will risk being sent to /dev/null
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>_______________________________________________
> >>>>Apan-users mailing list
> >>>>Apan-users at lists.sourceforge.net
> >>>>https://lists.sourceforge.net/lists/listinfo/apan-users
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >--
> >********************************
> >
> >Igor Kurtovic
> >Technische Systemlösungen
> >QSC AG
> >
> >Phone: +49 221 6698 404
> >Mobile: +49 163 6698 075
> >Fax: +49 221 6698 469
> >WWW: www.q-dsl.de
> >Email: igor.kurtovic at qsc.de
> >
> >********************************
> >
> >
> >
> >
>