[Apan-users] Re: nagios and apan cause server to crash...
Fredrik Wänglund
fredrik.wanglund at datavis.se
Tue Oct 14 16:03:12 CEST 2003
Is this a RH9 problem? Or is it related to a specific version of the
kernel, bash, ... ?
What other OS-versions have this problem?
/FredrikW
Igor Kurtovic wrote:
> step back to RH 8.0 ..
>
> i had similar probs, the only difference was a daily crash :P
>
> even with changed reaper-frequency there was no improvement to see.
> after getting it back on RH 8.0 all is fine again.
>
> 300 hosts
> 1500 services
> 400 apan's
> 150 mrtg-hosts
>
> all on this box:
>
> Dual Xeon III 1 Ghz
> 2 GB RAM
>
> never had any performance issues or stability probs b4 going onto RH 9.0
>
> Regards, Igor
>
>
>
> On Tue, 2003-10-14 at 09:25, Fredrik Wänglund wrote:
>
>>I have service_reaper_frequency=3, and I remember that before I changed
>>it from the default, my load used to be 8-10.
>>
>>/FredrikW
>>
>>Evan Weston wrote:
>>
>>>I was having a similar problem under Red Hat 9 on a PIII 900 with 512 MB RAM.
>>>
>>>I set 'service_reaper_frequency=4' instead of the default 'service_reaper_frequency=10' in the 'nagios.cfg' file and it's completely stable now.
>>>
>>>Evan Weston
>>>
>>>
>>>-----Original Message-----
>>>From: Fredrik Wänglund [mailto:fredrik.wanglund at datavis.se]
>>>Sent: Tuesday, 14 October 2003 4:21 PM
>>>To: jeff vier
>>>Cc: Matthew Wilson; nagios-users; Apan-users List
>>>Subject: Re: [Apan-users] Re: [Nagios-users] nagios and apan cause server to crash...
>>>
>>>What platform/version are you running on?
>>>
>>>I'm running without any problem under RedHat 8.0 on a PIII 1400MHz with
>>>170 hosts, 200 apan-services and 300 'normal' services.
>>>My system-load stays between 1 and 2, CPU is mainly >80% idle
>>>
>>>jeff vier wrote:
>>>
>>>
>>>
>>>>I'm having the same problem here.
>>>>
>>>>I have been capturing dumps of the top command, pulling only active
>>>>processes. It looks like something causes an instance of apan.sh to
>>>>hang, and then they just start piling up (fast).
>>>>
>>>>The load is usually under 1.0 (sometimes jumping up to 1.xx - no big
>>>>deal). When it died, my load was over 80 (yes eighty) with 46 (maybe
>>>>more) *active* apan processes (not sure of the actual count, top dump
>>>>only shows 62 lines of processes. It said 73 running, though, so likely
>>>>more were apan.sh - also, unknown count of inactive apan.sh processes
>>>>sitting and waiting), 17 zombies (unknown parent, alas). 99% CPU usage
>>>>on CPU0, 100% on CPU1. Yikes. This jump happened over 16 minutes, at
>>>>which point my crons no longer ran, so who knows how badly it kept
>>>>piling up.
>>>>
>>>>apan.debug log file doesn't show anything abnormal (whee.)
>>>>
>>>>I'm going to have to write a watcher to manually kill the hanging
>>>>apan.sh procs, which I don't want to do for fear of inadvertently
>>>>killing valid processes, but I am quite sick of having to go over to the
>>>>colo to poke the power button once a week (only been in production 3
>>>>weeks - 4 crashes so far).
>>>>
>>>>I'm going to increase my level of manual debugging, too, of processes,
>>>>etc. I'll post any new insight.
>>>>
>>>>--jeff
>>>>
>>>>On Wed, 2003-10-08 at 10:31, Matthew Wilson wrote:
>>>>
>>>>
>>>>
>>>>
>>>>>UPDATE: I have checked and my nagios installation does not have ePN compiled
>>>>>in. So this is not the cause. I would greatly appreciate any suggestions
>>>>>on how to prevent/cure this problem.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>Thanks
>>>>>>Matthew Wilson.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>Matthew Wilson wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>Hi guys,
>>>>>>>>I have read in the list archives in the last couple of months a few
>>>>>>>>threads about nagios and apan chewing up memory. I have tried a few
>>>>>>>>of the solutions posted but still have no joy.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>-------------------------------------------------------
>>>>>This SF.net email is sponsored by: SF.net Giveback Program.
>>>>>SourceForge.net hosts over 70,000 Open Source Projects.
>>>>>See the people who have HELPED US provide better services:
>>>>>Click here: http://sourceforge.net/supporters.php
>>>>>_______________________________________________
>>>>>Nagios-users mailing list
>>>>>Nagios-users at lists.sourceforge.net
>>>>>https://lists.sourceforge.net/lists/listinfo/nagios-users
>>>>>::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
>>>>>::: Messages without supporting info will risk being sent to /dev/null
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>-------------------------------------------------------
>>>>This SF.net email is sponsored by: SF.net Giveback Program.
>>>>SourceForge.net hosts over 70,000 Open Source Projects.
>>>>See the people who have HELPED US provide better services:
>>>>Click here: http://sourceforge.net/supporters.php
>>>>_______________________________________________
>>>>Apan-users mailing list
>>>>Apan-users at lists.sourceforge.net
>>>>https://lists.sourceforge.net/lists/listinfo/apan-users
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>--
>********************************
>
>Igor Kurtovic
>Technische Systemlösungen
>QSC AG
>
>Phone: +49 221 6698 404
>Mobile: +49 163 6698 075
>Fax: +49 221 6698 469
>WWW: www.q-dsl.de
>Email: igor.kurtovic at qsc.de
>
>********************************
>
>
>
>
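The watcher jeff describes above (kill hanging apan.sh procs without touching valid ones) could be sketched roughly as follows. This is only an illustration, not anything from APAN or Nagios: the function names, the 300-second age limit and the sample paths are assumptions, and it relies on a `ps` that supports the `etimes` (elapsed seconds) column, which newer procps versions have but older systems may not. It treats any apan.sh process older than the limit as hung, which is a guess at what "hanging" means here, and it matches on a plain substring of the command line.

```python
import os
import signal
import subprocess


def find_stale(ps_output, name="apan.sh", max_age_s=300):
    """Return PIDs whose command line mentions `name` and whose
    elapsed time is greater than `max_age_s` seconds.

    `ps_output` is text in the form printed by `ps -eo pid,etimes,args`.
    The default age limit is an assumption, not a value from the thread.
    """
    stale = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # pid, elapsed seconds, full command
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue  # header row or malformed line
        pid, age, cmd = int(parts[0]), int(parts[1]), parts[2]
        if name in cmd and age > max_age_s:
            stale.append(pid)
    return stale


def reap(dry_run=True):
    """List stale apan.sh processes and, unless dry_run, send them SIGTERM."""
    out = subprocess.run(
        ["ps", "-eo", "pid,etimes,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    pids = [p for p in find_stale(out) if p != os.getpid()]
    for pid in pids:
        if dry_run:
            print("would terminate", pid)
        else:
            try:
                os.kill(pid, signal.SIGTERM)
            except ProcessLookupError:
                pass  # process exited between the ps call and the kill
    return pids
```

Running `reap()` from cron with the default `dry_run=True` only prints candidates, which gives a chance to check that the age limit does not catch valid checks before switching to `reap(dry_run=False)`.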