Bug in Nagios orphan-check?!

Carroll, Jim P [Contractor] jcarro10 at sprintspectrum.com
Wed Feb 5 23:25:54 CET 2003


It *does* seem that you're running out of memory.  Just a guess.

You mentioned 256 MB of RAM but not how much swap this machine has.  80
nagios processes isn't much; I've run quite a bit more than that in the
past.  Granted, I'm also running with 1 GB of RAM and 2 GB of swap.

You might want to consider adding more RAM and bumping up your swap space.
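
To see how much RAM and swap the box really has, something like the
following might help.  It is only a rough sketch: it assumes a Linux
/proc/meminfo with the usual MemTotal, MemFree, SwapTotal and SwapFree
fields (values in kB).  The function names and the sample text are made
up for illustration and are not part of Nagios.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field name: value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # no "name: value" pair on this line
        fields = rest.split()
        # Keep only lines whose first field after the colon is a number.
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def memory_summary(text):
    """Return total and free RAM and swap in MB (rounded down)."""
    info = parse_meminfo(text)
    return {
        "ram_total_mb": info.get("MemTotal", 0) // 1024,
        "ram_free_mb": info.get("MemFree", 0) // 1024,
        "swap_total_mb": info.get("SwapTotal", 0) // 1024,
        "swap_free_mb": info.get("SwapFree", 0) // 1024,
    }


# Hypothetical sample standing in for the contents of /proc/meminfo.
SAMPLE = """MemTotal:       262144 kB
MemFree:         10240 kB
SwapTotal:           0 kB
SwapFree:            0 kB
"""

print(memory_summary(SAMPLE))
```

On the real machine you would pass open("/proc/meminfo").read() instead
of SAMPLE.  If swap_total_mb comes back as 0 or very small, that would
fit the "Out of Memory" kills in your log.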

jc

> -----Original Message-----
> From: Matthias Eichler [mailto:me at ame.de]
> Sent: Wednesday, February 05, 2003 4:54 AM
> To: nagios-users at lists.sourceforge.net
> Subject: [Nagios-users] Bug in Nagios orphan-check?!
> 
> 
> Hi List,
> 
> I have Nagios 1.0 installed on Debian 3.0 (Woody). The machine
> is an Intel Celeron 700 MHz with 256 MB of RAM.
> 
> The setup ran well for a long time, but now I get severe
> problems roughly every five days.
> 
> Today we had connection problems to a remote farm. But Nagios
> didn't send out host-down notifications; it said this in its event log:
> 
> "Warning: The check of service 'blabla' on host 'blabla' looks like
> it was orphaned (results never came back). I'm scheduling an immediate
> check of the service..."
> After this first entry, Nagios reported the same warning for EVERY
> service check, about 142 times...
> At that point I tried to reach the web interface and could not connect,
> and the SSH login took very long, which is not surprising because the
> box had a load of 7.83!
> I saw about 80 nagios processes in the process list. They were
> not stopped by /etc/init.d/nagios stop; I had to kill them all.
> 
> In dmesg I see entries like:
> ---
> Feb  5 11:12:14 ozzy kernel: Out of Memory: Killed process 18026 (apache).
> Feb  5 11:12:20 ozzy kernel: Out of Memory: Killed process 18022 (apache).
> ---
> or
> ---
> Feb  5 11:21:48 ozzy kernel: request_module[net-pf-10]: waitpid(13304,...) failed, errno 512
> Feb  5 11:21:48 ozzy kernel: request_module[net-pf-10]: waitpid(13305,...) failed, errno 512
> Feb  5 11:21:48 ozzy kernel: request_module[net-pf-10]: waitpid(13306,...) failed, errno 512
> ---
> 
> I think there might be a bug: even when the remote site is
> unavailable, Nagios should warn us about it and not bog
> down the box like this...?!?
> 
> Any ideas?!? 
> 
> Greetings from Munich,
> 
> Matthias
> 
> -- 
> 
> Kind regards
> AME Aigner Media & Entertainment GmbH
> 
> 
> Matthias Eichler
> Leiter Technik | Technical Director
> _______________________________________
> 
> AME® Aigner Media & Entertainment GmbH
> Bavariaring 8        D-80336 München
> 
> Tel [+49] Ø89.427 05 - 330
> Fax [+49] Ø89.427 05 - 400
> 
> http://ame.de        eMail: me at ame.de
> _______________________________________
> Legal information per TDG|GmbHG: ame.de/impressum
> 

