Incredible amount of false positives

DTerrell at Delphi-Tech.com DTerrell at Delphi-Tech.com
Mon Dec 8 19:16:08 CET 2003


I personally suspect the machine is just overtaxed, although the load
average is usually 0.60-0.80.  The ramdisk is used because in the past,
after about 10 minutes, the machine would begin to get bogged down and
more and more nagios procs would start to spawn.  I've had hundreds of
nagios procs at once on that machine, which caused a spiraling effect:
more procs would spawn, using more resources, which held the other procs
back so they did not respond, and thus still more procs were spawned.
All networks here are 100 Mbps full duplex.  As for the failures: plugin
timeouts, destination unreachable, you name it.  Dozens of times I've
gone straight onto the box and run the very plugin that was complaining,
with perfectly fine results.  There's only one copy of nagios running.
Is 233 MHz/128M too little for this job?  What else can I show to help
the process?
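
One way to put a number on the process count described above is sketched below in Python. It is only an illustration: the helper names and the sample output are made up, and it assumes a procps-style `ps` that accepts `-eo pid,comm`. Nagios forks child processes to run checks, so a count above one is not a problem by itself; a count that keeps climbing over time would match the spiral described above.

```python
import subprocess


def count_processes(ps_output, name="nagios"):
    """Count lines of `ps -eo pid,comm` output whose command equals name."""
    count = 0
    for line in ps_output.splitlines()[1:]:   # skip the "PID COMMAND" header
        fields = line.split(None, 1)          # -> [pid, command]
        if len(fields) == 2 and fields[1].strip() == name:
            count += 1
    return count


def current_ps_output():
    """Run ps on the local box (assumes a procps-style ps is installed)."""
    return subprocess.run(
        ["ps", "-eo", "pid,comm"], capture_output=True, text=True, check=True
    ).stdout


# Made-up sample so the sketch runs anywhere; on the real box use
# count_processes(current_ps_output()) instead.
sample = """  PID COMMAND
    1 init
  412 nagios
  977 nagios
 1033 sshd
"""
print(count_processes(sample))
```

Logging this count every minute or so next to the load average would show whether the process count grows before the false alerts start.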

Thanks,

> ________________________ 
> David A. Terrell
> MIS Engineer, RHCE, A+ 
> Delphi Technology, Inc. 
> Cambridge, MA 02139
> 617-494-8361 x2024
> 
> 


-----Original Message-----
From: Marc Powell [mailto:marc at ena.com]
Sent: Monday, December 08, 2003 12:14 PM
To: DTerrell at Delphi-Tech.com; nagios-users at lists.sourceforge.net
Subject: RE: [Nagios-users] Incredible amount of false positives


Since we don't have any configuration information to go on, it's going
to be difficult to troubleshoot possible config issues. Looking at the
machines themselves, what is load like on them? Are they doing other
work besides Nagios (specifically the 233 MHz machine)? It's been my
experience that when load is high, a check's measured response time
suffers. Take a ping check: even if the ping returns in 10ms, if the
plugin has to wait 1000ms for CPU time, your ping response is going to
be 1000ms (because the plugin is only then able to process the
response). Do you find that one machine is generating more false
positives than another? What exactly is the false positive output of the
plugin (plugin timeout, host unreachable, response above critical
threshold, etc.)? How about your network? Do you have speed and duplex
hardcoded on all the machines and the switch(es) they are connected to?
Speed/duplex mismatch can be the cause of quite a number of odd issues.
Do you maybe have multiple copies of nagios running on a box? This is
less likely as a cause but some of your indications below hint at it.
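
To illustrate the point about CPU wait, here is a minimal Python sketch. It assumes the plugin times the check from the moment it sends the probe until it gets to process the reply. The function names and the 100 ms / 500 ms thresholds are made up for illustration and are not Nagios settings.

```python
def observed_rtt_ms(network_rtt_ms, cpu_wait_ms):
    """Latency the plugin reports under the assumption above.

    The reply may arrive after network_rtt_ms, but the plugin cannot
    timestamp it until it is scheduled again, cpu_wait_ms after sending.
    """
    return max(network_rtt_ms, cpu_wait_ms)


def check_state(rtt_ms, warn_ms=100.0, crit_ms=500.0):
    """Map a measured latency to a state (thresholds are hypothetical)."""
    if rtt_ms >= crit_ms:
        return "CRITICAL"
    if rtt_ms >= warn_ms:
        return "WARNING"
    return "OK"


idle = observed_rtt_ms(10, 0)      # idle box: plugin sees the real 10 ms
busy = observed_rtt_ms(10, 1000)   # loaded box: plugin waits 1000 ms for CPU
print(idle, check_state(idle))
print(busy, check_state(busy))
```

The network round trip is 10 ms in both cases; only the scheduling delay differs, yet the second check crosses the critical threshold.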

With just 53 checks I would think that using a ramdisk for storage
wouldn't make any difference performance-wise and may actually be
hurting you, especially on the machines with just 128 megs of RAM.
You're likely forcing the use of more (much slower) swap space than you
normally would.

If none of the above helps, then the more information you can provide,
and the more specific it is, the better we can help you.

> -----Original Message-----
> From: DTerrell at Delphi-Tech.com [mailto:DTerrell at Delphi-Tech.com]
> Sent: Monday, December 08, 2003 10:30 AM
> To: nagios-users at lists.sourceforge.net
> Subject: [Nagios-users] Incredible amount of false positives
> 
> This is getting quite frustrating.  I'm wondering why my nagios machines
> are sending out an incredible amount of false positives to me.  The
> configuration is two distributed machines, one is a 233mhz/128M/256M swap
> with 42 hosts, among those 42 hosts is 53 service checks, most of which
> being just ping.  The other is a dual 667/1G/2G swap with 49 hosts/52
> service checks, the central machine is a 667mhz/128M/256M swap.  Often
> what I get in e-mail is either one email with critical and the recovery
> shortly after, or even both at the same time!  All servers run the nagios
> status fs in a ramdisk as suggested long ago.  Any suggestions on how to
> get more accurate notifications, or even accurate to begin with (99.9% of
> the time it's not legit) would be appreciated.
> 
> Thanks,
> 
> > ________________________
> > David A. Terrell
> > MIS Engineer, RHCE, A+
> > Delphi Technology, Inc.
> > Cambridge, MA 02139
> > 617-494-8361 x2024
> >
> >
> >
> 
> 
> -------------------------------------------------------
> This SF.net email is sponsored by: SF.net Giveback Program.
> Does SourceForge.net help you be more productive?  Does it
> help you create better code?  SHARE THE LOVE, and help us help
> YOU!  Click Here: http://sourceforge.net/donate/
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null

