Incredible amount of false positives
DTerrell at Delphi-Tech.com
Mon Dec 8 20:55:01 CET 2003
Thanks for all the suggestions; I will try these. Moving the central server to
the dual-processor machine was my next step. It will probably go into effect
once I can get the Nagios machine in NJ (I'm in MA) back up again.
- I've turned off all logging.
- service_reaper_frequency was set to 5 on the distributed server; changed to 2.
- service_reaper_frequency was set to 3 on the central server; changed to 2.
- max_concurrent_checks was set to 5 on both; changed to 3 (as recommended by
  nagios -s nagios.cfg on both).
- aggregate_status_updates was already set to 1 on both.
- status_update_interval was set to 120 on the distributed server; left at 120.
- status_update_interval was set to 5 on the central server; changed to 60.
- Lowered the ping count to 1. The commands are pretty much all defaults (in
  checkcommands, using the standard modules in libexec).
We'll see how it goes. Thanks!
________________________
David A. Terrell
MIS Engineer, RHCE, A+
Delphi Technology, Inc.
Cambridge, MA 02139
617-494-8361 x2024
-----Original Message-----
From: Marc Powell [mailto:marc at ena.com]
Sent: Monday, December 08, 2003 1:42 PM
To: DTerrell at Delphi-Tech.com; nagios-users at lists.sourceforge.net
Subject: RE: [Nagios-users] Incredible amount of false positives
> -----Original Message-----
> From: DTerrell at Delphi-Tech.com [mailto:DTerrell at Delphi-Tech.com]
> Sent: Monday, December 08, 2003 12:16 PM
> To: nagios-users at lists.sourceforge.net
> Subject: RE: [Nagios-users] Incredible amount of false positives
>
> I personally suspect the machine is just overtaxed, although the load
> average is usually 0.60-0.80. The ramdisk is used because in the past,
> after about 10 minutes, the machine would begin to get bogged down and
> more and more nagios procs would start to spawn. I've had hundreds of
> nagios procs at once on that machine, which caused a spiraling effect:
> more would spawn, causing more resources to be used, causing the other
> procs to be held back and not respond, and thus more procs were spawned.
> All networks here are 100 Mbps full duplex. Plugin timeouts, destination
> unreachable, you name it: tens of times I've gone straight onto the box
> and run the very plugin that was complaining, with perfectly fine
> results. There's only one copy of nagios running... is a 233 MHz / 128 MB
> machine too little for this job? What else can I show to help the
> process?
Well, to be honest, my first reaction was that I was quite surprised
that a machine of that caliber was being used for such an important
task. That's not to say that it isn't up to the job, given the few hosts
and services that you have configured. It may just be that you need to
do some Nagios performance tuning to work better with that hardware.
I've had a PIII 800 with 512 MB of RAM running Nagios, Smokeping and
Cricket (and other things) successfully for more than 800 services, each
with a 5-minute check interval. More RAM is generally better as well.
It's cheap, so there's really no excuse unless you just can't find the
type you need anymore.
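To put the load figures above in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes checks are spread evenly across the check interval and applies Little's law (average checks running at once = start rate x average check duration). That formula is my own illustration, not how Nagios itself computes anything, and the function names are hypothetical.

```python
def check_rate(num_services: int, interval_minutes: float) -> float:
    """Average number of service checks started per second,
    assuming the checks are spread evenly across the interval."""
    return num_services / (interval_minutes * 60.0)


def checks_in_flight(num_services: int, interval_minutes: float,
                     avg_check_seconds: float) -> float:
    """Little's law: average checks running at once = start rate x average duration."""
    return check_rate(num_services, interval_minutes) * avg_check_seconds


if __name__ == "__main__":
    # 800 services on 5-minute intervals, as in the PIII 800 example above.
    print(f"{check_rate(800, 5):.2f} checks/sec")  # 2.67 checks/sec
    # Same load with fast checks versus checks that hang for ~10 s (an assumed slow case).
    for seconds in (0.5, 2.0, 10.0):
        print(f"{seconds:>4} s per check -> "
              f"{checks_in_flight(800, 5, seconds):.1f} checks in flight")
```

The takeaway is that the number of simultaneous child processes grows linearly with how long each check takes. Checks that stall until a plugin timeout can therefore multiply the process count on a small machine, which fits the spiraling effect described in the quoted message.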
Again, since you didn't provide much specific information about your
configuration or any work that you've done so far, some of these
suggestions might be redundant:
- Schedule service checks at intervals of at least 5 minutes. If you're
checking more often than that, ask yourself if it's really necessary.
- nagios.cfg:
  o use_syslog=0
  o log_* = 0 except for those specific things you really care about
  o service_reaper_frequency=2
  o max_concurrent_checks=xxx (run ~nagios/bin/nagios -s
    ~nagios/etc/nagios.cfg for the number to put here)
  o aggregate_status_updates=1
  o status_update_interval=60 (or higher, depending on how fresh
    you want the web data)
  o command_check_interval=-1 (on your central host)
- Make sure your host check commands are as simple as possible and
complete as quickly as possible. Why ping 5 or 10 times when a single
ping will do?
- When sending service check results to a central server via OCSP/NSCA, be
aware that Nagios will keep a child process open until that process
completes (and I believe it doesn't schedule a new check of that service
either). If there is a delay in the transmission or other problems, it
can severely affect the performance of the polling server.
- If it were me, I'd make the dual-processor machine the central server,
but you may have reasons for not doing so.
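The nagios.cfg suggestions in the list above can be checked mechanically. The following is a minimal Python sketch, assuming nagios.cfg consists of plain key=value lines with # comments. The helper names, the --central switch, and the choice of which directives to check are hypothetical and taken only from the suggestions above. It deliberately skips max_concurrent_checks (whose value should come from nagios -s) and the log_* flags (which depend on what you care about).

```python
import sys
from pathlib import Path

# Fixed values suggested above (a hypothetical selection for this sketch).
RECOMMENDED = {
    "use_syslog": "0",
    "service_reaper_frequency": "2",
    "aggregate_status_updates": "1",
}
MIN_STATUS_UPDATE_INTERVAL = 60  # seconds; the suggestion is "60 (or higher)"


def parse_cfg(text: str) -> dict:
    """Parse key=value lines, ignoring blank lines and # comments.
    Repeated keys (e.g. cfg_file) keep only the last value, which is
    fine for the scalar settings checked here."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def review(settings: dict, central: bool = False) -> list:
    """Return one message per setting that differs from the suggestions."""
    problems = []
    for key, wanted in RECOMMENDED.items():
        actual = settings.get(key)
        if actual != wanted:
            problems.append(f"{key}={actual} (suggested {wanted})")
    interval = settings.get("status_update_interval")
    if interval is None or int(interval) < MIN_STATUS_UPDATE_INTERVAL:
        problems.append(f"status_update_interval={interval} "
                        f"(suggested >= {MIN_STATUS_UPDATE_INTERVAL})")
    if central and settings.get("command_check_interval") != "-1":
        problems.append("command_check_interval should be -1 on the central host")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_cfg.py /path/to/nagios.cfg [--central]
    cfg = parse_cfg(Path(sys.argv[1]).read_text())
    for problem in review(cfg, central="--central" in sys.argv):
        print(problem)
```

Run it against each server's nagios.cfg, adding --central for the central host. Empty output means the file already matches these particular suggestions.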
--
Marc
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null