Passive monitoring is running slow?
Marc Powell
marc at ena.com
Wed May 2 23:38:55 CEST 2007
> -----Original Message-----
> From: nagios-users-bounces at lists.sourceforge.net
> [mailto:nagios-users-bounces at lists.sourceforge.net] On Behalf Of Jonathan Call
> Sent: Wednesday, May 02, 2007 10:07 AM
> To: nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] Passive monitoring is running slow?
>
>
>
> > -----Original Message-----
> > From: Thomas Guyot-Sionnest [mailto:dermoth at aei.ca]
> > Sent: Tuesday, May 01, 2007 4:29 PM
> > To: Jonathan Call
> > Cc: nagios-users at lists.sourceforge.net
> > Subject: Re: [Nagios-users] Passive monitoring is running slow?
> >
> > On 01/05/07 05:15 PM, Jonathan Call wrote:
> > > I have set up a distributed monitoring system per the Nagios
> > > documentation.
> > >
> > > I initially tested it out by having the distributed server monitor
> > > only 24 or so services on about 8 hosts. There didn't seem to be any
> > > problems.
> > >
> > > I then cranked it up to 427 services on 81 hosts. I'm watching the
> > > distributed server right now and there is hardly any system load but
> > > the Service Check Latency seems extremely high:
> > >
> > > Metric                 Min.        Max.        Average
> > > Check Execution Time:  0.05 sec    1.67 sec    0.701 sec
> > > Check Latency:         60.40 sec   287.36 sec  184.514 sec
> > > Percent State Change:  0.00%       0.00%       0.00%
> > >
> > > This is resulting in 50% or less of the service checks completing in
> > > the 5 minutes or less timeframe.
> > >
> So this is a known design failure in Nagios then?
Absolutely not.
> I'm fairly new to Nagios and I am completely dumbfounded at this. If
> you can't service even a quarter (and probably even a tenth) of the
> amount of hosts and services on a distributed server that you can on a
> regular active server then what is the point of having a distributed
> model at all?
I have 5 data collector machines, each running Nagios -and- Cricket for
thousands of services, with Nagios reporting all results back to two
central hosts as documented. Average latency is 0.689 seconds, with a
max of 3.65 seconds right now. The distributed server should perform
exactly like a regular active server as far as latency stats are
concerned. You're either starving Nagios of the resources it needs to
run its active checks (run ~nagios/bin/nagios -s ~nagios/etc/nagios.cfg
to see recommended settings) or, less likely, something is wrong with
your submit-check-result. If you submit a result from the command line,
does it complete in a timely manner? If you disable OCSP, does the
latency go away? Basic troubleshooting dictates that you methodically
enable features on your distributed machine, one at a time, to take it
from an active-only server to an active server that submits check
results via OCSP:
1. Disable OCSP program-wide (obsess_over_services=0 in nagios.cfg).
   Test.
2. Enable OCSP but have your OCSP script do everything except call
   send_nsca. Test.
3. Enable send_nsca in your OCSP script. Test.
...
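To make the middle step concrete, here is a minimal sketch of an OCSP handler with a switch for the send_nsca call, so the same script serves both the "everything except send_nsca" test and the full test. It is an illustration only, not the stock submit_check_result script: the paths, the central host name, and the SEND_ENABLED switch are all assumptions to adjust for your install. The only things taken from NSCA itself are the tab-delimited "host, service, return code, output" line on stdin and the -H and -c options.

```python
import subprocess
import sys
import time

# Hypothetical locations and host name; change these for the local install.
SEND_NSCA = "/usr/local/nagios/bin/send_nsca"
SEND_NSCA_CFG = "/usr/local/nagios/etc/send_nsca.cfg"
CENTRAL_HOST = "central.example.com"

# Flip to True for the final test; False means "do everything except send".
SEND_ENABLED = False

STATE_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def build_line(host, service, state, output):
    """Format one passive result the way send_nsca reads it on stdin."""
    code = STATE_CODES.get(state, 3)  # unrecognised states become UNKNOWN
    text = output.replace("\n", " ")  # keep the result on a single line
    return f"{host}\t{service}\t{code}\t{text}\n"


def submit(host, service, state, output, send, runner=subprocess.run):
    """Build the result line, optionally send it, and time the whole call."""
    start = time.monotonic()
    line = build_line(host, service, state, output)
    if send:
        runner(
            [SEND_NSCA, "-H", CENTRAL_HOST, "-c", SEND_NSCA_CFG],
            input=line.encode(),
            check=False,
        )
    return line, time.monotonic() - start


# Only act when invoked with the four expected arguments:
# host, service description, state string, plugin output.
if __name__ == "__main__" and len(sys.argv) == 5:
    _, elapsed = submit(*sys.argv[1:5], send=SEND_ENABLED)
    # Report the elapsed time so a slow submission is easy to spot.
    print(f"ocsp submit took {elapsed:.3f}s", file=sys.stderr)
```

Running it by hand with four arguments also answers the "does it complete in a timely manner" question, since it prints how long the submission took.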
Do you have regular host checks enabled? Post the output of nagios -v
and nagios -s.
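As a rough sanity check on the numbers quoted above, the arithmetic below works out how much time each result gets if any single step handles results one at a time. It is a back-of-the-envelope sketch, not a statement about how your Nagios is actually scheduling: the 300-second interval is an assumption read off the "5 minutes or less" remark, and the function names are made up for the example.

```python
def per_result_budget(interval_s, services):
    """Seconds available per result if results are handled one at a time."""
    return interval_s / services


def serialized_cycle(services, per_result_s):
    """Seconds needed to get through every service strictly in sequence."""
    return services * per_result_s


services = 427        # from the original report
interval_s = 300.0    # assumed 5-minute check interval
avg_exec_s = 0.701    # reported average check execution time

budget = per_result_budget(interval_s, services)
cycle = serialized_cycle(services, avg_exec_s)

print(f"budget per result: {budget:.3f} s")
print(f"one-at-a-time cycle: {cycle:.1f} s")
```

With those inputs the budget per result comes out at roughly 0.70 seconds, about the same as the reported average execution time, and a strictly sequential pass takes just under the full 300 seconds. So if anything in the chain is running one result at a time, there is no headroom at all, which is why the nagios -s output is worth looking at.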
--
Marc