realistic system requirements and capacity
Bishop, Dean
dean.bishop at tcdsb.org
Wed Sep 18 21:33:09 CEST 2002
As a sidenote i have found a limit to Nagios.
After making a ton of minor changes to the config files hosts.cfg,
services.cfg, and hostgroups.cfg (basically i grouped better and
alphabetized the files for ease of use) as well as adding a few extra hosts,
i experienced serious performance problems. i have about 1250 hosts, one
service per, and once i had made these changes ping checks were taking up to
10 seconds and then the check-host-alive and check_tcp!23 were simply timing
out on a varying number of devices. Different devices each time.
i began to imagine that Nagios didn't like the way i organized my config
files or something and anticipated (or despaired) having to redo my files.
As we have been experiencing network delays due to other problems i set out
to find the culprit. i enabled a few sniffers and started investigating.
Alas, the network looked fine. i changed where my Nagios box was
connected...no luck.
We were going through our firewall dealing with a different issue and i
asked, purely out of curiosity, whether the Nagios box traffic was being
logged. Indeed it was, as successful connections. All is well. "Should be
every 15 minutes right?" i half enquired and half stated. "No."
"Pardon?"
"No. They are 2 to 5 minutes apart."
"What!!??"
Apparently, while making my "minor changes" i inadvertently deleted the "5"
from the "15" in the "normal_check_interval" parameter of the object template
that is used for all 1250 service checks.
Whoops!
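A slip like the one described above can be caught mechanically. What follows is a minimal sketch, not part of Nagios itself, that scans object definitions for "normal_check_interval" values below a chosen floor. The function name find_short_intervals, the threshold of 5, and the sample template (including the name "generic-service") are assumptions made for illustration. The parser only handles the plain "define type{ ... }" layout.

```python
import re

def find_short_intervals(config_text, minimum=5):
    """Return (object_type, name, interval) for definitions whose
    normal_check_interval is below `minimum` (in interval units)."""
    findings = []
    # Match each "define <type>{ ... }" block; bodies have no nested braces.
    for match in re.finditer(r"define\s+(\w+)\s*\{(.*?)\}", config_text, re.S):
        obj_type, body = match.group(1), match.group(2)
        directives = {}
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()  # drop trailing ';' comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            if len(parts) == 2:
                directives[parts[0]] = parts[1].strip()
        interval = directives.get("normal_check_interval")
        if interval is not None and float(interval) < minimum:
            name = directives.get("name") or directives.get("service_description", "?")
            findings.append((obj_type, name, float(interval)))
    return findings

# Hypothetical template resembling the one described in the message.
SAMPLE = """
define service{
    name                    generic-service   ; template name (assumed)
    normal_check_interval   1                 ; was meant to be 15
    retry_check_interval    1
    register                0
}
"""

if __name__ == "__main__":
    for obj_type, name, interval in find_short_intervals(SAMPLE):
        print(f"{obj_type} '{name}': normal_check_interval={interval:g}")
```

Run against the real services.cfg before a restart, a check of this kind would have flagged the template as soon as the "5" went missing.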
Moral of the story? It actually did quite well, all things considered,
monitoring 1250 devices at 1 minute intervals. It hovered at 20-40
timed-out services. This on a P-II 350 w/384M of RAM.
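To put the mistake in numbers, here is a rough back-of-the-envelope sketch of the average check rate at the intended and the accidental interval. It assumes one interval unit is 60 seconds and that checks are spread evenly across the interval; the function name checks_per_second is made up for this example.

```python
def checks_per_second(services, interval_minutes, interval_length=60):
    """Average check rate if checks are spread evenly over the interval."""
    return services / (interval_minutes * interval_length)

intended = checks_per_second(1250, 15)  # the interval that was meant
actual = checks_per_second(1250, 1)     # the interval after the typo

print(f"intended: {intended:.2f} checks/s")   # 1.39 checks/s
print(f"actual:   {actual:.2f} checks/s")     # 20.83 checks/s
print(f"ratio:    {actual / intended:.0f}x")  # 15x
```

So the box was being asked to do roughly fifteen times the intended work, which fits with a few dozen timeouts rather than a total collapse.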
regards,
dean
the [slightly] more attentive typer.
-----Original Message-----
From: Marc Powell [mailto:mpowell at ena.com]
Sent: Wednesday, September 18, 2002 2:58 PM
To: Nagios-Users (E-mail)
Subject: RE: [Nagios-users] realistic system requirements and capacity
I'm utilizing a distributed setup and my most heavily loaded data
collector is monitoring PING service on 592 hosts. It easily completes
all the checks within 5 minutes with minimal load on the server. That
machine is also running Smokeping and Cricket for the same number of
hosts. My data collector machines are PIII 800's with 512MB RAM. Another
data collector which is doing significantly more varied service checks
(ping, http, ntp, dns, smtp, practically the whole gamut along with
several custom plugins we use; 440 services on 206 hosts) is exhibiting
the same performance. Average check execution time is 4.389 seconds and
average check latency is 0.226 seconds. This machine is again also
running Cricket and Smokeping for the same hosts. Not even breaking a
sweat. I feel that I can scale this up 2x or higher on the same
hardware. My only challenge right now is the 4K size of the named pipe
on my central Nagios server and the fact that Nagios doesn't check it
often enough, even with command_check_interval=-1. I'll have periods
when I have 2000+ nsca processes waiting to write to the pipe.
--
Marc
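The named-pipe limit mentioned above can be seen directly: once a pipe's kernel buffer is full and nothing is reading, further writers have to wait. The sketch below is only an illustration under assumptions. It uses an anonymous pipe as a stand-in for the Nagios external command file, runs on Unix-like systems only, and the function name measure_pipe_capacity is invented here. The size it reports depends on the operating system and kernel, so it will not necessarily be the 4K seen on the 2002-era server.

```python
import os

def measure_pipe_capacity(chunk_size=512):
    """Fill an unread pipe and return how many bytes it accepted."""
    read_fd, write_fd = os.pipe()
    try:
        # Non-blocking mode makes a full pipe raise instead of hanging.
        os.set_blocking(write_fd, False)
        chunk = b"x" * chunk_size
        total = 0
        while True:
            try:
                total += os.write(write_fd, chunk)
            except BlockingIOError:
                # Pipe is full: a blocking writer would stall at this point.
                return total
    finally:
        os.close(read_fd)
        os.close(write_fd)

if __name__ == "__main__":
    size = measure_pipe_capacity()
    print(f"pipe accepted {size} bytes before a writer would block")
```

With a small buffer and a reader that drains it only occasionally, each submitted result has to wait its turn, which would explain a backlog of nsca processes of the size described.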
> -----Original Message-----
> From: Russell Adams [mailto:RLAdams at Kelsey-Seybold.com]
> Sent: Wednesday, September 18, 2002 12:34 PM
> To: Nagios-Users (E-mail)
> Subject: Re: [Nagios-users] realistic system requirements and capacity
>
> I ran Netsaint 0.0.4b on a P90 w/ 64MB for many years, until the load
> levels reached about 300-500% monitoring 120 hosts and 200
> services. Netsaint was still single-threaded, and at this load level
> took about 10-12 minutes to complete each run of service checks...
>
> I recently upgraded to Netsaint 0.0.7 on a PII/450 w/ 128MB which is
> monitoring 230 hosts and 350 services and it's now at about 200%
> load. The parallel checks mean that my service checks are running
> within a fraction of a second of when the scheduler says they should,
> so there's little latency. However the execution time varies by a few
> seconds depending on the load.
>
> I'm planning on getting a Ghz machine w/ 256MB next to upgrade to
> Nagios. Even though my server load could increase to monitoring about
> 300+ hosts and 500 services, I think I could handle that on a single
> system still. After that, I would consider doing the distributed
> monitoring. Depends on the number and type of services being
> checked. A ping check takes very little time and resources, and a
> single machine could handle thousands I'm sure. However I'm monitoring
> processes, disks, and more via SNMP, and some other custom shell
> scripts, at about 5-6 services per host. These really load down the
> system, when compared to my 130 ping only checks.
>
> From my experience, disk space is nearly irrelevant, CPU does all the
> work and some memory can speed it up. I've also heard good things
> about putting your plugins into a RAM disk, but if the machine does
> nothing but Netsaint then I'd think the cache is just as effective
> when there's enough memory available.
>
> Perhaps this historic data can put your machine selection into
> perspective.
>
> Russell Adams
> Systems Administrator
> Kelsey Seybold Clinics
>
> On Wed, Sep 18, 2002 at 08:59:17AM -0700, George Miscioscia wrote:
> > To any and all experienced Nagios users, what system requirements would you
> > recommend to run a Nagios install at full capacity, and what would you
> > consider full capacity before implementing distributed servers? I.E., how
> > many checks would you recommend one server perform?
> >
> > thanks,
> >
> > George Miscioscia
> > Manager, Internet Systems
> > Ticketmaster/Citysearch
> > office (213)739-3521
> > cell (310)902-6743
> >
> >
> >
> > -------------------------------------------------------
> > This SF.NET email is sponsored by: AMD - Your access to the experts
> > on Hammer Technology! Open Source & Linux Developers, register now
> > for the AMD Developer Symposium. Code: EX8664
> > http://www.developwithamd.com/developerlab
> > _______________________________________________
> > Nagios-users mailing list
> > Nagios-users at lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nagios-users