Large scale network monitoring limits with nagios
Noah Leaman
noah at mac.com
Thu Mar 11 13:52:00 CET 2004
Hope it's OK to cross-post to both groups on this matter...
Using the concept of one service per up/down trap for each network
interface, I ran a small test with a very simple set of Nagios
configs containing about 8000 PASSIVE service checks and no active
service checks. Of course there were no scheduling problems, but the
CGIs all slowed to a snail's pace. In my setup (Nagios 1.2 on a
first-generation dual G4 Xserve) it takes about 30 seconds to display
the Status Summary page.
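
For illustration, a test config of this shape could be generated rather than written by hand. The following is only a minimal sketch: the template name "passive-trap-service", the "LINK_" service naming, and the host and interface names are all hypothetical, and it assumes a check_dummy command and the named template are defined elsewhere in the Nagios configuration.

```python
# Sketch: emit one passive Nagios service definition per network interface.
# The template name, service naming scheme and inventory below are
# hypothetical and only serve to illustrate the idea.

SERVICE_TEMPLATE = """define service{{
    use                     {template}
    host_name               {host}
    service_description     {description}
    check_command           check_dummy
    active_checks_enabled   0
    passive_checks_enabled  1
    is_volatile             1
    }}
"""


def interface_service(host, ifname, template="passive-trap-service"):
    """Return the text of one service definition for a host interface."""
    return SERVICE_TEMPLATE.format(
        template=template,
        host=host,
        # Slashes are replaced so the description is a simple token.
        description="LINK_" + ifname.replace("/", "_"),
    )


def build_config(interfaces_by_host):
    """Concatenate definitions for every (host, interface) pair.

    Hosts and interfaces are sorted so regenerated files diff cleanly.
    """
    blocks = []
    for host in sorted(interfaces_by_host):
        for ifname in sorted(interfaces_by_host[host]):
            blocks.append(interface_service(host, ifname))
    return "\n".join(blocks)


if __name__ == "__main__":
    inventory = {"core-sw1": ["Gi0/1", "Gi0/2"], "core-sw2": ["Gi0/1"]}
    print(build_config(inventory))
```

Feeding it a real interface inventory (for example, one pulled from SNMP) would produce the thousands of definitions described above; the sketch does nothing about the CGI slowdown itself.
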
Of course, that test config isn't the actual production plan, so I
then enabled configs closer to the real world:
552 check_traffic (each service check runs 2 snmpgets every 10 minutes
and stores the results in an RRD)
295 check_ping (the number of locally monitored hosts)
8389 check_dummy (mostly the up/down traps; about 100 are passive
services coming from 2 other distributed Nagios servers doing pings and
check_traffic checks)
... So 9236 services altogether, but this is really just a small
subset of what I would like to be able to do. The plan is to throw
hardware at it to spread out the real work being done (i.e. the active
checks).
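
As a quick sanity check on the figures above, the totals and the implied SNMP query rate can be worked out as follows. This is only back-of-the-envelope arithmetic, and it assumes the check_traffic and check_ping services are the active ones.

```python
# Tally of the service counts quoted above and the implied SNMP query rate.
checks = {"check_traffic": 552, "check_ping": 295, "check_dummy": 8389}

total_services = sum(checks.values())

# Assumption: check_traffic and check_ping are the actively scheduled checks.
active_services = checks["check_traffic"] + checks["check_ping"]

# Each check_traffic service issues 2 snmpgets and runs every 10 minutes.
snmpgets_per_second = checks["check_traffic"] * 2 / (10 * 60)

print(total_services)                 # 9236
print(active_services)                # 847
print(round(snmpgets_per_second, 2))  # 1.84
```

On these numbers the active work is small (under a tenth of all services), which is consistent with the scheduling side not being the bottleneck in this setup.
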
But with just this setup, a single CGI takes up an entire CPU, often
for a few minutes at a time... and the plan was to have a good
handful of GUI users (about 5 at a time)... it's just about unusable
with even one GUI user.
How can I monitor traps for hundreds of network hosts and tens of
thousands of different interfaces, each of which could generate up/down
traps along with other traps? I tried setting up a single "catch-all"
trap service per host, but notifications would need to occur when going
from an OK to another OK (with a different output). Shouldn't this
work with is_volatile on and stalking_options set to o,w,u,c? (Every
test I've done to get this working from OK to OK has failed... but
maybe I missed something.)
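
For reference, a trap handler feeding such a "catch-all" service would typically submit each result through the Nagios external command file using PROCESS_SERVICE_CHECK_RESULT. The sketch below only shows that submission step and does not by itself settle the OK-to-OK notification question. The host name, the service name "TRAP", and the command file path are assumptions, and the path must match command_file in nagios.cfg.

```python
import time


def passive_result_line(host, service, return_code, output, now=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line.

    return_code follows the plugin convention:
    0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    """
    timestamp = int(time.time() if now is None else now)
    # An external command is a single line, so flatten any newlines.
    output = " ".join(output.splitlines())
    return "[{0}] PROCESS_SERVICE_CHECK_RESULT;{1};{2};{3};{4}\n".format(
        timestamp, host, service, return_code, output
    )


def submit(line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write the command to the Nagios command file (a named pipe).

    The default path is an assumption; use the command_file value
    from your own nagios.cfg.
    """
    with open(command_file, "w") as pipe:
        pipe.write(line)


if __name__ == "__main__":
    # Hypothetical host and service names, printed rather than submitted.
    print(passive_result_line("core-sw1", "TRAP", 0, "linkUp on Gi0/1"), end="")
```
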
So the higher-level question here is: am I in over my head in what I'm
trying to do with Nagios, or in how I'm trying to do it? After tackling
the network monitoring needs, the plan was to then start on server
monitoring (around 1000 servers of many platforms).
Any helpful guidance?
--
Noah
On Wednesday, March 10, 2004, at 06:51 PM, Noah Leaman wrote:
> I have over 70,000 interfaces/ports (just the up/up ones) for which I
> could receive linkDown and linkUp traps. And this is just a
> sampling of hosts on our network to pilot Nagios to see if it can do
> what we want. Doesn't it seem a little crazy to have to deal with that
> many services even if they are passive? And this is just linkDown and
> linkUp. What about all other possible traps that could be received?
>
> --
> Noah
>
>
> On Friday, March 5, 2004, at 01:15 AM, Jim Mozley wrote:
>
>> Noah Leaman wrote:
>>
>>> How do you all address the issue of trap monitoring when you want
>>> notifications for them?
>>
>> I have done something similar with interfaces, the only way I know is
>> to define each interface as a service. I realise this is potentially
>> a lot of services. We do this on core network device interfaces, but
>> only define services for interfaces that are in use. This is an
>> automated process so as interfaces are activated/deactivated they are
>> added or removed from the Nagios configuration files. As the only
>> alerts are passive ones for these services, it isn't as though one is
>> introducing something like a vast increase in active checks.
>>
>> HTH,
>>
>> Jim Mozley
>>
>>
>> -------------------------------------------------------
>> This SF.Net email is sponsored by: IBM Linux Tutorials
>> Free Linux tutorial presented by Daniel Robbins, President and CEO of
>> GenToo technologies. Learn everything from fundamentals to system
>> administration. http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
>> _______________________________________________
>> Nagios-users mailing list
>> Nagios-users at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>> ::: Please include Nagios version, plugin version (-v) and OS when
>> reporting any issue. ::: Messages without supporting info will risk
>> being sent to /dev/null
>>