How many hosts and services are you monitoring with Nagios?
James Whittington
James.Whittington at vc3.com
Fri May 18 20:36:45 CEST 2012
We're monitoring around 1000 hosts and 4700 services.
We are using the last version of Opsview Community, like Simon, although his setup sounds a bit more fault tolerant.
We have 1 master server with about 30 slave servers monitoring various remote sites.
Ease of distributed setup was what won us over to Opsview several years ago, but since they moved the distributed version to the commercial enterprise edition, I am starting to pay attention again to the different Nagios variants out there.
A centralized web-based configuration front end was another huge plus, as engineers don't have to understand Nagios to set up hosts.
The Racon setup sounds like some good stuff.
James
-----Original Message-----
From: Simone Felici [mailto:s.felici at mclink.eu]
Sent: Friday, May 18, 2012 3:33 AM
To: nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] How many hosts and services are you monitoring with Nagios?
Impressive :)
We're monitoring ~2000 hosts and ~10000 services, every 5 minutes.
Architecture used: Opsview Community edition, the last free version before the distributed version became commercial :/
Two central servers (active/standby, DRBD) act as the single point for management and collect all passive checks executed by the slave servers. Performance data is saved into RRD files as well as on a large external database server. Configuration resides on a clustered MySQL installation (DRBD).
4 slave "datacenter installations" with 2 servers per "datacenter" in active/active load balancing.
Trap handling is supported on all servers, with rule-based logic.
Pros:
- Open Source: at least up to version 3, for our setup. A simple single instance with fewer functions is also available in version 4.
- Easy to manage: the purpose was to create the monitoring system and then hand management over to people with less technical skill
- distributed setup
- RBAC
Disadvantages:
- no longer Open Source: see above
- Central server CPU suffers from the GUI implementation and other background jobs
- Not all Nagios parameters are editable as we would like: e.g. you cannot customize the same check with different intervals without creating new ones. Think of an HTTP service on servers with different loads and the need to extend the retries on high-load servers. There is no way except creating "HTTP" and "HTTP High Load" services.
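As an illustration of the workaround in the last point, here is a minimal sketch that renders two plain Nagios service definitions differing only in their retry count. The host names, the retry values, the `generic-service` template and the `check_http` command name are assumptions for the example, not taken from the setup described here; Opsview generates its own configuration, so this only shows what the duplicated services amount to at the Nagios level.

```python
# Sketch only: one Nagios service definition per retry profile.
# Host names, attempt counts, the "generic-service" template and the
# "check_http" command name are hypothetical.

SERVICE_TEMPLATE = """define service {{
    use                     generic-service
    host_name               {host}
    service_description     {description}
    check_command           check_http
    check_interval          5
    retry_interval          1
    max_check_attempts      {attempts}
}}"""


def render_service(host, description, attempts):
    """Return a Nagios service definition with the given retry count."""
    return SERVICE_TEMPLATE.format(
        host=host, description=description, attempts=attempts
    )


# Same check, two service names, different number of retries.
print(render_service("web01", "HTTP", 3))
print(render_service("web02-busy", "HTTP High Load", 10))
```

The duplication is the cost: every extra retry profile means one more service to create and keep in sync.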
Maybe there are more pros (and disadvantages), but this is not the right place.
BTW, I look forward to this solution; it seems interesting!
Simon
On 17/05/2012 16:43, Max Schubert wrote:
> Hi,
>
> I like it when people periodically post numbers and architecture
> summaries, I am guessing with the distributed frameworks out now for
> Nagios this thread might be seeing bigger numbers than past threads
> have.
>
> With our custom-built distributed Nagios-based monitoring system, we
> are currently monitoring 18000+ hosts every 5 minutes and 100k+ active
> services (plenty of passive services in addition to the actives) every
> 5 mins as well. We collect performance data from every check as well
> and pass that on to a highly distributed and scalable time-series data
> warehouse another team in our organization has built (which is why we
> have the 5 min interval requirement)
>
> We also do trap ingest using SNMPTT with a few custom mods, but not
> going to include those numbers as they never have required the
> optimizations the polling has required.
>
> This isn't a monolithic instance, we have 6 projects using instances
> of our distributed Nagios-based software, called Racon (soon my
> manager will give our team to package it as open source - so I hear at
> least). We built it on core Nagios with a custom database layer based
> on a very very early version of Merlin's database abstraction layer
> (thank you Andreas!) - we have a custom client/server network-based
> notification framework in use (we will release that as well) along
> with a custom NEB/perl based client-server framework (also releasable,
> just need time scheduled) for sending and processing performance data
> - the performance and notification framework are both horizontally
> scalable and network fault tolerant.
>
> What kinds of numbers of hosts and services are you all monitoring?
> Which add-ons / distributed frameworks are you using?
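For a rough side-by-side comparison of the figures quoted in this thread, the sketch below converts each host and service count into an average check rate. It assumes every host and service is checked exactly once per 5-minute interval; that interval is stated for two of the setups and is only an assumption for the 1000-host one. Passive services are not counted, and the function and dictionary names are illustrative.

```python
# Rough check-rate arithmetic for the setups mentioned in this thread.
# Counts are the approximate figures quoted by each poster. The 5-minute
# interval is assumed for the first setup (it is not stated there).

INTERVAL_SECONDS = 5 * 60

setups = {
    "1000 hosts / 4700 services (interval assumed)": (1000, 4700),
    "~2000 hosts / ~10000 services": (2000, 10000),
    "18000+ hosts / 100k+ active services": (18000, 100000),
}


def checks_per_second(hosts, services, interval=INTERVAL_SECONDS):
    """Average checks per second if each host and service runs once per interval."""
    return (hosts + services) / interval


for name, (hosts, services) in setups.items():
    print(f"{name}: {checks_per_second(hosts, services):.1f} checks/s")
```

Under these assumptions the three setups average about 19, 40 and 393 checks per second respectively.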
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null