Monitoring tool for a large enterprise? Is Nagios suitable to any degree?
Subhendu Ghosh
sghosh at sghosh.org
Thu Jun 2 15:37:42 CEST 2005
On Thu, 2 Jun 2005, Ralf Strandell wrote:
> Hi,
>
> I'm new to Nagios and currently evaluating its suitability for my
> professional needs.
>
> I have searched the internet for monitoring tools, but almost everything
> I can find seems to belong to the "simple yet powerful" category. I need
> something better. I have been using a heavily extended Big Brother
> monitoring system and it is not flexible and powerful enough. Nagios
> might do. I don't know. I don't know how well Nagios works with MIBs and
> SNMP traps or whether it supports compound events etc. The documentation
> is extensive, but it's hard to find the relevant information.
>
> Thus I need to ask you. Sorry for a long email...
>
> What I need to monitor:
> ------------------------------------------------------
> 1) NETWORK
> I need to monitor hundreds of Juniper/Cisco/Microsoft/other devices,
> including routers, switches, firewalls, VPN gateways, DSL/ISDN, DNS
> servers, DHCP servers and uninterruptible power supplies, using SNMP.
>
> I need to know about connectivity, reboots, uptime, CPU load, memory,
> bandwidth utilization (octets/time and % of max), traffic distribution
> *changes* (by protocol, by port), up/down interfaces and VPN tunnels,
> routing *changes* and alarms (traps), and UPS battery and electricity
> status.
>
> 2) HOSTS
> I need to monitor server parameters including connectivity, reboots,
> uptime, CPU load, memory, swapping, disk I/O, disk space and services.
>
> 3) APPLICATIONS
> Including databases (deadlocks, logs, free space...) and everything one
> could find in a modern big enterprise's data center.
>
> 4) SERVICE LEVEL
> I also want to monitor the point-to-point bandwidth and response time
> (ping round trip, HTTP response, general TCP connect, database
> connections).
>
> 5) BUSINESS PROCESSES (ABSTRACTION LAYER)
> I want some basic root cause analysis capability (i.e. unreachable vs.
> down) and an abstraction layer between polls/traps and alerts. I want to
> define compound events that happen when several events coincide.
> Examples:
> - Disk is more than 90% full for more than 10 minutes.
> - Primary network connection has been lost for more than 10 minutes, or
>   both the primary and backup connections have been down for more than
>   three minutes.
> - Event A and B happened, but not C, and all this has lasted for more
>   than 10 minutes and it is not Sunday between 1am and 2am.
> These rules/scripts/compound events are important for my monitoring
> needs. I need to monitor a big enterprise with several data centers,
> complicated network topology and business systems comprising several
> servers working together.
> ----------------------------------------------------------------
>
> Plus...
>
> All this collected or deduced data should be stored in an event
> database and used for history reports, snapshot reports, service level
> reports, trends/graphics,... everything. Naturally it needs a web user
> interface with at least two user levels (admin, monitor) and several
> views (network view, business view...). Flexibility and manageability
> are more important than instant ease of use.
>
> These requirements rule out about 100% of the monitoring tools I have
> found. Please help. I'm lost.
>
> This would be used as a professional monitoring tool. It's a day job.
> Usually 5 x 8hrs, so I don't need anything "simple yet powerful". It
> can also cost a bit. It can be hard to learn - no problem. So, do I have
> any other choice than HP OpenView or Tivoli Enterprise Console? What can
> Nagios do for me?
>
Even with HPOV you will need additional tool sets to get the level of
detail you are looking for. For most of these categories there are
best-of-breed applications:
For host and network fault monitoring - Nagios
For link traffic/app response times - MRTG/Cricket/Cacti with alerts to
Nagios
For link ping response times - SmokePing
For protocol distribution - NetFlow from routers/ntop and FlowScan
For routing changes - custom plugin that looks at nexthop for specific
routes
For router config management - RANCID - could probably define an
alert/trap to feed into Nagios
For business process abstraction - service dependencies in Nagios are
great - they are pretty close to HPOV's service manager in design and
function
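
To give an idea of what the custom routing plugin above might look like,
here is a minimal Python sketch. It only covers the comparison and the
standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
How the observed next hop is fetched (SNMP, CLI scrape, etc.) is left out;
the names check_nexthop and main and the "ROUTE" status prefix are made up
for illustration and are not part of any existing plugin.

```python
# Hypothetical sketch of a Nagios-style next-hop check (not an existing plugin).

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_nexthop(route, expected, observed):
    """Return (exit_code, status_line) for a single route.

    `observed` is the next hop the caller retrieved by some other means
    (for example an SNMP query); None means the lookup failed.
    """
    if observed is None:
        return UNKNOWN, f"ROUTE UNKNOWN - no next hop found for {route}"
    if observed != expected:
        return CRITICAL, (
            f"ROUTE CRITICAL - {route} via {observed}, expected {expected}"
        )
    return OK, f"ROUTE OK - {route} via {observed}"


def main(argv):
    """Command-line wrapper: prints one status line, returns the exit code."""
    if len(argv) != 4:
        print("ROUTE UNKNOWN - usage: check_nexthop ROUTE EXPECTED OBSERVED")
        return UNKNOWN
    code, line = check_nexthop(argv[1], argv[2], argv[3])
    print(line)
    return code
```

Ending the script with sys.exit(main(sys.argv)) would turn it into something
Nagios can run as a check command, since Nagios reads the exit code and the
first line of output.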
You will probably want to write your own web interface to integrate the
various data sets.
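
For the compound events in your question (primary link down for more than
10 minutes, or primary and backup both down for more than three minutes),
the logic is small enough to put in a script that feeds its result to
Nagios. A rough Python sketch follows; the class and function names are
invented for illustration, and the timestamps are plain seconds supplied by
the caller. Note that for the simple "bad for more than N minutes" case,
Nagios's own retry settings (max_check_attempts together with the retry
interval) already delay the alert until the problem has persisted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Condition:
    """Tracks how long a boolean condition has been continuously true."""

    since: Optional[float] = None  # time at which it last became true

    def update(self, active: bool, now: float) -> None:
        """Record the latest observation taken at time `now` (seconds)."""
        if not active:
            self.since = None
        elif self.since is None:
            self.since = now

    def held_longer_than(self, seconds: float, now: float) -> bool:
        """True if the condition has been true for more than `seconds`."""
        return self.since is not None and now - self.since > seconds


def link_alert(primary_down: Condition, backup_down: Condition,
               now: float) -> bool:
    """Primary down > 10 min, or primary and backup both down > 3 min."""
    both_down = (primary_down.held_longer_than(180, now)
                 and backup_down.held_longer_than(180, now))
    return primary_down.held_longer_than(600, now) or both_down
```

Each poll result would call update() on the matching Condition, and
link_alert() would then decide whether to raise the compound event.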
--
-sg