Monitoring tool for a large enterprise? Is Nagios suitable to any degree?
Arno Lehmann
al at its-lehmann.de
Thu Jun 2 13:52:59 CEST 2005
Hello,
Ralf Strandell wrote:
> Hi,
>
> I'm new to Nagios and currently evaluating its suitability for my professional needs.
Fine.
> What I need to monitor:
> ------------------------------------------------------
> 1) NETWORK
> I need to monitor hundreds of Juniper/Cisco/Microsoft/other devices including routers, switches, firewalls, vpn gateways, dsl/isdn, dns-servers, dhcp-servers and uninterruptible power supplies using SNMP.
Ok.
> I need to know about connectivity, reboots, uptime, cpu load, memory, bandwidth utilization (octets/time and % of max), traffic distribution *changes* (by protocol, by port), up/down interfaces and VPN tunnels, routing *changes* and alarms (traps) and UPS battery and electricity status.
Ok, as long as you are prepared to invest some effort, including some
programming of your own.
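For a sense of the programming involved, here is a minimal Python sketch of the arithmetic behind "octets/time and % of max": given two readings of a 32-bit interface octet counter (such as IF-MIB ifInOctets) taken some seconds apart, and the interface speed in bit/s, it returns the rate and the percentage of link speed, allowing for a single counter wrap. The SNMP polling itself is left out; the function name and the sample figures are made up for illustration.

```python
# Hypothetical helper: bandwidth utilization from two samples of a 32-bit
# SNMP octet counter (e.g. IF-MIB ifInOctets). Fetching the samples is not shown.
COUNTER32_MAX = 2**32


def utilization(octets_then: int, octets_now: int,
                seconds: float, if_speed_bps: int) -> tuple[float, float]:
    """Return (bits per second, percent of interface speed)."""
    if seconds <= 0 or if_speed_bps <= 0:
        raise ValueError("interval and interface speed must be positive")
    delta = octets_now - octets_then
    if delta < 0:
        # The counter wrapped between samples; this assumes it wrapped only once.
        delta += COUNTER32_MAX
    bps = delta * 8 / seconds
    return bps, 100.0 * bps / if_speed_bps


if __name__ == "__main__":
    # Made-up readings: 300 s apart on a 100 Mbit/s interface, counter wrapped.
    bps, pct = utilization(4_000_000_000, 100_000_000, 300, 100_000_000)
    print(f"{bps:.0f} bit/s, {pct:.1f}% of link speed")
```

On fast links a 32-bit counter can wrap more than once between polls, so short polling intervals (or the 64-bit ifHCInOctets counter, where the device supports it) matter.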
>
> 2) HOSTS
> I need to monitor server parameters including connectivity, reboots, uptime, cpu load, memory, swapping, disk I/O, diskspace and services
Ok.
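Most host checks exist as standard plugins, but a custom one is small. A Nagios plugin reports its result through its exit status (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) plus a line of text on standard output. A minimal disk-space sketch in Python; the 80%/90% thresholds and the default path are assumptions, not Nagios defaults.

```python
import shutil

# Nagios plugin return codes: the exit status carries the check result.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_disk(path: str, warn_pct: float = 80.0,
               crit_pct: float = 90.0) -> tuple[int, str]:
    """Classify how full the filesystem holding `path` is (thresholds are assumptions)."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    used_pct = 100.0 * usage.used / usage.total
    if used_pct >= crit_pct:
        return CRITICAL, f"DISK CRITICAL - {path} {used_pct:.1f}% used"
    if used_pct >= warn_pct:
        return WARNING, f"DISK WARNING - {path} {used_pct:.1f}% used"
    return OK, f"DISK OK - {path} {used_pct:.1f}% used"


def main(argv: list[str]) -> int:
    """Plugin entry point: print one status line and return the exit code."""
    status, message = check_disk(argv[1] if len(argv) > 1 else "/")
    print(message)
    return status
```

As a standalone plugin script, the last line would be `sys.exit(main(sys.argv))` (with `import sys`), so that Nagios receives the status code.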
>
> 3) APPLICATIONS
> Including databases (deadlocks, logs, free space...) and everything one could find in a modern big enterprise's data center.
Ok, but this might need some programming of your own, too.
>
> 4) SERVICE LEVEL
> I also want to monitor the point-to-point bandwidth and response time (ping roundtrip, http response, general tcp connect, database connections).
Ok, to my knowledge, although I have never used those checks myself.
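For the response-time part, the standard plugins (check_ping, check_tcp, check_http and friends) are the usual starting point. The idea behind a TCP connect-time check is simply to time the handshake and compare it against thresholds, as in this minimal Python sketch; the 1 s / 5 s thresholds and the function names are hypothetical.

```python
import socket
import time

# Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def tcp_connect_time(host: str, port: int, timeout: float = 10.0) -> float:
    """Seconds needed to resolve the name and complete a TCP connect.

    Raises OSError on failure or timeout."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it immediately
    return time.monotonic() - start


def classify(seconds: float, warn: float = 1.0, crit: float = 5.0) -> int:
    """Map a response time onto a plugin state (thresholds are assumptions)."""
    if seconds >= crit:
        return CRITICAL
    if seconds >= warn:
        return WARNING
    return OK


def main(host: str, port: int) -> int:
    """Print one status line and return the plugin exit code."""
    try:
        elapsed = tcp_connect_time(host, port)
    except OSError as exc:
        print(f"TCP CRITICAL - {host}:{port} {exc}")
        return CRITICAL
    status = classify(elapsed)
    print(f"TCP {LABELS[status]} - {host}:{port} connect took {elapsed:.3f} s")
    return status
```

A refused or timed-out connection maps to CRITICAL; a plugin script would end with `sys.exit(main(host, port))`.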
>
> 5) BUSINESS PROCESSES (ABSTRACTION LAYER)
> I want some basic root cause analysis capability (i.e. unreachable vs. down) and an abstraction layer between polls/traps and alerts. I want to define compound events that happen when several events coincide. Examples: Disk is more than 90% full for more than 10 minutes. Primary network connection has been lost for more than 10 minutes, or both the primary and backup connections have been down for more than three minutes. Event A and B happened, but not C, and all this has lasted for more than 10 minutes and it is not Sunday between 1am and 2am. These rules/scripts/compound events are important for my monitoring needs. I need to monitor a big enterprise with several data centers, a complicated network topology and business systems comprising several servers working together.
Well, this might need some work, but since nagios can be extended
by any sort of event script and you can access the (relatively) raw
data from its logs, this should be possible.
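To make the idea concrete, here is a minimal Python sketch of one of the rules above: events A and B active, C not active, the whole lasting more than 10 minutes, and not on a Sunday between 1am and 2am. It only keeps state from timestamped changes; feeding it from Nagios (an event handler, or parsing the log) is not shown, and the class and event names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta


class CompoundRule:
    """Hypothetical rule: A and B active, C not active, for more than `hold`,
    suppressed on Sundays between 01:00 and 02:00."""

    def __init__(self, hold: timedelta = timedelta(minutes=10)) -> None:
        self.hold = hold
        self.active: set[str] = set()            # events currently in effect
        self.true_since: datetime | None = None  # when the condition last became true

    def _condition(self) -> bool:
        return {"A", "B"} <= self.active and "C" not in self.active

    def update(self, when: datetime, event: str, is_active: bool) -> None:
        """Record that `event` started (True) or ended (False) at `when`."""
        if is_active:
            self.active.add(event)
        else:
            self.active.discard(event)
        if self._condition():
            if self.true_since is None:
                self.true_since = when
        else:
            self.true_since = None

    def fires(self, now: datetime) -> bool:
        """True if the condition has held for more than `hold` and `now`
        is outside the Sunday 01:00-02:00 window."""
        in_window = now.weekday() == 6 and now.hour == 1  # Monday is 0, Sunday is 6
        return (self.true_since is not None
                and now - self.true_since > self.hold
                and not in_window)


if __name__ == "__main__":
    rule = CompoundRule()
    t0 = datetime(2005, 6, 2, 13, 0)  # a Thursday
    rule.update(t0, "A", True)
    rule.update(t0, "B", True)
    print(rule.fires(t0 + timedelta(minutes=5)))   # False: not long enough yet
    print(rule.fires(t0 + timedelta(minutes=11)))  # True
```

Recording when the condition became true, rather than counting polls, keeps the duration rule independent of the check interval.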
> ----------------------------------------------------------------
>
> Plus...
>
> All this collected or deduced data should be stored in an event database and used for history reports, snapshot reports, service level reports, trends/graphics,... everything. Naturally it needs a web user interface with at least two user levels (admin, monitor) and several views (network view, business view...). Flexibility and manageability are more important than instant ease of use.
Ok, to my knowledge - I have never tried this, but you can process the
check outputs and insert them into a database.
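A minimal Python sketch of that approach: parse service alert lines from the Nagios log and insert them into a table. It assumes the log layout `[epoch] SERVICE ALERT: host;service;state;state type;attempt;output` (check this against your Nagios version), and uses SQLite from the standard library only to stay self-contained; the table name, column names and sample lines are hypothetical.

```python
import re
import sqlite3
from typing import Iterable

# Assumed layout of a Nagios service alert log line; verify against your own log.
ALERT_RE = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]*);([^;]*);([^;]*);([^;]*);(\d+);(.*)$"
)


def load_alerts(lines: Iterable[str], conn: sqlite3.Connection) -> int:
    """Insert matching SERVICE ALERT lines into a hypothetical `service_alerts`
    table and return how many rows were stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS service_alerts ("
        " ts INTEGER, host TEXT, service TEXT, state TEXT,"
        " state_type TEXT, attempt INTEGER, output TEXT)"
    )
    rows = []
    for line in lines:
        m = ALERT_RE.match(line.rstrip("\n"))
        if m:  # lines of other types are skipped
            ts, host, service, state, state_type, attempt, output = m.groups()
            rows.append((int(ts), host, service, state, state_type,
                         int(attempt), output))
    conn.executemany(
        "INSERT INTO service_alerts VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    sample = [  # made-up sample lines
        "[1117713179] SERVICE ALERT: db1;Disk /var;CRITICAL;HARD;3;DISK CRITICAL - 95% used\n",
        "[1117713180] some other log entry\n",
    ]
    conn = sqlite3.connect(":memory:")
    print(load_alerts(sample, conn))  # 1
    print(conn.execute("SELECT host, state FROM service_alerts").fetchall())  # [('db1', 'CRITICAL')]
```

History, service-level and trend reports then become SQL queries over this table.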
>
> These requirements rule out about 100% of the monitoring tools I have found. Please help. I'm lost.
:-)
> This would be used as a professional monitoring tool. It's a day job. Usually 5 x 8hrs, so I don't need anything "simple yet powerful". It can also cost a bit. It can be hard to learn - no problem. So, do I have any other choice than HP OpenView or Tivoli Enterprise Console? What can nagios do for me?
Well, looking at your needs and your job description (and budget) I'd
say it like this: Nagios can provide an extensive framework for your
higher-level needs and supply almost all of the basic functionality.
Now, I'm not billing you as a consultant :-) but after such a short
description I'd say that given the above it should be possible to
implement something that offers everything you need. You will need to
spend some time studying nagios, discussing with the developers, and
doing some scripting yourself. You will need to set up a number of
monitoring hosts to distribute the workload (depending, of course, on
the temporal detail you need). You will need - assuming you work alone -
about two months for the basic monitoring and load distribution, and
after that comes the setup of compound events and the sort of reporting
you need. You will need a management that understands that its requests
need funding, time, and discussion.
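The number of monitoring hosts comes down to simple arithmetic: the average check rate is the number of service checks divided by the check interval. A Python sketch; the per-poller capacity is a placeholder you would have to measure on your own hardware, not a Nagios figure.

```python
import math


def pollers_needed(services: int, interval_s: float,
                   checks_per_s_per_poller: float) -> tuple[float, int]:
    """Return (average checks per second, monitoring hosts needed),
    given an assumed, measured capacity per monitoring host."""
    rate = services / interval_s
    return rate, math.ceil(rate / checks_per_s_per_poller)


if __name__ == "__main__":
    # Made-up figures: 6000 services checked every 5 minutes, 8 checks/s per host.
    rate, n = pollers_needed(6000, 300, 8)
    print(f"{rate:.1f} checks/s -> {n} poller(s)")  # 20.0 checks/s -> 3 poller(s)
```

Halving the check interval doubles the rate, which is why the number of monitoring hosts depends on the temporal detail you need.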
In the end, you will have done a lot of the necessary work yourself, but
you will have something that fits your business, that was most
probably less expensive in license costs and does not require more
support than something commercial. And, even better, you will have
contributed valuable experience and work to all Nagios users :-)
To sum up - I think you can use nagios. Try it with one testing machine
and set up some monitoring. Implement reports, notifications, and
distributed monitoring in small steps. Spend the necessary time and
money. Read and use the mailing lists.
After a relatively short time you can decide for yourself.
Arno
--
IT-Service Lehmann al at its-lehmann.de
Arno Lehmann http://www.its-lehmann.de