Monitoring Windows Servers - Comparing Options
Frater, Greg J
GJFRATER at bechtel.com
Mon Apr 7 23:03:54 CEST 2008
>No input from anyone? One of the selling points to move away from
proprietary solutions and towards OSS was the helpfulness of the
>community and mailing list. I know I don't have a 'problem' listed
below that can easily be solved, but surely there are some opinions out
there?
Alright, I'll respond. Most of the time I just lurk here, since most
questions are answered quite well by others in a relatively short time.
This is my 2 cents. See my responses below, inline with the questions
from your first post.
>So, my question is, what is the best option for monitoring Windows
Servers and why?
>An important factor to consider is that I will probably not have
Administrative access to any of the Windows Servers that I will be
>monitoring.
>I'm currently considering three options for the reasons shown below (in
order of preference):
>1) WMI checks
> Pros
> -Complete control of NRPE service on 'my' Windows Server
> (Can this be distributed over two boxes?)
> -Complete control of check commands on Nagios Server
WMI provides a very thorough list of things you can check
It's built into the OS, i.e. no install or configuration
required
> Cons
> -NRPE service must be run from an account with access to all
Windows Servers.
WMI can fail. From time to time we have a server with WMI
problems. Sometimes we've been able to fix WMI; other times we've had to
rebuild the OS, and in those cases you would not have any monitoring.
WMI is used by other things as well, and is susceptible to
getting corrupted or reconfigured by something (or in our case someone)
else. I think we've resolved most of our issues with WMI, but I
learned that if it's broken, it does not matter why: you can't use it to
monitor anything until it's fixed.
WMI requires common MS technology such as common MS network
ports and Windows (AD) user accounts for security. This is not a
problem for the most part, but it increases the risk when monitoring a DMZ,
for example, and means your monitoring is dependent upon AD and thus
anything AD needs (like DNS), etc. If AD or DNS goes down you've got
bigger problems than whether or not your Nagios agents are working.
However, if that is how you watch everything, then when they fail, will you
know it?
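One way to picture the "will you know it?" concern is a check that reaches the AD/DNS hosts without depending on them. The following is only a minimal sketch under assumptions: the function name is hypothetical, the addresses are placeholders from the documentation range, and it uses nothing beyond a plain TCP connect to an IP literal (so no name lookup and no domain account is involved).

```python
import socket


def tcp_port_open(ip_address, port, timeout=3.0):
    """Return True if a TCP connection to ip_address:port succeeds.

    Passing an IP literal (not a hostname) keeps DNS out of the path,
    so the check still works when DNS itself is what has failed.
    """
    try:
        with socket.create_connection((ip_address, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable: all count as "not open" here.
        return False


# Hypothetical targets; 192.0.2.x is a documentation-only address range.
# 53 is the standard DNS port and 389 the standard LDAP port.
EXAMPLE_TARGETS = [("192.0.2.10", 53), ("192.0.2.11", 389)]
```

Looping over EXAMPLE_TARGETS and calling tcp_port_open(ip, port) for each would give a crude reachability signal that survives an AD or DNS outage. It says nothing about whether the service behind the port is healthy.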
>2) Agent checks - NRPE-NT or some other current Windows Agent
> Pros
You're not relying on an MS technology. Obviously the OS has to be
available, but you're not dependent upon WMI or .NET or MDAC, etc. This
may sound ironic, but the more independent your monitoring system is of the
system it's watching, the better (IMO).
Flexibility/Extensibility. For example, the agent we use, nscp
(found here: http://trac.nakednuns.org/nscp - don't ask me about the URL,
I don't know why it is what it is - it seems safe though), is very
flexible. It supports both the original NSClient on the Nagios server side
and NRPE, and can check pretty much anything (including WMI, perfmon,
event logs, plus some built-in checks and custom scripts). The
documentation is not one of its strong points, but it is very capable
and reliable. I've struggled at times getting a particular type of
check working because I could not understand the docs. I have not run
into any problems with it in terms of crashes or memory leaks, etc.
After about two years of using it (various versions) I can only think of
two times when it has either generated an error or stopped running, and that
covers over 200 Windows boxes. It can be installed and uninstalled
without server reboots. It just works, and I like it. Better docs would be
nice though. I'm not trying to sell you on this particular agent or
even agents in general; this is just my experience.
> Cons
> -Requires an agent setup and running on every Windows Server
> -No direct control of agents/check commands
Some checks via an agent (like custom scripts) are not as quick
as SNMP (and probably WMI).
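For anyone who has not written a custom script for an agent to run, here is a minimal sketch of the usual Nagios plugin convention: one status line on stdout, optional performance data after a "|", and an exit code of 0, 1, 2 or 3 for OK, WARNING, CRITICAL or UNKNOWN. The function names, the "DISK" label and the thresholds are made up for illustration. This is not taken from nscp or any particular agent.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(percent_used, warn, crit):
    """Map a usage percentage onto a Nagios state using two thresholds."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path, warn=80.0, crit=90.0):
    """Return (exit_code, status_line) for the volume that contains path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Could not read the volume at all: report UNKNOWN, not CRITICAL.
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    percent = 100.0 * usage.used / usage.total
    code = classify(percent, warn, crit)
    line = (f"DISK {LABELS[code]} - {path} {percent:.1f}% used"
            f" | used_pct={percent:.1f}%;{warn};{crit}")
    return code, line
```

A wrapper script would call check_disk() with a drive or mount point, print the returned line, and exit with the returned code. The agent then passes both back to Nagios.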
>3) SNMP based checks
> Pros
> -Complete control over check commands on Nagios Server
SNMP checks are very fast
Once configured no changes are required on the server (i.e. SNMP
does not require version upgrades, etc.)
> Cons
> -No direct control of SNMP Community Strings/ACL allowing
access from Nagios
> -No direct control of SNMP service
The values exposed via SNMP on Windows servers are limited. You
can do basic monitoring of disks, RAM, CPU, etc., but not advanced things
such as CPU user mode vs. kernel mode usage. You can get an SNMP
extension such as SNMP-Informant (http://www.snmp-informant.com/) to
resolve this issue.
SNMP can fail. On a small portion of our servers, both 2000
and 2003, the SNMP service won't stay running. We do some SNMP
monitoring but not a lot, and I've not solved this one yet.
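As an illustration of the "basic monitoring" that SNMP does allow, disk and memory figures are commonly read from the hrStorageTable of the standard HOST-RESOURCES-MIB. Sizes there are reported in allocation units, not bytes, so a little arithmetic is needed. The sketch below assumes the three values have already been fetched by whatever SNMP tool you use. The function names are hypothetical and no SNMP library is involved.

```python
# Column OIDs in hrStorageTable (HOST-RESOURCES-MIB), one row per volume
# or memory pool; append the row index to query a specific entry.
HR_STORAGE_ALLOCATION_UNITS = "1.3.6.1.2.1.25.2.3.1.4"  # bytes per unit
HR_STORAGE_SIZE = "1.3.6.1.2.1.25.2.3.1.5"              # total, in units
HR_STORAGE_USED = "1.3.6.1.2.1.25.2.3.1.6"              # used, in units


def units_to_bytes(units, allocation_units):
    """Convert a count of allocation units into bytes."""
    return units * allocation_units


def storage_summary(allocation_units, size, used):
    """Return total/used bytes and percent used for one hrStorage row."""
    if size <= 0:
        raise ValueError("hrStorageSize must be positive")
    return {
        "total_bytes": units_to_bytes(size, allocation_units),
        "used_bytes": units_to_bytes(used, allocation_units),
        "percent_used": 100.0 * used / size,
    }
```

For example, a row with 4096-byte units, a size of 1000 units and 250 units used comes out as 25% used.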
>How do you do it? Are there any other Pros & Cons that I might be
missing? Do you think I should consider a process/utility not listed?
>
>I'm open to suggestions on which agent is best, but I can probably
determine this from list archives and testing. At this time I'm more
>interested in which high-level option would be the best fit.
>Perhaps there is a good reason to use more than one of the above
options, and I'm just not aware of it yet.
>
>Any input would be greatly appreciated.
Agent-based monitoring has been the most reliable for us. It
does require an agent install and config initially, though this has not
been a big deal for us. Agents are also extensible: NSCP is now able to
monitor Event Logs, which it could not do when we started using it. As a
general rule we don't rely on just one mechanism. Our primary is by
agent, but we also use SNMP when it makes sense. Hope that helps.
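One possible way to express "don't rely on just one mechanism" in code is an ordered fallback: try the primary check first and move to the next mechanism only if the primary cannot give an answer. This is a sketch under that assumption, with hypothetical names. It is not a description of any particular site's setup, and Nagios itself would more typically handle this with separate service definitions.

```python
UNKNOWN = 3  # Nagios exit code for "could not determine state"


def first_usable_result(checks):
    """Run checks in order and return the first result that is not UNKNOWN.

    checks is a list of (name, callable) pairs; each callable returns
    (exit_code, message). A check that raises is treated as UNKNOWN.
    """
    last = (UNKNOWN, "no checks configured")
    for name, check in checks:
        try:
            code, message = check()
        except Exception as exc:
            code, message = UNKNOWN, f"failed: {exc}"
        if code != UNKNOWN:
            return code, f"[{name}] {message}"
        last = (code, f"[{name}] {message}")
    # Every mechanism was UNKNOWN: report the last one tried.
    return last
```

Called with an agent check listed first and an SNMP check second, this returns the agent's result whenever the agent answers and the SNMP result only when it does not.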
Regards,
-greg
Sys Admin
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null