Generic check result manipulations (percentages, max(), min(), etc)
Nathanael Hoyle
nhoyle at hoyletech.com
Wed Jan 9 17:49:54 CET 2008
Hey all,
Hope this hasn't been asked and answered to death, but I've read through
the forums and quite a bit of the mail archives and can't find prior
discussion. I am trying to monitor several Dell PowerEdge servers for a
variety of availability criteria, including things like average
processor load and percentage of disk space used. This is trivial using
the check_nt plugin and the nsclient (or another API compatible
monitoring agent), which I am well aware of. In fact, I've tested that
just fine without difficulty on the box I'm prototyping on. The issue
is that the production servers I'll be monitoring are of government
interest, and there is substantial overhead for accrediting any new
software (particularly a persistent process accepting connections) to be
installed on the machines. I'd rather not try to fight to get nsclient
accredited.
One of the nice things about the PowerEdge servers is that they have
fairly advanced backplane status monitoring and provide a host of
information via snmp. I have configured and tested things like
obtaining the processor load values via snmp with:
define service{
        name                        cpu1-load
        use                         generic-service
        service_description         CPU 1 Load
        hostgroup                   poweredge2850-servers
        check_command               check_snmp!-C removed -o HOST-RESOURCES-MIB::hrProcessorLoad.1 -w 0:80 -c 0:95
        notification_options        c
        first_notification_delay    10
        }
There are several hosts, so these are set up against a hostgroup, etc.
There are four processors in each machine; the relevant availability
metric is more the average processor load across all four processors
than it is the load of any one processor. What I want is a way to
capture the average of these four values and test that result against
various threshold criteria. Something like an avg() macro that allowed
me to pass multiple checks within it.
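
A minimal sketch of the avg() idea described above, assuming the four hrProcessorLoad values have already been fetched by some other means. The function names (in_range, check_average) are hypothetical and not part of Nagios or check_snmp; the thresholds reuse the 0:80 and 0:95 ranges from the service definition, and only the simple "max" and "min:max" range forms are handled.

```python
# Sketch only: in_range and check_average are hypothetical helper names,
# not part of Nagios or check_snmp.
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def in_range(value, spec):
    """Return True if value lies inside a range such as '80' or '0:80'.

    Only the simple forms 'max', 'min:max' and 'min:' are handled here.
    """
    low_text, _, high_text = spec.rpartition(":")
    low = float(low_text) if low_text else 0.0
    high = float(high_text) if high_text else float("inf")
    return low <= value <= high


def check_average(loads, warn="0:80", crit="0:95"):
    """Average the per-CPU loads and test the result against both ranges."""
    if not loads:
        raise ValueError("need at least one load value")
    avg = sum(loads) / len(loads)
    if not in_range(avg, crit):
        state, label = CRITICAL, "CRITICAL"
    elif not in_range(avg, warn):
        state, label = WARNING, "WARNING"
    else:
        state, label = OK, "OK"
    # Text before '|' is the status line; text after it is performance data.
    return state, f"CPU {label} - average load {avg:.1f}% | avg_load={avg:.1f}%"


if __name__ == "__main__":
    # Made-up sample values standing in for hrProcessorLoad.1 through .4.
    state, message = check_average([10, 20, 30, 40])
    print(message)
    raise SystemExit(state)
```

A real plugin would exit with the returned state so that Nagios could act on it; the sample values in the main block are placeholders only.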
Similarly, the disk drive configurations differ slightly among the
various hosts, and there are too many hosts for me to want to calculate
and hand-specify warning/critical thresholds for used space against
each host's total space. The ability to do something like
percent(<snmp check for used space>, <snmp check for total space>) and
check that against the thresholds would be an ideal solution that
generically supports all configurations.
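
The percent() idea can be sketched as follows, assuming the hosts expose hrStorageUsed and hrStorageSize from HOST-RESOURCES-MIB. Both are reported in the same allocation units, so the ratio needs no unit conversion. The function names and the 80/95 percent thresholds are placeholders chosen for illustration.

```python
# Sketch only: percent_used and disk_state are hypothetical helper names.
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def percent_used(used_units, total_units):
    """Return used space as a percentage of total space."""
    if total_units <= 0:
        raise ValueError("total size must be positive")
    return 100.0 * used_units / total_units


def disk_state(used_units, total_units, warn_pct=80.0, crit_pct=95.0):
    """Compare the computed percentage against placeholder thresholds."""
    pct = percent_used(used_units, total_units)
    if pct >= crit_pct:
        state, label = CRITICAL, "CRITICAL"
    elif pct >= warn_pct:
        state, label = WARNING, "WARNING"
    else:
        state, label = OK, "OK"
    return state, f"DISK {label} - {pct:.1f}% used | used_pct={pct:.1f}%"


if __name__ == "__main__":
    # Made-up values standing in for hrStorageUsed and hrStorageSize.
    state, message = disk_state(30, 120)
    print(message)
    raise SystemExit(state)
```

Because the thresholds apply to the ratio rather than to raw sizes, the same pair of values would cover hosts with different disk capacities.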
Again, I realize that checking for percentage of free disk space is
available with nsclient, with local disk checking, and with remote ssh
checks. ssh is not an option either in this case (performance and
security concerns). It seems to me however, that the need/desire to
calculate these types of values based on component values is more broadly
applicable and could be useful in areas outside my somewhat unusual
needs. So my question is... is there some built-into-the-config-file
syntax I'm missing to calculate this stuff? Would I have to extend the
snmp plugin? Could a plugin generically wrap other plugin results to do
this... in other words, what is likely to be the least-pain method of
being able to do this? Ideally, I'd hope that the result would not be
plugin-specific, i.e. would not need to be implemented separately for
check_snmp and every other plugin needed.
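
One way a generic wrapper might look, as a rough sketch: it runs each wrapped plugin as a subprocess, pulls the first number out of the performance data after the "|", and hands the list of numbers to an aggregate function. This assumes every wrapped plugin emits performance data in the usual label=value form; the function names are hypothetical, and the sample output string in the main block is made up rather than captured from check_snmp.

```python
import re
import subprocess

# Sketch only: the helper names below are hypothetical.
# Matches the first "=<number>" in the performance data section.
PERF_VALUE = re.compile(r"=\s*(-?\d+(?:\.\d+)?)")


def first_perf_value(plugin_output):
    """Extract the first numeric value from the perfdata after '|'."""
    _, sep, perfdata = plugin_output.partition("|")
    if not sep:
        raise ValueError("no performance data in plugin output")
    match = PERF_VALUE.search(perfdata)
    if match is None:
        raise ValueError("no numeric value in performance data")
    return float(match.group(1))


def run_plugin(argv):
    """Run one plugin command line and return its first perfdata value."""
    result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    return first_perf_value(result.stdout)


def aggregate(command_lines, func):
    """Run every wrapped plugin and combine the values with func (e.g. max)."""
    return func([run_plugin(argv) for argv in command_lines])


if __name__ == "__main__":
    # Made-up plugin output, used here instead of a live SNMP query.
    print(first_perf_value("SNMP OK - 12 | hrProcessorLoad.1=12"))
```

Since the wrapper only looks at the text each plugin prints, it would not be tied to check_snmp; max, min, or an averaging function could be passed as func.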
I'd be happy to hear what ideas folks have (if it's already out there,
great!).
Thanks,
Nathanael Hoyle