SNMP Monitoring conundrum
Max
perldork at webwizarddesign.com
Fri Jul 17 18:40:46 CEST 2009
On Fri, Jul 17, 2009 at 12:08 PM, Israel Brewster <israel at frontierflying.com> wrote:
> So if I understand you correctly, after you make 1 agent type per script,
> you would then write a wrapper script calling multiple individual scripts
> for the cases where you want more than one piece of data? The approach
> certainly bears consideration, especially since, as you say, the simpler
> scripts make debugging easier. I hadn't originally thought of that approach
> because I wanted two pieces of data displayed in nagios: power state and
> estimated run time, so I just made one script that gave that data. It might
> be worth breaking it down more though. Thanks for the suggestion.
Not really wrapper scripts :). All the scripts I listed collect
information through different OIDs, but each may check several metrics
at once; the obvious example is CPU: system, user, I/O wait, kernel %
time, etc.
Each of the agents exposes this information through different MIBs, and
each exposes a different level of detail, so the script for each agent
type does the right thing for that agent/MIB. That lets us fully use
all the information the agent offers for that type of check without
ending up with huuuge scripts :).
For example, when you query Sysedge for CPU, it returns cooked
percentages: the actual % utilization for each CPU metric. With
Net-SNMP, if you want kernel, system, user, etc. as individual
percentages, you have to query the remote agent twice with a pause of
N seconds between the samples, take the deltas between the two
measurements, and then calculate the percentages. Having both of those
sets of logic in one script would quickly make for a big script.
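The Net-SNMP half of that logic can be sketched roughly like this. It is a minimal sketch, not our actual plugin: `fetch_counters` is a hypothetical stand-in for whatever SNMP GET reads the raw CPU tick counters (Net-SNMP's UCD-SNMP-MIB exposes counters such as ssCpuRawUser and ssCpuRawIdle), and the counter names in the example are made up. It assumes the counters are Counter32 values that may wrap at most once between samples and that they don't overlap each other (on some platforms they do, so check what your agent reports).

```python
import time
from typing import Callable, Dict

COUNTER32_MODULUS = 2 ** 32  # SNMP Counter32 values wrap at 2^32


def counter_delta(before: int, after: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Ticks elapsed between two readings of one counter, tolerating a single wrap."""
    return (after - before) % modulus


def cpu_percentages(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Turn two samples of raw CPU tick counters into % of total ticks per metric."""
    deltas = {name: counter_delta(before[name], after[name]) for name in before}
    total = sum(deltas.values())
    if total == 0:
        # No ticks elapsed: the sampling interval was too short to measure anything.
        raise ValueError("no CPU ticks elapsed between samples")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def sample_cpu(fetch_counters: Callable[[], Dict[str, int]],
               interval_s: float,
               sleep: Callable[[float], None] = time.sleep) -> Dict[str, float]:
    """Query the agent twice, interval_s seconds apart, and compute % from the deltas."""
    first = fetch_counters()   # hypothetical SNMP GET of the raw counters
    sleep(interval_s)
    second = fetch_counters()
    return cpu_percentages(first, second)


if __name__ == "__main__":
    before = {"user": 1000, "system": 500, "wait": 100, "idle": 8400}
    after = {"user": 1300, "system": 600, "wait": 200, "idle": 8900}
    print(cpu_percentages(before, after))
    # {'user': 30.0, 'system': 10.0, 'wait': 10.0, 'idle': 50.0}
```

Sysedge would skip all of this, since it hands back the percentages directly.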
At the configuration level we then associate each agent-specific
script -> command -> service with an agent-specific hostgroup. For
example, each of the Net-SNMP scripts I listed is called by a service,
and that service is associated with the net_snmp_host hostgroup.
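The effect of hanging services off agent-specific hostgroups can be modelled in a few lines. This is a hypothetical illustration of the membership logic only, not how Nagios implements it internally; the service names and the sysedge_host group are made up.

```python
from typing import Dict, List, Set


def services_for_host(host_groups: List[str],
                      services_by_group: Dict[str, List[str]]) -> Set[str]:
    """Every service attached to any hostgroup the host belongs to."""
    services: Set[str] = set()
    for group in host_groups:
        # Groups with no attached services contribute nothing.
        services.update(services_by_group.get(group, []))
    return services


# Hypothetical mapping: each agent-specific check is attached to its agent's hostgroup.
SERVICES_BY_GROUP = {
    "net_snmp_host": ["snmp_cpu", "snmp_mem", "snmp_swap"],
    "sysedge_host": ["sysedge_cpu", "sysedge_mem"],
}

if __name__ == "__main__":
    print(sorted(services_for_host(["net_snmp_host"], SERVICES_BY_GROUP)))
    # ['snmp_cpu', 'snmp_mem', 'snmp_swap']
```

Adding a host to the right hostgroup is then the only step needed to pick up every check written for its agent.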
Custom thresholds for each script are codified at the host level /
host template level using Nagios 3 custom attributes, e.g.
define host {
    ...
    name              my_custom_base_host
    hostgroups        +net_snmp_host
    __snmp_swap_warn  pct_used,gt,50
    __snmp_swap_crit  pct_used,gt,60
    __snmp_mem_crit   pct_used,gt,99
    __snmp_cpu_warn   'wait,gt,30'
    __snmp_cpu_crit   'wait,gt,50'
    register          0 ; template
}
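The threshold values above follow a metric,operator,value convention that the check scripts themselves interpret; Nagios only passes the string through. A minimal sketch of such a parser, assuming gt/ge/lt/le/eq operators and allowing for surrounding quotes (as in 'wait,gt,30') in case they survive into the argument. The operator set and all names here are assumptions for illustration, not the actual plugin code.

```python
import operator
from typing import Callable, Dict, NamedTuple

# Assumed operator vocabulary; the real scripts may support a different set.
OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
}


class Threshold(NamedTuple):
    metric: str
    op: str
    limit: float

    def breached(self, metrics: Dict[str, float]) -> bool:
        """True if the named metric's current value trips this threshold."""
        return OPERATORS[self.op](metrics[self.metric], self.limit)


def parse_threshold(spec: str) -> Threshold:
    """Parse 'metric,op,value', tolerating surrounding quotes such as 'wait,gt,30'."""
    parts = spec.strip().strip("'\"").split(",")
    if len(parts) != 3:
        raise ValueError(f"expected metric,op,value but got {spec!r}")
    metric, op, value = (p.strip() for p in parts)
    if op not in OPERATORS:
        raise ValueError(f"unknown operator {op!r} in {spec!r}")
    return Threshold(metric, op, float(value))


if __name__ == "__main__":
    warn = parse_threshold("'wait,gt,30'")
    print(warn)                           # Threshold(metric='wait', op='gt', limit=30.0)
    print(warn.breached({"wait": 42.0}))  # True
```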
(In a command you can access __snmp_cpu_warn as $_HOST_SNMP_CPU_WARN$.)
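Nagios 3 derives that macro name from the custom variable by uppercasing it and putting it after a $_HOST (or $_SERVICE / $_CONTACT) prefix that absorbs the one leading underscore every custom variable must have; the Nagios docs' own example maps _MAC_ADDRESS to $_HOSTMAC_ADDRESS$. A small helper sketching that rule can be handy when generating command definitions or documentation; the function name is our own.

```python
def custom_var_macro(var_name: str, object_type: str = "HOST") -> str:
    """Nagios 3 macro for a custom object variable, e.g. _mac_address -> $_HOSTMAC_ADDRESS$."""
    if not var_name.startswith("_"):
        raise ValueError(f"custom variable names must start with '_': {var_name!r}")
    if object_type.upper() not in ("HOST", "SERVICE", "CONTACT"):
        raise ValueError(f"unsupported object type: {object_type!r}")
    # Custom variable names are case-insensitive and uppercased by Nagios;
    # the first underscore is consumed by the $_HOST... prefix.
    return f"$_{object_type.upper()}{var_name[1:].upper()}$"


if __name__ == "__main__":
    print(custom_var_macro("__snmp_cpu_warn"))  # $_HOST_SNMP_CPU_WARN$
```

With the double-underscore naming used above, one underscore survives into the macro, which is why it reads $_HOST_SNMP_... rather than $_HOSTSNMP_....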
So by creating a base template like the one above, our users can
simply inherit from it as they add Net-SNMP hosts, and each such host
gets all the Net-SNMP checks in the net_snmp_host hostgroup along with
the thresholds defined in the base template, e.g.
define host {
    use               my_custom_base_host
    host_name         foo.example.com
    alias             Foo
    address           192.168.3.1
}
Now that host will get all 6 Net-SNMP checks in the hostgroup along
with reasonable thresholds. If thresholds need to be customized,
they can be overridden at the host level :)
define host {
    use               my_custom_base_host
    host_name         bar.example.com
    alias             Bar
    __snmp_swap_crit  10 ; this host should really never swap
    address           192.168.3.1
}
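How the effective settings for a host like bar.example.com come out can be modelled roughly as follows: directives are looked up through the `use` chain with the host's own values winning, `name`, `use` and `register` describe the template itself and are not passed down, and a value starting with `+` (as in `hostgroups +net_snmp_host`) is appended to the inherited list. This is a simplified sketch assuming a single-parent `use` chain; Nagios also supports multiple templates and other inheritance rules not shown here.

```python
from typing import Dict

ObjectDef = Dict[str, str]

# Directives that describe the template itself rather than data to inherit.
NOT_INHERITED = ("use", "name", "register")


def resolve(key: str, objects: Dict[str, ObjectDef]) -> ObjectDef:
    """Flatten one object definition through its single-parent 'use' chain."""
    obj = objects[key]
    parent = obj.get("use")
    effective: ObjectDef = dict(resolve(parent, objects)) if parent else {}
    for directive, value in obj.items():
        if directive in NOT_INHERITED:
            continue
        if value.startswith("+") and directive in effective:
            # Additive inheritance: append to the inherited comma-separated list.
            effective[directive] = effective[directive] + "," + value[1:]
        else:
            # Local values override inherited ones; a '+' with nothing to add to is dropped.
            effective[directive] = value.lstrip("+")
    return effective


OBJECTS: Dict[str, ObjectDef] = {
    "my_custom_base_host": {
        "name": "my_custom_base_host",
        "hostgroups": "+net_snmp_host",
        "__snmp_swap_warn": "pct_used,gt,50",
        "__snmp_swap_crit": "pct_used,gt,60",
        "register": "0",
    },
    "bar.example.com": {
        "use": "my_custom_base_host",
        "host_name": "bar.example.com",
        "__snmp_swap_crit": "10",
    },
}

if __name__ == "__main__":
    bar = resolve("bar.example.com", OBJECTS)
    print(bar["hostgroups"], bar["__snmp_swap_warn"], bar["__snmp_swap_crit"])
    # net_snmp_host pct_used,gt,50 10
```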
The difficulty with this approach is, of course, documenting the custom
attributes so that people using this methodology know which custom
attributes are associated with which hostgroup -> service mappings,
which are required, which are optional, etc.
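One way to attack that documentation problem is to keep a machine-readable registry of the custom attributes each hostgroup's services expect, then render docs from it and check host definitions against it. The registry layout, descriptions, and which attributes are marked required below are all hypothetical, chosen only to illustrate the idea.

```python
from typing import Dict, List, NamedTuple


class AttrSpec(NamedTuple):
    required: bool
    description: str


# Hypothetical registry: hostgroup -> custom attribute -> documentation.
REGISTRY: Dict[str, Dict[str, AttrSpec]] = {
    "net_snmp_host": {
        "__snmp_swap_warn": AttrSpec(True, "swap warning threshold (metric,op,value)"),
        "__snmp_swap_crit": AttrSpec(True, "swap critical threshold (metric,op,value)"),
        "__snmp_cpu_warn": AttrSpec(False, "CPU warning threshold"),
    },
}


def check_host_attrs(hostgroups: List[str], attrs: Dict[str, str]) -> Dict[str, List[str]]:
    """Report required attributes that are missing and custom attributes nobody documents."""
    known: Dict[str, AttrSpec] = {}
    for group in hostgroups:
        known.update(REGISTRY.get(group, {}))
    missing = sorted(name for name, spec in known.items()
                     if spec.required and name not in attrs)
    undocumented = sorted(name for name in attrs
                          if name.startswith("_") and name not in known)
    return {"missing": missing, "undocumented": undocumented}


def describe(hostgroup: str) -> str:
    """Render a plain-text reference for one hostgroup's custom attributes."""
    lines = [f"{hostgroup}:"]
    for name, spec in sorted(REGISTRY.get(hostgroup, {}).items()):
        kind = "required" if spec.required else "optional"
        lines.append(f"  {name} ({kind}): {spec.description}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(describe("net_snmp_host"))
```

Because the docs and the check come from the same table, they can't drift apart as easily as a wiki page can.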
We use Nagios::Plugin and Nagios::Plugin::SNMP extensively to handle a
lot of the plugin grunt work and to provide a common set of options
across checks.
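Much of that grunt work is the standard Nagios plugin contract: print one status line, optionally followed by `|` and performance data, and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Nagios::Plugin is Perl; the Python sketch below only mimics that contract for illustration and is not a port of its API. The `NAME STATE - message | perfdata` line layout is an assumption modelled loosely on what Nagios::Plugin prints.

```python
import sys
from typing import Dict, Optional, Tuple

EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def plugin_result(name: str, state: str, message: str,
                  perfdata: Optional[Dict[str, float]] = None) -> Tuple[str, int]:
    """Build the status line and exit code a Nagios plugin should emit."""
    if state not in EXIT_CODES:
        # Anything unrecognised is reported as UNKNOWN rather than guessed at.
        state, message = "UNKNOWN", f"invalid state {state!r}: {message}"
    line = f"{name} {state} - {message}"
    if perfdata:
        # Performance data follows a '|' as space-separated label=value pairs.
        line += " | " + " ".join(f"{label}={value}" for label, value in perfdata.items())
    return line, EXIT_CODES[state]


def plugin_exit(name: str, state: str, message: str,
                perfdata: Optional[Dict[str, float]] = None) -> None:
    """Print the status line and terminate with the matching exit code."""
    line, code = plugin_result(name, state, message, perfdata)
    print(line)
    sys.exit(code)


if __name__ == "__main__":
    print(plugin_result("SNMP_CPU", "WARNING", "wait 42% > 30%", {"wait": 42.0}))
    # ('SNMP_CPU WARNING - wait 42% > 30% | wait=42.0', 1)
```

Keeping the formatting in one shared helper is what gives every check the same output shape, much as a shared library does for our Perl checks.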
- Max
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users