Drill Down Facility in APAN
Stanley Hopcroft
Stanley.Hopcroft at IPAustralia.Gov.AU
Thu Apr 24 11:50:58 CEST 2003
Dear Sir,
I am writing to thank you for your letter and say,
On Thu, Apr 24, 2003 at 08:45:09PM +1200, Jamie Baddeley wrote:
.. snip
> There are shitloads of RRD front-ends out there: Cricket, MRTG, NRG, etc.
> see here:
> http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/rrdworld/index.html
> Why are we creating another one?
Yep. Moreover, the front-ends are in general aimed at graphing rather
than exception reporting, whereas Nag is aimed at detecting exceptions.
Also, the front-ends already do two significant things that any Nag
infrastructure would have to either crib or redevelop:

1 Collect data efficiently
2 Set up the RRDs without intervention
> Smokeping already does what Atul was asking about. RRD is the backend; of course
> you can do this. Store the additional RRD files and zoom. Simple. It's what
> my system does.
Up to the limit of the RRD resolution (which depends on how many
observations/samples you keep), but fundamentally, yes.
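How far back you can zoom, and how fine the detail is, follows from each
RRA's geometry: resolution = step * steps-per-row, span = resolution * rows.
A minimal sketch of that arithmetic, assuming a hypothetical RRD with a
300 s step and two made-up AVERAGE RRAs (not taken from anyone's system in
this thread); rra_geometry is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: for an RRA defined as RRA:CF:xff:steps:rows in an RRD
# with the given base step (seconds), return (resolution, span) in seconds.
sub rra_geometry {
    my ($step, $steps_per_row, $rows) = @_;
    my $resolution = $step * $steps_per_row;   # seconds per consolidated row
    my $span       = $resolution * $rows;      # total history the RRA holds
    return ($resolution, $span);
}

# Illustrative RRAs for an RRD with a 300 s (5 minute) step.
my @rras = (
    [ 'AVERAGE, 1 step/row, 600 rows',  1, 600 ],
    [ 'AVERAGE, 6 steps/row, 700 rows', 6, 700 ],
);
for my $rra (@rras) {
    my ($label, $spr, $rows) = @$rra;
    my ($res, $span) = rra_geometry(300, $spr, $rows);
    printf "%s: %d s resolution, %.1f hours of history\n",
        $label, $res, $span / 3600;
}
```

With these made-up RRAs, the 5 minute detail covers 50 hours; anything
older is only available as 30 minute averages, so zooming further in on it
shows nothing new.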
> I can't understand why we are screwing around with front-ends when the data
> that Nagios needs to decide whether the threshold is being
> breached is held in the RRD files that a multitude of packages already look
> after...
.. snip
> All that needs to be done is a plugin that reads local RRD files.
I think so too. I have done this in two trivial examples (by way of ...):
1 A plugin that reads the FAILURES RRA from an RRD built with the RRDtool
development branch (the one with time series prediction) and reports
CRITICAL if the last sample is a 1 (i.e. the observation lies outside the
Holt-Winters prediction +/- 2 * DEVPREDICT: the measurement is an aberration)
2 A plugin that computes the differences between successive observations
and reports CRITICAL if all are zero (this is to detect that a producer
process has stopped producing)
Since there are Perl and Python (probably Ruby also) bindings to the RRD
libraries, this is pretty easy.
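The second plugin is not shown here. A minimal sketch of its test, assuming
the observations have already been fetched into a list (for example from
RRDs::fetch, with unknown samples assumed to arrive as undef); deltas and
check_stalled are hypothetical names, and the sample values are made up.
The exit codes are the standard Nagios plugin ones.

```perl
use strict;
use warnings;

# Nagios plugin return codes (the same values as %ERRORS in utils.pm).
my %ERRORS = (OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3);

# Successive differences of a list of counter samples.
sub deltas {
    my @obs = @_;
    return map { $obs[$_] - $obs[$_ - 1] } 1 .. $#obs;
}

# Hypothetical check: CRITICAL when every delta is zero, i.e. the counter
# did not move during the window, so the producer appears to have stopped.
sub check_stalled {
    my @obs = grep { defined } @_;    # skip unknown (undef) samples
    return ('UNKNOWN', 'fewer than two samples') if @obs < 2;
    my @d = deltas(@obs);
    return ('CRITICAL', 'no change in ' . scalar(@obs) . ' samples')
        unless grep { $_ != 0 } @d;
    return ('OK', 'deltas: (' . join(' ', @d) . ')');
}

my ($state, $msg) = check_stalled(10, 12, 12, 15);
printf "%s - %s (exit code %d)\n", $state, $msg, $ERRORS{$state};
```

A real plugin would finish with exit $ERRORS{$state}. Taking the deltas
rather than the raw values means the check works on any monotonically
increasing counter, whatever its absolute size.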
Here's the guts of the first one:
use strict ;
use Getopt::Long ;
use RRDs ;
# utils.pm comes with the Nagios plugins distribution.
use utils qw($TIMEOUT %ERRORS &print_revision &support &usage);

my $PROGNAME = 'check_coms' ;
my ($rrd, $start, $debug) ;

Getopt::Long::Configure('bundling', 'no_ignore_case') ;
GetOptions(
    "V|version"    => \&version,
    "h|help"       => \&help,
    "r|rrd_file:s" => \$rrd,
    "s|start:s"    => \$start,
    "d|debug"      => \$debug,
) ;

use constant RRD      => '/home/anwsmh/perl/rrd/hwpredict/coms.rrd' ;
use constant START    => 'now -1 hour' ;
use constant RRA_SUCC => 'AVERAGE' ;
use constant RRA_FAIL => 'FAILURES' ;
use constant GRAPH    => '<a href=http://pc09011/cgi-bin/cg2?RRD_NAME=coms&INT=-1h>graph</a>' ;

# Fall back to the defaults when -r / -s are not given.
$rrd   ||= RRD ;
$start ||= START ;

my @rra_fail = () ;
my $fetch_ok = &from_rrd($rrd, $start, RRA_FAIL, \@rra_fail) ;
&outahere('UNKNOWN', 'COMS cannot be checked. ',
    [ 'RRDs::fetch failed with error "', @rra_fail, '"' ]) unless $fetch_ok ;

if ( $debug ) {
    print "HW predicted Failures\n" ;
    &dump( @rra_fail ) ;    # keep the &: dump is also a Perl builtin
}

# &from_rrd returns
# $rra_x[$i]->[0]   $rra_x[$i]->[1]
# 1029576900        74.0
# 1029577200        0.0
my @fail = () ;
foreach ( @rra_fail ) {
    push @fail, $_->[1] ;
}

# .. snip: $delta_s, @delta_s, $last_succ and $observed_int are computed
# from the AVERAGE (RRA_SUCC) RRA; that code, and the subs version, help,
# from_rrd, outahere and dump, are not shown here.
my ($delta_s, @delta_s, $last_succ, $observed_int) ;

&outahere('OK', 'Ok.', [
    ($delta_s == 0 ? "Nothing processed successfully in last $observed_int minutes." :
     $delta_s == 1 ? "$delta_s success $last_succ minutes ago." :
                     "$delta_s successes $last_succ minutes ago."),
    "Deltas: (" . join(' ', reverse @delta_s) . ') or Holt-Winters forecast',
    GRAPH ])
    if $fail[-1] == 0 ;

# HW predicts failure. Is it because the predictions have failed to
# converge after a restart ?
# In this case, @succ may look like (2000, 0, 0, 1, 1, 2)
# @delta_s [reversed] (1, 0, 1, 0, -2000)

# .. snip (the restart branch is not shown)

&outahere('CRITICAL', 'Failed. No restart but HW forecast violations.', [
    $delta_s, ($delta_s == 1 ? 'success' : 'successes'),
    "$last_succ minutes ago.",
    "$observed_int minute deltas: (" . join(' ', reverse @delta_s) . ') or Holt-Winters forecast',
    GRAPH ]) ;
Here the plugin also presents the deltas (to convince the contact) and a
link to a graph of the data drawn by rrdcgi.
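The restart test itself is snipped above. One plausible way to do it (an
assumption, not necessarily what check_coms does): a success counter only
goes down when the producer restarts and resets it, so a negative delta can
be taken as a restart. The sketch reuses the (2000, 0, 0, 1, 1, 2) example
from the comments; deltas and restarted are hypothetical names.

```perl
use strict;
use warnings;

# Deltas between successive samples of a success counter (oldest first).
sub deltas {
    my @succ = @_;
    return map { $succ[$_] - $succ[$_ - 1] } 1 .. $#succ;
}

# Hypothetical restart test: any negative delta means the counter was reset,
# so a Holt-Winters failure may only mean the forecast has not converged yet.
sub restarted {
    return scalar grep { $_ < 0 } @_;
}

my @succ    = (2000, 0, 0, 1, 1, 2);   # the example from the comments
my @delta_s = deltas(@succ);
print 'Deltas (newest first): (', join(' ', reverse @delta_s), ")\n";
print restarted(@delta_s) ? "Restart detected\n" : "No restart\n";
```

Reversing the deltas puts the newest first, which matches the order the
plugin prints them in.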
> It seems simple to me. But that may be because I'm crap at coding, and better
> at hacking.
> All that needs to be done is a plugin that reads .rrd files.
Nag developers can add value by considering how this can be done
efficiently for large numbers of RRDs.
Perhaps a Nag add-on could periodically process _all_ the RRDs (say,
anything found under a given path) and submit passive results for any
anomalies it detects.
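Such an add-on might look roughly like this. It is a sketch under
assumptions: Nagios takes passive service results through its external
command file as PROCESS_SERVICE_CHECK_RESULT lines, but the directory, host
name, command file path and the per-RRD check routine are hypothetical
placeholders. The service name is taken from the RRD file name, so a
matching service that accepts passive checks would have to exist in the
Nagios configuration.

```perl
use strict;
use warnings;
use File::Find;

# Format one passive service result in the Nagios external command syntax.
sub passive_result {
    my ($time, $host, $service, $code, $output) = @_;
    return sprintf "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n",
        $time, $host, $service, $code, $output;
}

# Collect every *.rrd file below $dir, sorted by path.
sub find_rrds {
    my ($dir) = @_;
    my @files;
    find(sub { push @files, $File::Find::name if -f $_ && /\.rrd\z/ }, $dir);
    return sort @files;
}

# Hypothetical driver: run $check on each RRD (it returns a plugin code and
# a message) and append one passive result per anomaly to the command file.
sub submit_anomalies {
    my ($dir, $host, $cmd_file, $check) = @_;
    open my $cmd, '>>', $cmd_file or die "cannot open $cmd_file: $!";
    for my $rrd (find_rrds($dir)) {
        my ($code, $output) = $check->($rrd);
        next if $code == 0;                        # only report anomalies
        (my $service = $rrd) =~ s{.*/|\.rrd\z}{}g; # /x/y/coms.rrd -> coms
        print {$cmd} passive_result(time, $host, $service, $code, $output);
    }
    close $cmd;
}
```

For example, submit_anomalies('/var/rrd', 'rrdhost',
'/usr/local/nagios/var/rw/nagios.cmd', \&check_one_rrd), where every
argument is a made-up placeholder. Only reporting anomalies keeps the
command file quiet; the services would need freshness checking if silence
should eventually count as OK.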
Yours sincerely.
Stanley Hopcroft
'...No man is an island, entire of itself; every man is a piece of the
continent, a part of the main. If a clod be washed away by the sea,
Europe is the less, as well as if a promontory were, as well as if a
manor of thy friend's or of thine own were. Any man's death diminishes
me, because I am involved in mankind; and therefore never send to know
for whom the bell tolls; it tolls for thee...'
from Meditation 17, J Donne.
Nagios-users mailing list
Nagios-users at lists.sourceforge.net