nrpe and NetBackup Plugin
Andreas Ericsson
ae at op5.se
Thu Oct 20 18:37:44 CEST 2005
Tao Yaoning wrote:
> the check_nb_queque works its output looks like "OK - Queue is of normal
> size [0]"
> the check_nb_jukebox works too, output looks like "OK - all drives are up."
>
> but the check_nb_errs still doesn't work error message looks like
> "CHECK_NRPE: Error receiving data from daemon."
What does the syslog file say on the server where nrpe is running?
Presumably you have two servers involved in this problem.
Let's call one of the servers NAGIOS and the other NRPE. The server NRPE
is the one that's running the NRPE daemon (the one you want to fetch
data *FROM*). The server NAGIOS is the one running the NAGIOS daemon
which calls the check_nrpe program (the one you want to fetch data *TO*).
Here's what I want you to do:
On the NRPE server (not, I repeat *NOT*, on the NAGIOS server) I want
you to run the command *exactly* as it is specified in the nrpe
configuration file while logged in as the user the nrpe daemon runs as.
You can do this by running the following commands, provided you have
sudo installed, are logged in as root (which I assume is what you
normally log in as....) and the nrpe configuration file is called
/etc/nrpe.cfg:
eval `sed -n '/^nrpe_user=/p' /etc/nrpe.cfg`
sudo -u $nrpe_user `sed -n 's/command.check_nb_errs.=//p' /etc/nrpe.cfg` 2>/dev/null
If you didn't get any output there, you won't get any output in Nagios
either.
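
If the one-liners are hard to follow, the same lookup can be split into
two small helpers. This is only a sketch: the function names
nrpe_user_from and nrpe_command_for are made up for illustration, and it
assumes the config uses lines of the form nrpe_user=NAME and
command[NAME]=COMMANDLINE.

```shell
#!/bin/sh
# Hypothetical helper: print the value of nrpe_user from an nrpe config.
# Assumes a line of the form: nrpe_user=nagios
nrpe_user_from() {
    sed -n 's/^nrpe_user=//p' "$1"
}

# Hypothetical helper: print the command line nrpe would run for a check.
# Assumes a line of the form: command[NAME]=/path/to/plugin args
nrpe_command_for() {
    cfg=$1
    name=$2
    sed -n "s/^command\[$name]=//p" "$cfg"
}

# Intended use on the NRPE server, as root (shown as a comment because it
# needs sudo and a real config file):
#   cfg=/etc/nrpe.cfg
#   sudo -u "$(nrpe_user_from "$cfg")" \
#       sh -c "$(nrpe_command_for "$cfg" check_nb_errs)" 2>/dev/null
```

The 2>/dev/null throws away stderr, so whatever is still printed is what
the plugin wrote to stdout.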
> The permission has no problem. I can run this command as nagios on local
> machine,
Please don't use terms like "local machine". To me, the "local machine"
is my laptop. I have absolutely no idea which server you're talking
about when you say "local machine". Since you're mentioning the nagios
user I'll assume you're running this on the NAGIOS server, but that
can't be right because you said earlier that that didn't work.
> and get output like "ERRORS: (59 total) 1.) bptm has error-level
> general error: cannot count up drives, device manager daemon (ltid) may not
> be running. 2.) bptm has error-level general error: cannot count up drives,
> device manager daemon (ltid) may not be running. 3.) bptm has error-level
> general error: cannot count up drives, device manager daemon (ltid) may not
> be running. 4.) bptm has error-level general error: cannot count up drives,
> device manager daemon (ltid) may not be running. 5.) bptm has error-level
> general error: cannot count up drives, device manager daemon (ltid) may not
> be running. 6.) bptm has error-level general error: cannot count up drives,
> device manager daemon (ltid) may not be running.
This definitely looks like stderr output to me. NRPE only reads
output on stdout.
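
To illustrate the difference, here is a small sketch with two stand-in
functions (plugin_on_stderr and plugin_on_stdout are invented for this
example, they are not part of any plugin). Capturing stdout only, which
is roughly what a reader of stdout sees, gives nothing for the first and
the full message for the second.

```shell
#!/bin/sh
# Stand-in for a plugin that writes its report to stderr (hypothetical).
plugin_on_stderr() {
    echo "ERRORS: (59 total)" >&2
}

# Stand-in for a plugin that writes its report to stdout (hypothetical).
plugin_on_stdout() {
    echo "ERRORS: (59 total)"
}

# Capture stdout only; stderr is discarded.
seen_from_stderr=$(plugin_on_stderr 2>/dev/null)
seen_from_stdout=$(plugin_on_stdout 2>/dev/null)

echo "stderr plugin, stdout captured: '$seen_from_stderr'"
echo "stdout plugin, stdout captured: '$seen_from_stdout'"
```

On a terminal both functions look identical, which is why running the
plugin by hand without discarding stderr can be misleading.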
> .............................."
>
> the script for output is
> if (defined(@errors)) {
> if ($critcount) { $status = CRITICAL; }
> elsif ($warncount) { $status = WARNING; }
> else { $status = UNKNOWN; }
> # print "NETBACKUP ERRORS: (" . ( $critcount + $warncount ) . " total) ";
> print "ERRORS: (" . ( $critcount + $warncount ) . " total) ";
> my $counta = 0;
> foreach my $errorline (@errors) {
> $counta++;
> print "$counta.) $errorline ";
> }
> print "\n";
> } else {
> # print "No Netbackup errors found.\n";
> print "OK: No Netbackup errors found.\n";
> }
>
This can't possibly be the entire script and is, as such, completely
worthless for debugging purposes. It would also help if you attached it
as a file rather than pasting it inline, since your mail program seems
to do funny things with the indentation.
> I get debug from strace
>
> munmap(0xb7fff000, 4096) = 0
> write(3, "\27\3\1\0 }\22\302\252\250\7!%\251Xs+\253\361dh \232\266"...,
> 1114) = 1114
> read(3, "sh: l", 5) = 5
> write(3, "\25h:\0 \304\374\20\240\370\23wK]s\241\232\300\347W\270"..., 37) =
> 37
> alarm(0) = 10
> write(3, "\25h:\0 #\347\366\313\257\32\2714\327D\17\16 \2l\4D/3)"..., 37) =
> 37
> close(3) = 0
> fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
> mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
> 0xb7fff000
> write(1, "CHECK_NRPE: Error receiving data"..., 46CHECK_NRPE: Error
> receiving data from daemon.
> ) = 46
> munmap(0xb7fff000, 4096) = 0
> exit_group(3) = ?
>
This is an strace from the check_nrpe program. You're (still) looking at
the entirely wrong end of the problem here, since your other checks seem
to work just fine.
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231