My experience with Nagios

Fabiano Reis fsilos at ig.com
Thu Oct 30 13:38:22 CET 2003


Hi,

I have installed Nagios on my network and would now like to report a little about the experience.

1) Using --enable-embedded-perl on the configure script

I really haven't had time to investigate what was happening, but I collected some information to send to you. I have attached some files to show what I did and the error I got when trying to run Nagios after compiling it with --enable-embedded-perl.

I used output redirection to create the files.

[root at axeh nagios-source-1.1]# ./configure --prefix=/tools/nagios/bin-1.1 --with-gd --enable-embedded-perl --with-nagios-user=nagios --with-nagios-grp=nagios 1> configure.out 2> configure.err

[root at axeh nagios-source-1.1]# make all 1> make.out 2> make.err
[root at axeh nagios-source-1.1]# make install 1> make.install.out 2>make.install.err
# this is where I installed it
[root at axeh nagios-source-1.1]# cd ../bin-1.1/
[root at axeh bin-1.1]# ldd bin/nagios
        libperl.so => /usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/libperl.so (0x40014000)
        libnsl.so.1 => /lib/libnsl.so.1 (0x40145000)
        libdl.so.2 => /lib/libdl.so.2 (0x4015b000)
        libm.so.6 => /lib/i686/libm.so.6 (0x4015e000)
        libpthread.so.0 => /lib/i686/libpthread.so.0 (0x40180000)
        libc.so.6 => /lib/i686/libc.so.6 (0x42000000)
        libcrypt.so.1 => /lib/libcrypt.so.1 (0x401d0000)
        libutil.so.1 => /lib/libutil.so.1 (0x401fe000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
[root at axeh bin-1.1]# bin/nagios
Segmentation fault
[root at axeh bin-1.1]# bin/nagios -v
Segmentation fault
[root at axeh bin-1.1]# gdb bin/nagios
GNU gdb Red Hat Linux (5.2.1-4)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...(no debugging symbols found)...
(gdb) run
Starting program: /tools/nagios/bin-1.1/bin/nagios
[New Thread 16384 (LWP 4892)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 16384 (LWP 4892)]
Perl_PerlIO_stdout (my_perl=0x0) at perlio.c:4370
4370    perlio.c: No such file or directory.
        in perlio.c
(gdb) quit
The program is running.  Exit anyway? (y or n) y
[root at axeh bin-1.1]#

So, I have no idea what is going on.

This is Red Hat 8.0 on an Intel machine with Perl 5.8 (default installation of the perl package).


2) Using --with-file-perfdata on the configure script

When running Nagios configured with --with-file-perfdata, no $USERn$ macro other than $USER1$ is available. I noticed this when I used the $USER2$ variable to set a password for accessing a web server with check_http. I ran this test only once and got this behaviour, so I recompiled Nagios without the option and everything is OK now.

The configure line I used was: ./configure --prefix=/tools/nagios/bin-1.1 --with-gd --with-nagios-user=nagios --with-nagios-grp=nagios --with-file-perfdata

3) Tests on passive checks.

I tested passive checks on Nagios and ran into this problem:

host: test
service: procs

I configured Nagios to accept passive checks for the service 'procs' on host 'test'. Since I had to specify a check_command, I used one called 'dummy', which is defined in the checkcommand.cfg file to run check_dummy with parameter 2 (return the CRITICAL state; you will see why). So, when the freshness threshold for this service is exceeded, the check command 'dummy' runs and changes the state of the service to CRITICAL (that is why I used 2 as the parameter).

Here is my point: if the host then sends Nagios a passive result saying that this service is CRITICAL, Nagios does not raise an alarm. This is a change from CRITICAL to CRITICAL. Is there any parameter that starts the notification procedure in Nagios in this case?

If so, I would be grateful to hear about it.
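
For reference, a passive result like the CRITICAL one described above reaches Nagios as a single external command line. The sketch below only formats such a line for host 'test' and service 'procs'. The PROCESS_SERVICE_CHECK_RESULT field order follows the Nagios external command format; the function name, the validation rules, and the plugin output text are assumptions made for illustration, and writing the line to the command file is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one PROCESS_SERVICE_CHECK_RESULT line into buf.
 * Returns the line length, or -1 on invalid input or a too-small buffer.
 * (Hypothetical helper, not part of Nagios.) */
int format_passive_result(char *buf, size_t size, long timestamp,
                          const char *host, const char *service,
                          int return_code, const char *output)
{
    int n;

    assert(buf != NULL && host != NULL && service != NULL && output != NULL);

    /* ';' separates the fields and '\n' ends the command, so neither may
     * appear inside the host or service name. */
    if (strpbrk(host, ";\n") || strpbrk(service, ";\n") || strchr(output, '\n'))
        return -1;

    n = snprintf(buf, size, "[%ld] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n",
                 timestamp, host, service, return_code, output);
    if (n < 0 || (size_t)n >= size)
        return -1;              /* output error or truncated line */
    return n;
}
```

Calling format_passive_result(buf, sizeof buf, time(NULL), "test", "procs", 2, "CRITICAL - ...") would produce the line a remote host submits for the CRITICAL case (2 is the same code given to check_dummy).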

4) Mr Ethan, I looked at the FAQ section on the Nagios website and read this:

---------------------

From http://www.nagios.org/upcoming.php

Parallelized/Smarter Host Checks 
This feature could definitely be labeled as the "golden goose that got way". In fact it *keeps* getting away, as I keep bumping it off to be "implemented later". Its just not terribly important to me at the moment... However, I refuse to get rid of the idea anytime soon. :-) 

The current host check logic isn't necessarily bad, its just that it can be really slow if Nagios has to check many different "levels" of parent/child hosts. If you look at the current code (and if you can understand it) you'll see that Nagios waits for a host check to complete before checking children or parents of that host. Obviously there is some major room for improvement here. I would like to find a way to predict which parent/child hosts are also going to need to be checked and launch those checks in parallel. This has potential for major speed improvements in the host check code, so its something I'll be looking into. However, its been on my mental drawing board for quite some time without being implemented, so I'm not guaranteeing anything. 

-----------------------------------------
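
To make the idea in the quoted passage concrete, here is a rough sketch of launching several checks at once and collecting their exit codes afterwards, instead of waiting for each one before starting the next. It is not Nagios code: the function name, the MAX_CHECKS cap, and running each check through /bin/sh are all assumptions for illustration.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHECKS 64   /* arbitrary cap chosen for this sketch */

/* Start every command first, then reap them all.
 * exit_codes[i] receives the exit status of cmds[i], or -1 on failure.
 * Returns 0 if every check was started and reaped, -1 otherwise. */
int run_checks_parallel(const char *const cmds[], int n, int exit_codes[])
{
    pid_t pids[MAX_CHECKS];
    int i, rc = 0;

    assert(n >= 0 && n <= MAX_CHECKS);

    /* Phase 1: fork one child per check without waiting in between. */
    for (i = 0; i < n; i++) {
        pids[i] = fork();
        if (pids[i] == 0) {
            execl("/bin/sh", "sh", "-c", cmds[i], (char *)NULL);
            _exit(127);         /* only reached if exec failed */
        }
    }

    /* Phase 2: collect the results; the children ran concurrently. */
    for (i = 0; i < n; i++) {
        int status;

        exit_codes[i] = -1;
        if (pids[i] > 0 && waitpid(pids[i], &status, 0) == pids[i]
            && WIFEXITED(status))
            exit_codes[i] = WEXITSTATUS(status);
        else
            rc = -1;
    }
    return rc;
}
```

With this shape the total wait is governed mostly by the slowest check rather than by the sum of all of them, which is the speed-up the passage is hoping for.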

I agree with you. If Nagios gains the ability to run the check_host_alive command in parallel, I think that will be the Nagios Gold Version :-) By the way, I used fping (www.fping.com) instead of ping and got better performance. I had to write a patch for fping because it has an arithmetic bug (a division by zero when no packets have been sent). This is the patch:

This is for fping-2.2b2:
--- fping.c     2000-12-08 16:42:32.000000000 -0200
+++ /tmp/fping.c        2003-10-26 20:43:27.000000000 -0200
@@ -895,13 +895,22 @@
       fprintf(stderr, "\n");
     } else {
       if (h->num_recv <= h->num_sent) {
+               if(h->num_sent > 0) {
        fprintf(stderr, " xmt/rcv/%%loss = %d/%d/%d%%",
                h->num_sent, h->num_recv,
                ((h->num_sent - h->num_recv) * 100) / h->num_sent);
+               } else { fprintf (stderr," xmt/rcv/%%loss = %d/%d/100%%",
+                               h->num_sent, h->num_recv);
+                       }
       } else {
+             if(h->num_sent > 0) {
        fprintf(stderr, " xmt/rcv/%%return = %d/%d/%d%%",
                h->num_sent, h->num_recv,
                ((h->num_recv * 100) / h->num_sent));
+               } else { fprintf( stderr," xmt/rcv/%%return = %d/%d/100%%",
+                               h->num_sent, h->num_recv);
+                       }
+
       }
       if (h->num_recv) {
        avg = h->total_time / h->num_recv;
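
The guard around the division in the patch above can also be read in isolation. The following is a minimal sketch with hypothetical helper names (fping itself computes these values inline); it reports 100% when nothing was sent, which is the same fallback the patch prints.

```c
#include <assert.h>

/* Percentage of packets lost, guarded against a zero "sent" count. */
int loss_percent(int num_sent, int num_recv)
{
    assert(num_sent >= 0 && num_recv >= 0);
    if (num_sent == 0)
        return 100;             /* avoid dividing by zero */
    return ((num_sent - num_recv) * 100) / num_sent;
}

/* Percentage of packets returned, used when more came back than were sent. */
int return_percent(int num_sent, int num_recv)
{
    assert(num_sent >= 0 && num_recv >= 0);
    if (num_sent == 0)
        return 100;             /* same fallback as in the patch */
    return (num_recv * 100) / num_sent;
}
```

For example, loss_percent(4, 3) gives 25, while loss_percent(0, 0) gives 100 instead of crashing.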

That is all.

Fabiano
-------------- next part --------------
A non-text attachment was scrubbed...
Name: install-outputs.tar
Type: application/x-tar
Size: 51200 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20031030/1e7d510f/attachment.tar>

