Anyone? : SIGSEGV when trying to use eventhandler
Guy Waugh
guidosh at gmail.com
Wed May 19 12:07:52 CEST 2010
I'm definitely no expert but...
* What does 'ldd' report for the nagios binary? Can every library the
binary is linked against be found, and are those libraries up to date?
* Where did you get nagios from? Did you compile it or is it pre-built? If
pre-built, are there any updates?
* I don't know Solaris well enough to say how best to trace your running
nagios, but tracing it with a very simple configuration might be the next
step. strace, or truss (the system call tracer that ships with Solaris)?
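
For what it's worth, the ldd check could be scripted roughly like this. It
is only a sketch: the /usr/nagios/bin/nagios path is a guess based on the
libexec path in the listing below, and unresolved_libs is a name I made up.

```shell
#!/bin/sh
# Sketch only: report shared libraries that ldd cannot resolve.

# Reads ldd output on stdin and prints the lines marked "not found".
# Exit status is 0 if at least one such line exists, non-zero otherwise.
unresolved_libs() {
    grep 'not found'
}

# Hypothetical install path (an assumption); override via the environment.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/nagios/bin/nagios}"

if [ -x "$NAGIOS_BIN" ]; then
    if ldd "$NAGIOS_BIN" | unresolved_libs; then
        echo "some libraries could not be resolved" >&2
    else
        echo "all libraries resolved"
    fi
fi
```

If that comes back clean, attaching truss to the running daemon (for
example truss -f -p <pid>) just before the host goes DOWN might show what
nagios is doing when it crashes.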
On 19 May 2010 10:49, nagios <nagios at chadmail.com> wrote:
> Anybody?
>
> If you need extra information, just let me know what you need to see and
> I'll upload it.
>
> Thanks.
>
> -----Original Message-----
> From: "nagios" <nagios at chadmail.com>
> To: nagios-users at lists.sourceforge.net
> Date: Wed, 19 May 2010 01:42:15 +1000
> Subject: [Nagios-users] SIGSEGV when trying to use eventhandler
>
> Hi guys,
> I am new to nagios but so far it's working well for me and is
> monitoring a number of real and virtual hosts. Nagios 3.0.6 is installed on
> an OpenSolaris 2009.06 host and monitoring routers, other devices and VMs in
> VirtualBox.
>
> My issue is when I try to add an event handler, I get a SIGSEGV and nagios
> restarts.
>
>
> I have posted the details of the code I am using and the error here...
> http://pastebin.com/vBb7xTND and also below (but it reads better @
> pastebin).
>
> I have tried several different scripts and code combinations (even empty
> scripts and commands like ls) and all give the same error.
>
> Can anyone help me work out why it's happening?
>
> Thanks.
>
> hosts.cfg
> <snip>
> define host{
> use windows-server ; Inherit default values from a template
> host_name Server6 ; The name we're giving to this host
> max_check_attempts 4
> event_handler vboxmanage-restart ; Restart the vm
> alias Server 6 - Win2008 Server ; A longer name associated with the host
> address 192.168.0.6 ; IP address of the host
> }
> <snip>
>
> commands.cfg - note I have tried various scripts here incl. ones from the
> nagios guides/books and all give the same error.
> <snip>
> # 'vboxmanage-restart' command definition
> define command{
> command_name vboxmanage-restart
> # command_line ls
> command_line sudo -u nas $USER1$/eventhandler/event_vboxmanage_restart -S $SERVICESTATE$ -T $SERVICESTATETYPE$ -A $SERVICEATTEMPT$ -H Server6
> }
> <snip>
>
> nagios.log
> [1274193005] HOST ALERT: Server6;DOWN;SOFT;1;PING CRITICAL - Packet loss = 100%
> [1274193005] Caught SIGSEGV, shutting down...
> [1274193005] Nagios 3.0.6 starting... (PID=5231)
> [1274193005] Local time is Wed May 19 00:30:05 EST 2010
> [1274193005] LOG VERSION: 2.0
> [1274193005] Finished daemonizing... (New PID=5232)
>
> the scripts... (yes I know it should not be 777's but just to show it's not
> a permissions thing)
> -rwxrwxrwx 1 nagios nagios 1580 2010-05-18 00:52 event_vboxmanage_restart
> -rwxrwxrwx 1 nagios nagios 3815 2010-05-18 23:07 filename.out
> -rwxrwxrwx 1 nagios nagios 2211 2010-05-19 00:23 restart-httpd
> nas@nas:/usr/nagios/libexec/eventhandler#
>
> The script works fine when run as the user nagios using sudo (nagios has
> been added to /etc/sudoers)
> nas@nas:…sr/nagios/libexec/eventhandler$ whoami
> nagios
> nas@nas:…sr/nagios/libexec/eventhandler$ sudo -u nas ./event_vboxmanage_restart -S CRITICAL -T HARD -A 1 -H Server6
> CRITICAL(C) 2005-2010 Sun Microsystems, Inc.
>
> The event_vboxmanage_restart script... not that this is likely to be at
> fault (I do not think so anyway, as I get the error with other very simple
> scripts too).
> #!/usr/bin/perl
>
> use Getopt::Long;
> use Net::Telnet ();
> use Switch;
> my ($state,$type,$attempt,$cmd,$hostname);
> open(MYOUTFILE, ">>/usr/nagios/libexec/eventhandler/filename.out");
>
> &processargs;
> print "$state";
> switch ($state) {
> case "OK" { &state_OK }
> case "WARNING" { &state_WARNING }
> case "UNKNOWN" { &state_UNKNOWN }
> case "CRITICAL" { &state_CRITICAL }
> else { print "unrecognised state>$state" }
> }
> print MYOUTFILE">$state<";
> print MYOUTFILE">$hostname<";
> close(MYOUTFILE);
> exit 0;
>
> sub processargs {
>
> GetOptions (
> "S|state=s" => \$state,
> "T|type=s" => \$type,
> "A|attempt=i" => \$attempt,
> "H|hostname=s" => \$hostname,
> "C|command=s" => \$cmd,
> );
> }
>
> ### FUNC: print $state
> sub print_state {
> }
> ### FUNC: handle the OK state (no action)
> sub state_OK {
> }
> ### FUNC: handle the WARNING state (no action)
> sub state_WARNING {
> }
> ### FUNC: handle the UNKNOWN state (no action)
> sub state_UNKNOWN {
> }
> ### FUNC: handle the CRITICAL state (restart the VM)
> sub state_CRITICAL {
>     if ("$type" eq "HARD" or ("$type" eq "SOFT" and $attempt == 3)) {
>         @result = `VBoxManage controlvm $hostname acpipowerbutton`;
>         foreach (@result) {
>             print MYOUTFILE "$_\n";
>         }
>         sleep(60);
>         @result = `VBoxManage controlvm $hostname poweroff`;
>         foreach (@result) {
>             print MYOUTFILE "$_\n";
>         }
>         @result = `VBoxManage startvm $hostname`;
>         print "$result[1]";
>     }
>     else { }
> }
>
> As you can see from the below, it all works fine (ie. no SIGSEGV's) if I
> comment out the eventhandler line from the hosts.cfg file.
> [05-19-2010 01:33:50] SERVICE ALERT: Server6;Explorer;OK;HARD;1;Explorer.EXE: Running
> [05-19-2010 01:32:50] SERVICE ALERT: Server6;Uptime;OK;HARD;1;System Uptime - 0 day(s) 0 hour(s) 9 minute(s)
> [05-19-2010 01:32:40] SERVICE ALERT: Server6;C:\ Drive Space;OK;HARD;1;c:\ - total: 39.90 Gb - used: 9.19 Gb (23%) - free 30.71 Gb (77%)
> [05-19-2010 01:32:10] SERVICE ALERT: Server6;CPU Load;OK;HARD;1;CPU Load 3% (5 min average)
> [05-19-2010 01:25:00] HOST ALERT: Server6;UP;SOFT;4;PING OK - Packet loss = 0%, RTA = 0.44 ms
> [05-19-2010 01:23:50] SERVICE ALERT: Server6;Explorer;CRITICAL;HARD;1;Connection refused
> [05-19-2010 01:23:50] HOST ALERT: Server6;DOWN;SOFT;3;PING CRITICAL - Packet loss = 100%
> [05-19-2010 01:23:00] SERVICE ALERT: Server6;Uptime;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
> [05-19-2010 01:22:50] SERVICE ALERT: Server6;C:\ Drive Space;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
> [05-19-2010 01:22:30] HOST ALERT: Server6;DOWN;SOFT;2;PING CRITICAL - Packet loss = 100%
> [05-19-2010 01:22:20] SERVICE ALERT: Server6;CPU Load;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
> [05-19-2010 01:21:10] HOST ALERT: Server6;DOWN;SOFT;1;PING CRITICAL - Packet loss = 100%
> [05-19-2010 01:21:00] SERVICE ALERT: Server6;Uptime;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
> [05-19-2010 01:20:50] SERVICE ALERT: Server6;C:\ Drive Space;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
> [05-19-2010 01:02:10] SERVICE ALERT: Server6;CPU Load;OK;SOFT;1;CPU Load 0% (5 min average)
> [05-19-2010 01:00:50] SERVICE ALERT: Server6;Uptime;OK;SOFT;1;System Uptime - 0 day(s) 0 hour(s) 57 minute(s)
> [05-19-2010 01:00:40] SERVICE ALERT: Server6;C:\ Drive Space;OK;SOFT;1;c:\ - total: 39.90 Gb - used: 9.19 Gb (23%) - free 30.71 Gb (77%)
>
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>
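
One more thing I noticed in the quoted config: the handler is attached to a
host, but the command line passes $SERVICESTATE$, $SERVICESTATETYPE$ and
$SERVICEATTEMPT$. The Nagios documentation on event handlers lists
$HOSTSTATE$, $HOSTSTATETYPE$ and $HOSTATTEMPT$ as the macros to pass for
hosts, and host states are UP, DOWN and UNREACHABLE rather than OK, WARNING
and CRITICAL. I can't say whether that has anything to do with the SIGSEGV.
Below is a minimal sketch of the same restart decision (HARD, or SOFT on the
third attempt) keyed on host state. The name should_restart and the
argument order are my own assumptions, not anything from your script.

```shell
#!/bin/sh
# Sketch only: decide whether a host event handler should act.
# Assumed arguments, in this order:
#   $1 = host state   (UP, DOWN, UNREACHABLE), e.g. from $HOSTSTATE$
#   $2 = state type   (SOFT, HARD),            e.g. from $HOSTSTATETYPE$
#   $3 = attempt no.  (integer),               e.g. from $HOSTATTEMPT$
should_restart() {
    # Only a DOWN host is a candidate for a restart.
    [ "$1" = "DOWN" ] || return 1
    case "$2" in
        HARD) return 0 ;;              # always act on a HARD state
        SOFT) [ "$3" -eq 3 ] ;;        # act on the third SOFT attempt only
        *)    return 1 ;;
    esac
}

# Print the decision only when all three arguments are supplied.
if [ "$#" -ge 3 ]; then
    if should_restart "$1" "$2" "$3"; then
        echo "restart"
    else
        echo "no action"
    fi
fi
```

Run by hand with arguments such as DOWN SOFT 3, it prints the decision
without touching VBoxManage, which makes it easy to check the logic before
wiring it into commands.cfg.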