<HTML>
<div><font style="font-family:tahoma;font-size:10pt;">
<div>Hi guys,</div>
<div>I am new to Nagios, but so far it's working well for me
and is monitoring a number of real and virtual hosts. Nagios 3.0.6 is
installed on an OpenSolaris 2009.06 host and is monitoring routers, other
devices, and VMs in VirtualBox.</div>
<div> </div>
<div>My issue is that when I add an event handler, Nagios catches a SIGSEGV
and restarts.</div>
<div> </div>
<div> </div>
<div>I have posted the details of the code I am using and the error
at http://pastebin.com/vBb7xTND and also below (but it reads better on
pastebin).</div>
<div> </div>
<div>I have tried several different scripts and code combinations (even
empty scripts and commands like ls) and all give the same error.</div>
<div> </div>
<div>Can anyone help me work out why it's happening?</div>
<div> </div>
<div>Thanks.</div>
<div> </div>
<div>hosts.cfg<br />
&lt;snip&gt;<br />
define host{<br />
use windows-server ; Inherit default values from a template<br
/>
host_name Server6 ; The name we're giving to this host<br />
max_check_attempts 4<br />
event_handler vboxmanage-restart ; Restart the vm<br />
alias Server 6 - Win2008 Server ; A longer name associated
with the host<br />
address 192.168.0.6 ; IP address of the host<br />
}<br />
&lt;snip&gt;<br />
<br />
commands.cfg - note I have tried various scripts here incl. ones from the
nagios guides/books and all give the same error.<br />
&lt;snip&gt;<br />
# 'vboxmanage-restart' command definition<br />
define command{<br />
command_name vboxmanage-restart<br />
# command_line ls<br />
command_line sudo -u nas $USER1$/eventhandler/event_vboxmanage_restart -S
$SERVICESTATE$ -T $SERVICESTATETYPE$ -A $SERVICEATTEMPT$ -H Server6<br />
}<br />
&lt;snip&gt;<br />
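<br />
A side note on the command above: vboxmanage-restart is attached to a host (event_handler inside define host), yet its command_line passes service macros. According to the Nagios macro documentation, $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ are only valid in a service context, and a host event handler would normally use the host equivalents. A minimal, untested sketch of that variant follows (same script and options as above). Since even a plain ls shows the same crash, this may well be unrelated to the SIGSEGV.<br />
# 'vboxmanage-restart' with host macros (untested sketch, not the config that produced the log below)<br />
define command{<br />
command_name vboxmanage-restart<br />
command_line sudo -u nas $USER1$/eventhandler/event_vboxmanage_restart -S $HOSTSTATE$ -T $HOSTSTATETYPE$ -A $HOSTATTEMPT$ -H Server6<br />
}<br />
Note that $HOSTSTATE$ expands to UP, DOWN or UNREACHABLE rather than OK, WARNING or CRITICAL, so the switch in the script would need matching cases.<br />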
<br />
nagios.log<br />
[1274193005] HOST ALERT: Server6;DOWN;SOFT;1;PING CRITICAL - Packet loss =
100%<br />
[1274193005] Caught SIGSEGV, shutting down...<br />
[1274193005] Nagios 3.0.6 starting... (PID=5231)<br />
[1274193005] Local time is Wed May 19 00:30:05 EST 2010<br />
[1274193005] LOG VERSION: 2.0<br />
[1274193005] Finished daemonizing... (New PID=5232)<br />
<br />
The scripts (yes, I know they should not be 777, but this is just to show it's
not a permissions thing):<br />
-rwxrwxrwx 1 nagios nagios 1580 2010-05-18 00:52 event_vboxmanage_restart<br
/>
-rwxrwxrwx 1 nagios nagios 3815 2010-05-18 23:07 filename.out<br />
-rwxrwxrwx 1 nagios nagios 2211 2010-05-19 00:23 restart-httpd<br />
nas@nas:/usr/nagios/libexec/eventhandler# <br />
<br />
The script works fine when run as the user nagios via sudo (I added nagios to
/etc/sudoers):<br />
nas@nas:…sr/nagios/libexec/eventhandler$ whoami
<br />
nagios<br />
nas@nas:…sr/nagios/libexec/eventhandler$ sudo -u nas
./event_vboxmanage_restart -S CRITICAL -T HARD -A 1 -H Server6
<br />
CRITICAL(C) 2005-2010 Sun Microsystems, Inc.<br />
<br />
The event_vboxmanage_restart script... not that this is likely to be at fault
(I don't think so anyway, as I get the error with other very simple scripts
too).<br />
#!/usr/bin/perl<br />
<br />
use Getopt::Long;<br />
use Net::Telnet ();<br />
use Switch;<br />
my ($state,$type,$attempt,$cmd,$hostname);<br />
open(MYOUTFILE,
">>/usr/nagios/libexec/eventhandler/filename.out");<br />
<br />
&processargs;<br />
print "$state"; <br />
switch ($state) {<br />
case "OK" { &state_OK }<br />
case "WARNING" { &state_WARNING }<br />
case "UNKNOWN" { &state_UNKNOWN }<br />
case "CRITICAL" { &state_CRITICAL }<br />
else { print "unrecognised state>$state" }<br />
}<br />
print MYOUTFILE">$state<";<br />
print MYOUTFILE">$hostname<";<br />
close(MYOUTFILE);<br />
exit 0;<br />
<br />
sub processargs {<br />
<br />
GetOptions (<br />
"S|state=s" => \$state,<br />
"T|type=s" => \$type,<br />
"A|attempt=i" => \$attempt,<br />
"H|hostname=s" => \$hostname,<br />
"C|command=s" => \$cmd,<br />
);<br />
}<br />
<br />
### FUNC: placeholder, currently unused<br />
sub print_state {<br />
}<br />
### FUNC: handle the OK state (no action yet)<br />
sub state_OK {<br />
}<br />
### FUNC: handle the WARNING state (no action yet)<br />
sub state_WARNING {<br />
}<br />
### FUNC: handle the UNKNOWN state (no action yet)<br />
sub state_UNKNOWN {<br />
}<br />
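### FUNC: on a HARD state (or the 3rd SOFT attempt) restart the VM<br />
sub state_CRITICAL {<br />
if ("$type" eq "HARD" or ("$type" eq "SOFT" and $attempt == 3)) {<br />
# ask the guest to shut down cleanly and log the output<br />
@result=`VBoxManage controlvm $hostname acpipowerbutton`;<br />
foreach (@result) {<br />
print MYOUTFILE "$_\n";<br />
}<br />
# wait 60 seconds, then force a power-off and log the output<br />
sleep(60);<br />
@result=`VBoxManage controlvm $hostname poweroff`;<br />
foreach (@result) {<br />
print MYOUTFILE "$_\n";<br />
}<br />
# start the VM again and print the second line of the VBoxManage output<br />
@result=`VBoxManage startvm $hostname`;<br />
print "$result[1]";<br />
}<br />
else { }<br />
}<br />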
<br />
As you can see from the log below, it all works fine (i.e. no SIGSEGVs) if
I comment out the event_handler line in the hosts.cfg file.</div>
<div>[05-19-2010 01:33:50] SERVICE ALERT:
Server6;Explorer;OK;HARD;1;Explorer.EXE: Running<br />
[05-19-2010 01:32:50] SERVICE ALERT: Server6;Uptime;OK;HARD;1;System Uptime
- 0 day(s) 0 hour(s) 9 minute(s)<br />
[05-19-2010 01:32:40] SERVICE ALERT: Server6;C:\ Drive Space;OK;HARD;1;c:\ -
total: 39.90 Gb - used: 9.19 Gb (23%) - free 30.71 Gb (77%)<br />
[05-19-2010 01:32:10] SERVICE ALERT: Server6;CPU Load;OK;HARD;1;CPU Load 3%
(5 min average)<br />
[05-19-2010 01:25:00] HOST ALERT: Server6;UP;SOFT;4;PING OK - Packet loss =
0%, RTA = 0.44 ms<br />
[05-19-2010 01:23:50] SERVICE ALERT:
Server6;Explorer;CRITICAL;HARD;1;Connection refused<br />
[05-19-2010 01:23:50] HOST ALERT: Server6;DOWN;SOFT;3;PING CRITICAL - Packet
loss = 100%<br />
[05-19-2010 01:23:00] SERVICE ALERT: Server6;Uptime;CRITICAL;HARD;1;CRITICAL
- Socket timeout after 10 seconds<br />
[05-19-2010 01:22:50] SERVICE ALERT: Server6;C:\ Drive
Space;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds<br />
[05-19-2010 01:22:30] HOST ALERT: Server6;DOWN;SOFT;2;PING CRITICAL - Packet
loss = 100%<br />
[05-19-2010 01:22:20] SERVICE ALERT: Server6;CPU
Load;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds<br />
[05-19-2010 01:21:10] HOST ALERT: Server6;DOWN;SOFT;1;PING CRITICAL - Packet
loss = 100%<br />
[05-19-2010 01:21:00] SERVICE ALERT: Server6;Uptime;CRITICAL;SOFT;1;CRITICAL
- Socket timeout after 10 seconds<br />
[05-19-2010 01:20:50] SERVICE ALERT: Server6;C:\ Drive
Space;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds<br />
[05-19-2010 01:02:10] SERVICE ALERT: Server6;CPU Load;OK;SOFT;1;CPU Load 0%
(5 min average)<br />
[05-19-2010 01:00:50] SERVICE ALERT: Server6;Uptime;OK;SOFT;1;System Uptime
- 0 day(s) 0 hour(s) 57 minute(s)<br />
[05-19-2010 01:00:40] SERVICE ALERT: Server6;C:\ Drive Space;OK;SOFT;1;c:\ -
total: 39.90 Gb - used: 9.19 Gb (23%) - free 30.71 Gb (77%)<br />
</div>
</font></div>
</HTML>