Notifications stopped

Chris Wilson chris at aidworld.org
Wed Mar 2 11:58:54 CET 2005


Hi all,

I sent this yesterday but I don't think it ever reached the list. Sorry
if it gets duplicated.

Cheers, Chris.

-----Forwarded Message-----
From: Chris Wilson <chris at aidworld.org>
To: Andreas Ericsson <ae at op5.se>
Cc: Nagios Users <nagios-users at lists.sourceforge.net>
Subject: Re: [Nagios-users] Notifications stopped
Date: Tue, 01 Mar 2005 16:59:46 +0000

Hi Andreas,

First of all, thanks for taking the time to help me!

> > The file notify.out was created, but nothing written to it.
> 
> The mail command doesn't print anything to stdout normally, and if 
> nothing goes wrong it won't print to stderr either, so it's supposed to 
> be empty.

Yeah, I wasn't expecting anything to be written to it, but the fact that
the file was created does mean that Nagios is running and attempting notifications.

> grep -i 'shutting down' nagios.log
> It might not have dumped core and just shut down gracefully. If you're 
> using CVS code a month or so old, that was the default behaviour.

I should have mentioned that I'm running Nagios 1.2, sorry.

Nagios was running and detected that the servers were down (visible in
status.cgi) but did not send notifications.

I can now reproduce this at will, and have an strace log of the problem.
When Nagios is started at boot time, it seems to run but cannot send any
mail. It manages to spawn /bin/mail, which crashes with a segmentation
fault. Nagios does not appear to detect or log this.

Restarting Nagios after boot fixes it, so I'm assuming that it's an
environment issue. Can anyone help me figure out what's wrong?

The last will and testament of /bin/mail is:

15:47:58 execve("/bin/mail", ["/bin/mail", "-s", 
	"[NAGIOS] Alert: \"Loband Pilot Web Server 1\" is DOWN",
 	"chris at aidworld.org"], [/* 13 vars */]) = 0
15:47:58 uname({sys="Linux", node="dev.aidworld.org", ...}) = 0
[...]
15:47:58 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
15:47:58 connect(3, {sa_family=AF_INET, sin_port=htons(53),
	sin_addr=inet_addr("127.0.0.1")}, 28) = 0
15:47:58 fcntl64(3, F_GETFL) = 0x2 (flags O_RDWR)
15:47:58 fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
15:47:58 gettimeofday({1109692078, 396780}, NULL) = 0
15:47:58 poll([{fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
15:47:58 --- SIGSEGV (Segmentation fault) @ 0 (0) ---

Thanks in advance for any help.

Cheers, Chris.
-- 
(aidworld) chris wilson | chief engineer (chris at aidworld.org)



_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users