SEGV in 2.0b2 (FreeBSD 4.10/200 hosts/330 active/300 passive) - repeatedly after 2-7 days running.
Ethan Galstad
nagios at nagios.org
Mon Apr 4 06:03:25 CEST 2005
Thanks for the note Stanley. If you can manage to get a core file or
track the problem down further, let me know. I'm releasing 2.0b3
tonight, so this probably won't be fixed until 2.0b4.
On 2 Apr 2005 at 19:53, Stanley Hopcroft wrote:
> Dear Folks,
>
> I am writing to report what may be a problem with Nagios 2.0b2 (embedded
> Perl, pthread lib, FreeBSD 4.10).
>
> Nagios runs no more than 10 days before dying with a SEGV.
>
> Like a former report of SEGVs ('coredumps in wobbly
> networks'/Ericsson/24 Mar 2005) there _may_ be a pattern in the logged
> messages before the SEGV.
>
> Exiting from scheduled downtime appears to be a health hazard.
>
> In the last case,
>
> Sat Apr 02 17:05:42 SERVICE DOWNTIME ALERT: foo:bar via the blurfl
> provider infrastructure;STOPPED; Service has exited from a period of
> scheduled downtime
>
> Sat Apr 02 17:06:18 Auto-save of retention data completed successfully.
>
> Sat Apr 02 18:07:33 Nagios 2.0b2 starting... (PID=97771)
>
> tsitc> grep nagios /var/log/messages
> Apr 2 17:07:52 tsitc /kernel: pid 3400 (nagios), uid 1000: exited on
> signal 11
>
> And the one before,
>
> Tue Mar 29 06:20:58 SERVICE ALERT: nada;TEC CPU;WARNING;HARD;1;The
> percentage of CPU in idle state is low. This indicates high CPU
> overload. date: 03/29/2005 06:20:50 AM eventid: 1112041070 557
> modelname: DMXCpu name: total percidlecpu: 0 profilename:
> ITM.OS.Unix_Dev_Monitoring.itm#IPAustralia-region source: TMNT status:
> OPEN
>
> Tue Mar 29 06:30:44 SERVICE DOWNTIME ALERT: yada;Standard host-centric
> checks;STOPPED; Service has exited from a period of scheduled downtime
>
> Tue Mar 29 06:30:44 SERVICE DOWNTIME ALERT: wurfl;COMS ad-hoc
> check;STOPPED; Service has exited from a period of scheduled downtime
>
> Tue Mar 29 06:30:44 HOST DOWNTIME ALERT: yada;STOPPED; Host has exited
> from a period of scheduled downtime
>
> Tue Mar 29 09:11:27 Nagios 2.0b2 starting... (PID=5473)
>
> tsitc> grep nagios /var/log/messages
> Mar 29 06:30:44 tsitc /kernel: pid 31467 (nagios), uid 1000: exited on
> signal 11
>
> Obviously it is easy to check whether scheduling downtime is causal; I
> will give it a go and watch.
>
> No core file.
>
> Yours sincerely.
>
> --
> Stanley Hopcroft
>
> IP Australia
> Ph: (02) 6283 3189 Fax: (02) 6281 1353
> PO Box 200 Woden ACT 2606
> http://www.ipaustralia.gov.au
>
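
The pattern Stanley describes (a downtime exit logged shortly before each
restart) could be checked mechanically across a longer log. The following is
only a minimal sketch, not part of Nagios: the function names and the
three-entry window are hypothetical, and it assumes each log entry is on one
line and begins with a timestamp in the converted form used in the excerpts
quoted above (e.g. "Sat Apr 02 17:05:42") rather than the raw log format. It
also cannot tell a crash from a deliberate restart; that still needs the
kernel messages.

```python
import re

# Assumption: entries start with a timestamp like "Sat Apr 02 17:05:42",
# as in the quoted excerpts; adjust the pattern for other log formats.
TS = r"[A-Z][a-z]{2} [A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2}"
ENTRY = re.compile(rf"^({TS}) (.*)$")


def entries_before_restarts(lines, window=3):
    """For each 'Nagios ... starting...' entry, collect the `window`
    entries logged immediately before it."""
    history, result = [], []
    for line in lines:
        m = ENTRY.match(line.strip())
        if not m:
            continue  # blank or unrecognised line
        stamp, text = m.groups()
        if text.startswith("Nagios ") and "starting..." in text:
            result.append((stamp, history[-window:]))
        history.append((stamp, text))
    return result


def downtime_exit_precedes(lines, window=3):
    """Return (restart timestamp, flag) pairs; the flag is True when a
    downtime-exit alert is among the entries just before the restart."""
    return [
        (stamp, any("DOWNTIME ALERT" in t and ";STOPPED;" in t for _, t in prev))
        for stamp, prev in entries_before_restarts(lines, window)
    ]
```

If most restarts that coincide with a "signal 11" kernel message are flagged
True, that would support the suspicion about scheduled downtime; a mix of
True and False would point elsewhere.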
Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org