Continuing issues with retention file causing schedule/actions to be ignored.
Eli Stair
estair at ilm.com
Thu Mar 9 21:23:15 CET 2006
Here I go multitasking; this time the file is actually attached. I've also
attached a day's worth of the 'premature end of script headers' errors
from the Apache logs. Here's an example of an extinfo.cgi view that was
working one minute and then, after a refresh, errored out:
Loading this URL:
https://monitor02/nagios/cgi-bin/extinfo.cgi?type=1&host=deathstar1258
Results in this error (momentarily):
[Thu Mar 09 12:08:59 2006] [error] [client 10.73.16.108] Premature end of script headers: extinfo.cgi, referer: https://monitor02/nagios/cgi-bin/status.cgi?hostgroup=all&style=hostdetail&hoststatustypes=4&hostprops=42
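To see which CGIs fail most often across a day's worth of these log lines, they can be tallied per script name. Below is a minimal C sketch, assuming each error_log entry is on one line and contains the text "Premature end of script headers: <name>," exactly as in the example above. The function names, table sizes, and the fixed-size tally table are hypothetical illustrations, not part of Nagios or Apache.

```c
#include <stddef.h>
#include <string.h>

/* Text Apache writes when a CGI exits before sending its headers. */
static const char MARKER[] = "Premature end of script headers: ";

#define MAX_CGIS 32   /* hypothetical limit on distinct CGI names */
#define NAME_LEN 64   /* hypothetical limit on CGI name length */

/* Copies the CGI name from one error_log line into out.
 * Returns 1 on success, 0 if the marker is missing or out is too small. */
int cgi_from_log_line(const char *line, char *out, size_t outlen)
{
    const char *start = strstr(line, MARKER);
    if (start == NULL || outlen == 0)
        return 0;
    start += sizeof(MARKER) - 1;               /* skip the marker text */
    size_t len = strcspn(start, ", \t\r\n");   /* name ends at comma/space */
    if (len == 0 || len >= outlen)
        return 0;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}

struct cgi_count {
    char name[NAME_LEN];
    unsigned count;
};

/* Records one failure for name in tab, which currently holds n entries.
 * Returns the new number of distinct names (unchanged if the table is full). */
size_t tally(struct cgi_count *tab, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tab[i].name, name) == 0) {
            tab[i].count++;
            return n;
        }
    }
    if (n < MAX_CGIS) {
        strncpy(tab[n].name, name, NAME_LEN - 1);
        tab[n].name[NAME_LEN - 1] = '\0';
        tab[n].count = 1;
        return n + 1;
    }
    return n; /* table full: this hit is dropped */
}
```

A caller would read the log line by line with fgets(), pass each line to cgi_from_log_line(), and feed successful results to tally().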
I still haven't been able to find any indication of the cause (or even
the existence) of the scheduling/event stalling issues. Nothing ever
appears "incorrect" in Nagios' logs or schedule; events simply stop
occurring. One more thing I noticed before I removed the retention.dat
file yesterday: in addition to the event handler for one service not
being executed, one user's acknowledgements did not trigger
acknowledgement emails even though they should have, while mine did.
After the file was removed, that problem went away as well. In practice,
it can take several weeks to more than a month of running before I
notice the issue cropping up again; in that time I add or remove
hundreds (even thousands) of hosts/services and reload or stop/start
Nagios dozens of times.
Are there any potential fixes for this behaviour in CVS? I haven't seen
it addressed at all on -devel, although there have been a few reports of
similar issues.
(Nagios 2.0, x86_64,
7385 services.
754 hosts.
6454 service dependencies.
47 commands.
)
/eli
Eli Stair wrote:
>
> I'm continuing to have problems when the retention.dat file gets into a
> state where the Nagios process stops functioning properly. The problems
> I've had in the past were increasing numbers of hosts, or entire
> hostgroups, no longer executing their service checks; now (today) the
> event handler for one particular service has stopped being executed
> (while all others continue to work).
>
> In this and all previous cases, stopping nagios and moving the retention
> file out of the way resolves the issue. Reloading or a hard stop/start
> of nagios doesn't have any effect. There has never appeared to be
> anything "wrong" with the retention file.
>
> The only problems with my installation are this one, the
> all-too-frequent "premature end of script headers" in all the CGIs, and
> "Warning: Size of service_message struct (528 bytes) is >
> POSIX-guaranteed atomic write size (512 bytes). " due to compiling on
> x86_64. That said, there are dozens of daily "premature end of script
> headers/Internal Server Error" failures wreaking havoc with production,
> and these event failures are extremely critical. The script header
> problem appeared immediately upon upgrading from 2.0b6 to 2.0rc2+, and
> the scheduling/retention problem has been present to varying degrees in
> every 2.0b+ release I've tried.
>
> I would be happy to find that these are configuration/optimization
> issues on my end that I can resolve, but I suspect they are bugs. I will
> do anything I can to help provide a debug testbed for identifying and
> tracking them down. Attached is my main nagios.cfg (object configs are
> not included), and I can provide any other data (object configs, logs,
> retention.dat, etc.) privately if needed, for security reasons.
>
> Please let me know what I can do to help address this and find a
> resolution.
>
> Regards,
>
> /eli
>
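On the struct-size warning quoted above: POSIX guarantees that a single write() of at most PIPE_BUF bytes to a pipe is not interleaved with data from other writers, and only requires PIPE_BUF to be at least 512 bytes. The minimal C sketch below shows the comparison the warning implies. The struct is hypothetical and does NOT reproduce Nagios' real service_message layout, and the constant and function names are illustrative only. Whether an oversized message can really be interleaved depends on how Nagios writes it and on the platform's actual PIPE_BUF (4096 on Linux), so this is an illustration of the rule, not a diagnosis.

```c
#include <stddef.h>

/* POSIX minimum for PIPE_BUF (_POSIX_PIPE_BUF); real systems may allow more. */
#define POSIX_MIN_ATOMIC_WRITE 512

/* Hypothetical message: a 512-byte text buffer plus a few other fields
 * pushes the struct past the 512-byte guarantee. Not Nagios' real struct. */
struct example_message {
    int status;
    int flags;
    char output[512];
    long timestamp;
};

/* Returns 1 if a write() of n bytes to a pipe is atomic on every
 * POSIX system, 0 if it may be interleaved with other writers' data. */
int is_portably_atomic_pipe_write(size_t n)
{
    return n <= POSIX_MIN_ATOMIC_WRITE;
}
```

A portable fix would keep each message at or below 512 bytes; a platform-specific check would compare against PIPE_BUF from <limits.h> instead of the POSIX minimum.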
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: nagios.cfg
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060309/fb61431e/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: nagios.script_header_errors
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060309/fb61431e/attachment-0001.ksh>