Weird error with Nagios 2.0b4 on RHEL 4
Andreas Ericsson
ae at op5.se
Fri Oct 28 01:57:21 CEST 2005
Fred wrote:
> I may have been getting lucky with the service_message struct warning, however,
> it has not seemed to have been a problem even on a system of over 1000+ nodes
> with 6 distributed monitors.
>
> From looking at the code, the service_message struct appears to be the data
> structure that is created when a worker thread pulls a line off of the
> nagios FIFO and creates a structured work item and adds it to a queue. The
> message appears to be a warning that writing the data and accessing it between
> threads might be at risk, however, there seem to be locks around the access.
>
Inaccurate. The service_message struct is what's being written to the
pipe for later processing. If only 512 bytes are written and the struct
is larger than that, you're in for trouble.
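A rough way to check for that at build or run time is to compare the
struct size against the write limit. This is only a sketch: the struct
below is a made-up stand-in (not the real Nagios service_message
layout), and POSIX_MIN_PIPE_BUF and fits_atomic_write() are names
invented for the illustration. The 512 is the figure from the warning.

```c
#include <stddef.h>

/* Assumption: 512 is the limit discussed above (it is also the smallest
 * PIPE_BUF that POSIX allows). Name invented for this sketch. */
#define POSIX_MIN_PIPE_BUF 512

/* Hypothetical stand-in for a message struct; NOT the real Nagios layout. */
struct demo_message {
    char host_name[64];
    char description[128];
    int  return_code;
    char output[332];
};

/* Returns 1 if a write of `len` bytes stays within `limit`, else 0. */
int fits_atomic_write(size_t len, size_t limit)
{
    return len > 0 && len <= limit;
}
```

Calling fits_atomic_write(sizeof(struct demo_message), POSIX_MIN_PIPE_BUF)
returns 0 here, because the character arrays alone already add up to
524 bytes.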
> I had actually built a test image where I changed the max hostname length from
> 64 to 40 just to push the structure under the 512 but there were no apparent
> changes (note I was debugging what I believe to be a Linux FIFO problem that
> causes some fgets() calls to complete even if they don't have a \n in the
> buffer, essentially,
fgets() is supposed to return whatever it has read if there's no newline
within the limit set by the second arg.
That being said, Nagios read()s the FIFO.
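To illustrate the fgets() behaviour, here is a small sketch.
read_chunk() is a helper name made up for this example. It reports
whether the chunk fgets() handed back ended in a newline or was cut
short because the buffer filled up.

```c
#include <stdio.h>
#include <string.h>

/* Reads one fgets() chunk (at most bufsize-1 chars) from fp into buf.
 * Returns 1 if the chunk ends with '\n', 0 if it does not (buffer
 * filled up, or the input ended without a newline), -1 on EOF/error. */
int read_chunk(FILE *fp, char *buf, int bufsize)
{
    size_t n;

    if (fgets(buf, bufsize, fp) == NULL)
        return -1;

    n = strlen(buf);
    return (n > 0 && buf[n - 1] == '\n') ? 1 : 0;
}
```

With a 5-byte buffer and the input "abcdefghij\n", the first two calls
return 0 (chunks "abcd" and "efgh") and the third returns 1 ("ij\n").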
> writes that fill the entire FIFO buffer at 8k cause
> a premature completion and therefore a fifo corruption) Turns out when I
> shrunk the service_message struct I was able to reproduce the FIFO failures
> much more quickly ...
>
This is weird. Most systems use sysconf(_SC_PAGE_SIZE) as the limit for
atomic writes, since that's what's natural for the system. This would mean
4096 for Linux on i386 and shouldn't ever create FIFO inconsistencies.
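Both numbers can be queried at run time rather than assumed. The sketch
below reads the page size via sysconf() and the limit POSIX actually
specifies for atomic pipe writes, PIPE_BUF, via fpathconf() on a
throwaway pipe. The function names are invented for the example, and
the values differ between systems (POSIX only promises PIPE_BUF is at
least 512).

```c
#include <unistd.h>

/* Page size in bytes as reported by the system, or -1 on error. */
long page_size(void)
{
    return sysconf(_SC_PAGE_SIZE);
}

/* Largest write that is guaranteed atomic on a pipe, queried from a
 * temporary pipe. Returns -1 if the pipe cannot be created or the
 * limit cannot be determined. */
long pipe_atomic_limit(void)
{
    int fds[2];
    long limit;

    if (pipe(fds) != 0)
        return -1;

    limit = fpathconf(fds[0], _PC_PIPE_BUF);
    close(fds[0]);
    close(fds[1]);
    return limit;
}
```

If the two values match on a given box, the page-size rule of thumb
holds there. If pipe_atomic_limit() is smaller, that is the number that
matters for the FIFO.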
> I believe on EM64T the time and other substructures push the size over the
> edge.
>
An int on 64-bit archs is sometimes 64 bits wide. If there's no penalty in
doing 32-bit processing, it'll be 32 bits.
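The quickest way to settle it for a given platform is to measure. A
minimal sketch follows (struct and function names are made up for the
example). On Linux x86-64 I'd expect int to stay at 32 bits while long,
pointers and time_t come out at 64, which would be enough to grow a
struct holding timestamps, but check the output on the actual box.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>
#include <time.h>     /* time_t */

/* Widths, in bits, of a few types that commonly change between
 * 32-bit and 64-bit builds. */
struct type_widths {
    size_t int_bits;
    size_t long_bits;
    size_t ptr_bits;
    size_t time_t_bits;
};

/* Fills in the widths for the platform this code is compiled on. */
struct type_widths measure_type_widths(void)
{
    struct type_widths w;

    w.int_bits    = sizeof(int) * CHAR_BIT;
    w.long_bits   = sizeof(long) * CHAR_BIT;
    w.ptr_bits    = sizeof(void *) * CHAR_BIT;
    w.time_t_bits = sizeof(time_t) * CHAR_BIT;
    return w;
}
```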
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231