Locking problems?: Nagios 1.1 on Redhat Enterprise ES 2.1
Keith Weinberg
Keith.Weinberg at tudor.com
Mon Sep 29 22:09:28 CEST 2003
I've recently built the latest nagios for deployment on Redhat Enterprise ES
2.1 and our testing has shown two serious problems:
1) a simple-to-fix html rendering problem
2) a perplexing lock-contention issue
HTML rendering problem
The first was easy to fix. There seems to be a problem with the HTML
generation in "tac.cgi", where a few lines are incorrectly commented out
using (/* */), which breaks the HTML.
You can easily take out those comment lines by deleting lines 1223 and 1226
(/* and */) from tac.cgi - removing those fixed the broken HTML for us.
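For anyone who would rather script that edit than do it by hand, here is a minimal Python sketch. It deletes the named lines only if each one holds nothing but a comment delimiter, so it refuses to touch a copy of the file whose line numbers differ from ours. The function name and the path in the usage note are hypothetical; only the line numbers 1223 and 1226 come from our copy.

```python
from pathlib import Path


def drop_comment_delimiters(path, line_numbers, delimiters=("/*", "*/")):
    """Delete the given 1-indexed lines, but only if each is a bare comment delimiter.

    Returns the number of lines removed. Raises ValueError, leaving the file
    untouched, if any named line is out of range or holds anything else.
    """
    lines = Path(path).read_text().splitlines(keepends=True)

    # Check every requested line before changing anything.
    for n in line_numbers:
        if not 1 <= n <= len(lines):
            raise ValueError(f"line {n} is out of range")
        if lines[n - 1].strip() not in delimiters:
            raise ValueError(f"line {n} is not a bare comment delimiter: {lines[n - 1]!r}")

    doomed = set(line_numbers)
    kept = [text for i, text in enumerate(lines, start=1) if i not in doomed]
    Path(path).write_text("".join(kept))
    return len(lines) - len(kept)
```

Usage would look something like drop_comment_delimiters("path/to/your/tac/source", [1223, 1226]), where the path is a placeholder for wherever the file lives in your tree. Keep a backup copy first.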
On to the more serious problem. . .
Lock Contention?
After running the daemon for a while, it looks like we get a number of
processes stuck in some wait loop:
nagios 22304 1 0 16:01 ? 00:00:00 /usr/bin/nagios -d /etc/nagios/n
nagios 22311 1 0 16:01 ? 00:00:00 /usr/bin/nagios -d /etc/nagios/n
nagios 22315 1 0 16:01 ? 00:00:00 /usr/bin/nagios -d /etc/nagios/n
[etc. etc. into the hundreds over time]
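To put a number on the build-up over time, a small Python sketch like the following could count the daemon processes in "ps -ef"-style output. The function name is made up for illustration, and the eight-column layout (UID PID PPID C STIME TTY TIME CMD) is an assumption based on the listing above.

```python
def count_daemon_processes(ps_output, command="/usr/bin/nagios"):
    """Count lines of `ps -ef`-style output whose command column starts with `command`."""
    count = 0
    for line in ps_output.splitlines():
        # Assumed columns: UID PID PPID C STIME TTY TIME CMD (CMD keeps its spaces).
        fields = line.split(None, 7)
        if len(fields) == 8 and fields[7].startswith(command):
            count += 1
    return count
```

Feeding it the text captured from "ps -ef" at intervals (from cron, for example) and logging the result would show how quickly the count grows.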
Doing an strace of one of these processes, I see that each of the processes
is hanging on a write:
strace -p 22075
write(6, "<hostname-deleted-for-security-purposes>\0\0\0"..., 504
(Of course the "hostname deleted" is really one of our hostnames)
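The strace line only gives the descriptor number (6); it does not say whether that descriptor is a regular file, a pipe, or a socket. On Linux, each entry under /proc/<pid>/fd is a symbolic link to whatever the descriptor refers to, so a sketch like this could tell them apart. The function name and the proc_root parameter are illustrative only.

```python
import os


def describe_fd(pid, fd, proc_root="/proc"):
    """Return the target of a process's file descriptor link under /proc (Linux only)."""
    # Reading another user's process normally needs root or the same uid.
    return os.readlink(os.path.join(proc_root, str(pid), "fd", str(fd)))
```

For the process above that would be describe_fd(22075, 6). A plain path in the result points at a file on disk, while a result beginning with "pipe:" or "socket:" would mean the write is blocked on something other than the filesystem.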
It seems that write contention is causing terrific problems for us and will
mean that we can't roll out this version. . .
It would be very hard to blame the hardware for this. . . the disk is very
fast and we are using ext3 (both of which are 'better' than our old test
machine running 1.0 which didn't have these issues). . .
Is anyone aware of a quick fix for this? Is anyone else seeing this process
build-up?
Thanks in advance,
Keith