RFC: New IPC Method for Check Results
Ethan Galstad
nagios at nagios.org
Wed Apr 11 18:20:53 CEST 2007
Based on issues that have come up in the past regarding the IPC method
used by Nagios for handling host/service check results, I am proposing a
major change to how things are done with Nagios 3.
The current IPC method:
Active host/service check results are passed from child processes to the
main Nagios process in two pieces: check information through a pipe, and
plugin output through a temp file.
Passive check results must be fed through the external command file.
Nagios then forks a child process and passes the check results to the
main Nagios process in much the same way as for active checks.
Problems with the current method:
1. When the Nagios daemon stops, child processes may still be performing
host/service checks. The results of those checks are lost, which is not
ideal.
2. Large numbers of passive checks (from distributed/redundant setups)
can cause load/memory problems. The external command buffers and
service check result buffers can fill up, causing external agents (e.g.
NSCA) to block when they attempt to write passive check results to the
external command file.
3. When the Nagios daemon is not running, external agents like NSCA
cannot write to the external command file, which results in either
blocking behavior or lost check results.
Proposed solution:
The new method I am proposing is simple and straightforward. Why I
didn't implement something like this years ago is beyond me. :-)
Instead of passing check results from child processes to the main Nagios
process via two methods (pipe and file), I suggest that all information
be written to files in a special check result queue directory (e.g.,
var/checkresults). Child processes that perform host/service checks can
write all results to a file in the queue directory. The main Nagios
process will then periodically process all files/check results in the
queue in a time-ordered fashion.
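
A minimal sketch of the queue directory idea, in Python rather than the
C that Nagios itself uses. Everything specific here is an assumption
made for illustration: the function names (write_result, drain_queue),
the JSON file format, the ".result" file naming, and the finish_time
field are all hypothetical. Writing to a temporary name and renaming it
into place is also my addition, so that the reader never sees a
half-written file.

```python
import json
import os
import uuid


def write_result(queue_dir, result):
    """Write one check result to the queue directory (hypothetical format)."""
    os.makedirs(queue_dir, exist_ok=True)
    name = uuid.uuid4().hex
    tmp_path = os.path.join(queue_dir, "." + name + ".tmp")
    final_path = os.path.join(queue_dir, name + ".result")
    # Write under a temporary name first; "x" fails if the file exists.
    with open(tmp_path, "x") as fh:
        json.dump(result, fh)
    # Renaming within one filesystem is atomic on POSIX, so a reader sees
    # either no file or a complete one.
    os.rename(tmp_path, final_path)
    return final_path


def drain_queue(queue_dir):
    """Read and remove all queued results, returned oldest first."""
    results = []
    for name in os.listdir(queue_dir):
        if not name.endswith(".result"):
            continue  # skip temporary files that are still being written
        path = os.path.join(queue_dir, name)
        with open(path) as fh:
            results.append(json.load(fh))
        os.remove(path)
    # Time-ordered processing: sort on the (assumed) finish_time field.
    results.sort(key=lambda r: r["finish_time"])
    return results
```

A child process would call write_result() once per finished check, and
the main process would call drain_queue() on a timer and again at
startup.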
This method is ideal for handling the problems with the current IPC method:
1. When the Nagios daemon stops, child processes that are still
performing host/service checks can write the results to the queue
directory. When Nagios starts up again, it will process all those
results, so nothing is lost.
2a. Passive checks can still be submitted through the external command
file. In this case Nagios will not have to fork child processes - it
will simply write the passive check results to the queue directory.
2b. Using a queue directory will allow external agents (e.g. NSCA) to
submit passive check results by directly writing files in the queue
directory without having to submit commands through the external command
interface. This should reduce the dependence on NSCA and allow for
performance improvements in environments where there are a large number
of passive checks.
3. When Nagios is not running, external agents like NSCA can write check
results to the queue directory without worrying about blocking. Nagios
will process all check results when it starts up again.
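
To illustrate points 2a through 3, here is a rough Python sketch of
turning a passive service check submitted in the existing external
command format (PROCESS_SERVICE_CHECK_RESULT) into a file in the queue
directory. No running daemon and no fork is needed for the write. The
function names, the JSON on-disk format, and the file naming are
assumptions for illustration only, not the format Nagios will use.

```python
import json
import os
import re
import uuid

# Existing external command format for passive service checks:
# [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output
_CMD = re.compile(
    r"^\[(\d+)\] PROCESS_SERVICE_CHECK_RESULT;([^;]*);([^;]*);(\d+);(.*)$"
)


def parse_passive_command(line):
    """Parse one external command line; return None if it does not match."""
    match = _CMD.match(line.strip())
    if match is None:
        return None
    timestamp, host, service, code, output = match.groups()
    return {
        "check_type": "passive",
        "finish_time": int(timestamp),
        "host": host,
        "service": service,
        "return_code": int(code),
        "output": output,  # may itself contain semicolons
    }


def queue_passive_result(queue_dir, line):
    """Write a parsed passive result to the queue directory (assumed format)."""
    result = parse_passive_command(line)
    if result is None:
        return None
    os.makedirs(queue_dir, exist_ok=True)
    name = uuid.uuid4().hex
    tmp_path = os.path.join(queue_dir, "." + name + ".tmp")
    final_path = os.path.join(queue_dir, name + ".result")
    with open(tmp_path, "x") as fh:
        json.dump(result, fh)
    # Rename into place so a reader never sees a partial file.
    os.rename(tmp_path, final_path)
    return final_path
```

An agent such as NSCA could do the equivalent of
queue_passive_result() itself; the write succeeds whether or not the
daemon is up, and the files simply wait in the directory until Nagios
next processes the queue.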
Any performance hit that the new IPC method incurs from disk
thrashing can be minimized by placing the queue directory on a
memory-backed filesystem (e.g. a RAM disk). Whether this will actually
be necessary in anything but the largest installations remains to be
seen.
I currently have half of the code implemented and can post working code
to CVS within the next week. I'm interested to hear what folks on the
list think about the new method before I make the switch, as doing so
will involve ripping out most of the current IPC code. Once I do so, I
don't want to have to backtrack. :-)
Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org