RFC/PATCH: Handle external service check results in separate thread
Ethan Galstad
nagios at nagios.org
Fri Apr 13 13:02:41 CEST 2007
Stefan Rompf wrote:
> Hi,
>
> Like other people on this list, we've been bitten by the problem that Nagios
> fork()s subprocesses when service check results arrive via the external
> command pipe. When Nagios lags, for example due to host checks, enough
> forked processes usually pile up to push Nagios over its resource limits.
> Even if this doesn't happen, results are fed in the wrong order.
>
> I've developed the following solution that is quite different to the spool
> directory approach:
>
> - Passive service check results are added to passive_check_result_list as
>   before. However, for our use case it does not make sense to keep multiple
>   results for one service once Nagios starts lagging, so duplicate
>   detection keeps only the newest check result per service.
> - Instead of forking subprocesses, a permanently running thread feeds the
>   results on passive_check_result_list back via write_svc_message(). So two
>   threads of the same process talk to each other via a pipe, but I didn't
>   want to make my changes too invasive ;-)
> - Instead of polling the command pipe every 0.5 seconds, select() on the
>   file descriptor is now used if there are enough
>   external_command_buffer_slots. The problem here was that with no writer
>   on the pipe, select() endlessly signaled EOF. Fixed by opening the
>   command pipe R/W.
>
> The patch was developed on Nagios 2.6 and Linux, then forward-ported to
> current CVS. It seems to work, but needs further testing. Even compilation
> tests on different architectures would be interesting, as I'm not sure how
> widespread the tsearch() API is.
>
> Thoughts?
>
> Stefan
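
For readers following along, the command-pipe change described in the quoted message (select() on a FIFO that is opened R/W so it never reports EOF) could look roughly like the sketch below. This is not taken from the patch: the function names open_command_fifo() and wait_for_command() are made up for illustration, and opening a FIFO with O_RDWR is Linux behaviour that POSIX leaves undefined.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (if needed) and open a command FIFO for
 * reading AND writing. Because this process then always counts as a
 * writer, the read side never sees EOF when the last external writer
 * closes, so select() can block instead of returning immediately. */
int open_command_fifo(const char *path)
{
    assert(path != NULL);
    if (mkfifo(path, 0600) != 0 && errno != EEXIST)
        return -1;
    /* O_NONBLOCK so a later read() never stalls the caller. */
    return open(path, O_RDWR | O_NONBLOCK);
}

/* Hypothetical helper: wait up to timeout_ms for data on fd.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_for_command(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int rc;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    return rc > 0 ? 1 : rc;
}
```

With the FIFO opened read-only instead, select() would report the descriptor as readable whenever no writer is attached, which is the endless EOF signalling mentioned above.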
Sounds interesting. I'm still leaning towards the spool directory idea,
as it provides some resistance to problems when Nagios isn't running
and/or the external command file pipe fills up.

One thing to watch out for is the idea of discarding old/duplicate check
results. This isn't always a good thing. Consider security alerts that
come in as passive checks. If you discard all but the newest alert you
could potentially miss some critical information...
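
To make the concern concrete, here is a minimal sketch of "keep only the newest result per service" built on the tsearch() API mentioned in the quoted message. It is an illustration under assumptions, not the patch's code: struct check_result, submit_result() and find_result() are hypothetical names, and the real passive check result structure in Nagios has different fields.

```c
#include <assert.h>
#include <search.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical, simplified stand-in for a passive check result. */
struct check_result {
    char host[64];
    char service[64];
    int return_code;      /* 0 = OK, 1 = WARNING, 2 = CRITICAL */
    time_t check_time;
};

/* Order results by (host, service) so duplicates compare equal. */
static int cmp_result(const void *a, const void *b)
{
    const struct check_result *x = a, *y = b;
    int c = strcmp(x->host, y->host);
    return c ? c : strcmp(x->service, y->service);
}

/* Returns 1 if this is the first result for the service, 0 if it was
 * merged with an existing entry, -1 on allocation failure. */
int submit_result(void **root, const struct check_result *in)
{
    struct check_result *copy, *stored;
    void *node;

    assert(root != NULL && in != NULL);
    copy = malloc(sizeof *copy);
    if (copy == NULL)
        return -1;
    *copy = *in;

    /* tsearch() inserts copy if the key is new, else returns the old node. */
    node = tsearch(copy, root, cmp_result);
    if (node == NULL) {
        free(copy);
        return -1;
    }
    stored = *(struct check_result **)node;
    if (stored == copy)
        return 1;

    /* Duplicate: keep the newer result. The older one is lost here. */
    if (in->check_time >= stored->check_time)
        *stored = *in;
    free(copy);
    return 0;
}

/* Look up the stored result for a service, or NULL if there is none. */
const struct check_result *find_result(void *root, const char *host,
                                       const char *service)
{
    struct check_result key;
    void *node;

    memset(&key, 0, sizeof key);
    strncpy(key.host, host, sizeof key.host - 1);
    strncpy(key.service, service, sizeof key.service - 1);
    node = tfind(&key, &root, cmp_result);
    return node ? *(struct check_result **)node : NULL;
}
```

If a CRITICAL alert for a service is submitted and an OK result for the same service follows before the list is drained, only the OK result remains in the tree. That is exactly the case where a security alert would go unnoticed, so any such deduplication would probably need to be optional.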
Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org