A question...cascading failures and failure to recover

Patrick Morris patrick.morris at hp.com
Sat Feb 17 00:18:03 CET 2007


On Fri, 16 Feb 2007, Steven Schwartz wrote:

> I've noticed an odd circumstance on two of my four nagios servers
> lately, and searching has found me no answers. Has anyone experienced
> symptoms similar to these:
> 
> 1) On a given server, a plugin produces a "critical failure" on many
> (sometimes all) of the systems using that particular plugin.
> 
> 2) Tests by hand of said plugin produce an "OK" result.
> 
> 3) The system does not acknowledge the service having recovered until
> checks are rescheduled by force, and then execute OK.
> 
> Does this ring bells with anyone?

Many things could cause something like this: a bad plugin, issues with
embedded Perl, network problems, or incorrect file permissions or
environment.

There's not nearly enough info here to know where to start looking,
though.
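
One way to narrow down the environment and permissions possibilities is
to run the plugin by hand under conditions closer to what the daemon
sees, rather than from a full login shell. The sketch below is only an
illustration, not anything Nagios ships: the plugin path, its arguments,
and the "minimal" PATH are assumptions you would replace with your own.
The only Nagios-specific fact it relies on is the standard plugin exit
code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import os
import subprocess
import sys

# Standard Nagios plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(argv, env=None, timeout=30):
    """Run a plugin command; return (state, first line of its output)."""
    try:
        proc = subprocess.run(
            argv, env=env, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Missing file, no execute permission, or a hung plugin.
        return "UNKNOWN", f"could not run plugin: {exc}"
    state = STATES.get(proc.returncode, "UNKNOWN")
    lines = (proc.stdout or proc.stderr).strip().splitlines()
    return state, lines[0] if lines else ""


def compare_environments(argv):
    """Run the same command with the current and a stripped environment."""
    # Assumption: a bare PATH roughly approximates a daemon's environment.
    minimal_env = {"PATH": "/usr/bin:/bin"}
    return {
        "current environment": run_plugin(argv, env=dict(os.environ)),
        "minimal environment": run_plugin(argv, env=minimal_env),
    }


if __name__ == "__main__":
    # Hypothetical plugin path and arguments; pass your own on the
    # command line to override them.
    argv = sys.argv[1:] or [
        "/usr/local/nagios/libexec/check_ping",
        "-H", "127.0.0.1", "-w", "100,20%", "-c", "500,60%",
    ]
    for label, (state, text) in compare_environments(argv).items():
        print(f"{label}: {state} - {text}")
```

If the two runs disagree, the environment is a likely suspect. Running
the same script under the account the Nagios daemon uses (often named
"nagios", but check your own setup) would additionally exercise file
permissions.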
