Children "unreachable" on soft down?
Israel Brewster
israel at frontierflying.com
Wed Apr 8 20:32:04 CEST 2009
On Apr 8, 2009, at 9:28 AM, Marc Powell wrote:
>
> On Apr 8, 2009, at 11:44 AM, Israel Brewster wrote:
>
>> So is this just something I'll have to live with? I don't seem to be
>> getting much feedback on the subject. :(
>
> Well, my response would be to fix the problem that's causing the
> outages in the first place or adjust the way you're monitoring the
> parents so that the plugin used recognizes when this temporary event
> is occurring.
Ok, fair enough. There is nothing we can do about the outages (as I
explained in one of my e-mails, they are an artifact of the connection
type), so that leaves us with adjusting the monitoring. Now I thought
that the recheck options were there exactly for this reason: to catch
brief outages and not alert. And for the parent host that seems to be
the case, but apparently that logic doesn't carry on to the child
hosts. As such, things would somehow need to be adjusted so that Nagios
never even sees the outages, not even enough to go into a soft down state.
Anyone have any suggestions for how I can accomplish this? Adjusting
the timeout or using, say, an ssh check rather than icmp won't do it -
the packets are still lost, and the ssh check would still time out.
Perhaps I could send more pings at longer intervals (so that if the
single check doesn't get a response, it retries at 15-second intervals
or so before returning a result), but then the check would start
taking several seconds or more to complete, and that wouldn't be a
good thing. That assumes Nagios even allows a check to run that long -
doesn't it have a mechanism to kill a check that doesn't return in a
given time frame? I'm a little stumped here as to how I can adjust things.
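For illustration only, the "retry inside a single check" idea could be sketched as a small wrapper plugin. This is a rough sketch under stated assumptions, not a tested plugin: the names check_with_retries, ping_once and main are made up for this example, the attempt count and delay are placeholder values, and the ping flags assume a Linux-style ping binary. Nagios does enforce check timeouts (the host_check_timeout and service_check_timeout options in the main configuration file), so attempts times delay would have to fit inside whichever limit applies, or that limit would need raising.

```python
import subprocess
import time

OK, CRITICAL, UNKNOWN = 0, 2, 3  # standard Nagios plugin exit codes


def check_with_retries(probe, attempts=3, delay=5.0, sleep=time.sleep):
    """Return OK as soon as one probe succeeds; CRITICAL only if all fail.

    probe    -- callable returning True when the host answered
    attempts -- how many probes to try (placeholder value)
    delay    -- seconds to wait between failed probes (placeholder value)
    sleep    -- injectable so the logic can be exercised without waiting
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return OK, f"OK - host answered on attempt {attempt}"
        if attempt < attempts:
            sleep(delay)
    return CRITICAL, f"CRITICAL - no answer after {attempts} attempts"


def ping_once(host):
    """Send one ICMP echo request; assumes a Linux-style `ping` binary."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def main(argv):
    """Plugin entry point: print one status line and return the exit code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_ping_retry.py <host>")
        return UNKNOWN
    code, message = check_with_retries(lambda: ping_once(argv[1]))
    print(message)
    return code
```

A real plugin file would end by calling sys.exit(main(sys.argv)). The trade-off is the one raised above: a brief outage never reaches Nagios as a failed check, but every check during an outage takes up to attempts times delay seconds to return.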
> What you're asking for is that nagios track that the
> child went from down->unreachable->down without an intermediate OK
> state and suppress notifications in that case. That would appear to be
> a code change and would be better discussed on nagios-devel but I
> would encourage the check plugin approach first.
Ok. I know there is code in there that knows who it sent down messages
to and doesn't send up messages to people who didn't get a down
(primarily dealing with escalations), so I was hoping that maybe there
would be something similar for this, i.e. seeing that the last
notification sent was a down notification, and as such there is no
need to send another. But if not, so be it. Thanks for the response!
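To make the behaviour being asked for concrete, here is a toy model of the rule "don't repeat a problem notification that was already sent since the last recovery". It is not Nagios code and does not describe how Nagios decides on notifications; the class HostNotificationFilter and its method names are hypothetical and exist only for this sketch.

```python
class HostNotificationFilter:
    """Toy model: at most one notification per problem state between UPs."""

    def __init__(self, notify_on=("DOWN", "UNREACHABLE", "UP")):
        # States this (hypothetical) contact wants to hear about.
        self.notify_on = set(notify_on)
        # Problem states already announced since the last UP.
        self.announced = set()

    def should_notify(self, hard_state):
        """Return True if a notification would go out for this hard state."""
        if hard_state == "UP":
            # Recovery: only worth sending if a problem was announced before.
            had_problem = bool(self.announced)
            self.announced.clear()
            return had_problem and "UP" in self.notify_on
        if hard_state not in self.notify_on or hard_state in self.announced:
            # Either unwanted, or already announced with no UP in between,
            # so a DOWN -> UNREACHABLE -> DOWN round trip stays quiet.
            return False
        self.announced.add(hard_state)
        return True
```

With a contact that only wants DOWN and UP, feeding the model the sequence DOWN, UNREACHABLE, DOWN, UP would notify on the first DOWN and on the UP, and stay silent for the two states in between.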
-----------------------------------------------
Israel Brewster
Computer Support Technician II
Frontier Flying Service Inc.
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x293
-----------------------------------------------
>
> --
> Marc
>
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by:
> High Quality Requirements in a Collaborative Environment.
> Download a free trial of Rational Requirements Composer Now!
> http://p.sf.net/sfu/www-ibm-com
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null