Dependent service checks don't fail when depended-on service check fails

Jarrod Moore masternayru at gmail.com
Tue Mar 31 05:33:34 CEST 2009


On Fri, Mar 27, 2009 at 5:43 PM, Matthias Flacke <Matthias.Flacke at gmx.de> wrote:
>
> Jarrod Moore wrote:
>> On Thu, Mar 26, 2009 at 7:57 PM, Andreas Ericsson <ae at op5.se> wrote:
>>> Jarrod Moore wrote:
>>>> Hello everyone,
>>>>
>>>> I have a couple of related questions regarding service dependencies in
>>>> Nagios and their limitations. I have two service checks (let's call
>>>> them A and B) and service A depends on service B to function
>>>> correctly. I want to set Nagios up so that if service B crashes then
>>>> both services A and B are put into the critical state in Nagios. I've
>>>> tried using service dependencies in Nagios to represent this behaviour
>>>> but have yet to be successful. I can only get it to suppress
>>>> notifications of service A if both services go down.
>>>>
>>> This is expected behaviour. If A is truly dependent on B, then A will
>>> turn into a non-ok state of its own volition rather than as a result
>>> of any dependency magic. Dependencies are designed as a means of
>>> suppressing notifications. Otherwise, you would *always* get a
>>> notification for B first, and a minute or so later from A (actually,
>>> without the dependency you could get from A first).
>>>
>>>> Is there a way to do what I'm trying to do here? I'd have thought it
>>>> would be logical that if a service depends on another service and the
>>>> service depended on dies then all services depending on it would fail
>>>> their checks as well, but there's probably some scenario where it
>>>> doesn't work so well. I've had a look through the mailing list
>>>> archives and found someone had asked a similar question to the
>>>> nagios-devel list about 2.5 years ago and didn't end up getting an
>>>> answer, so I thought I might ask whether solutions to this type of
>>>> problem had been developed since then.
>>>>
>>> They haven't. You're using dependencies the wrong way, really. If
>>> A is truly dependent on B and doesn't go into a non-ok state after
>>> B has crashed, then your check isn't doing what it's supposed to do,
>>> or you've misunderstood the relationship somehow.
>>>
>>> If you were to explain what the two services actually are, it would
>>> be easier to point you to a solution that works.
>>>
>>> --
>>> Andreas Ericsson                   andreas.ericsson at op5.se
>>> OP5 AB                             www.op5.se
>>> Tel: +46 8-230225                  Fax: +46 8-230231
>>>
>>> Considering the successes of the wars on alcohol, poverty, drugs and
>>> terror, I think we should give some serious thought to declaring war
>>> on peace.
>>>
>>
>> Well basically I have a map (similar to Google Maps) embedded in a
>> website, which hits a URL to retrieve maps. So I have one check using
>> check_http to check that the website itself is up and another check on
>> that URL to make sure that the map service is available. Now if the
>> map service goes down, the website is still up but the maps won't
>> appear, which means the website's functionality is significantly
>> affected. However, it is still up and viewable so doing a check on the
>> website URL still passes.
>>
>> Now of course I could just write a script or something to check both
>> URLs and set that as the check command. There is a problem for me with
>> this approach, however, because I have some other instances where a
>> web service depends on other web services. When I want to use these
>> services in websites, I'd then have to write a separate check script for
>> each website, each one testing every service in the chain that is needed
>> to display the website correctly. This way of doing things just seems a bit
>> repetitive to me, especially when I have a check for these web
>> services already.
>
> You can give check_multi a try (http://my-plugin.de/check_multi).
>
> It allows you to combine multiple checks at the plugin level and has
> built-in state logic to evaluate the results of these checks.
> You can reuse the command files by using macros in them.
>
> If I understood your setup correctly the whole result should return
> CRITICAL if either the main website or the map are not accessible.
> This is the standard behaviour of check_multi and could be
> implemented like this:
>
> # foo.cmd
> # call:
> #   check_multi -f <foo.cmd> -s URLWEB=<url of website> -s URLMAP=<url of map>
> command [ website ] = check_http ... -u $URLWEB$ ...
> command [ map     ] = check_http ... -u $URLMAP$ ...
>
> These two statements alone should already work the way you expect
> from plain check_http, just combined: if one of the child checks
> fails, the whole construct returns WARNING or CRITICAL.
>
> If you need more sophisticated return code (RC) determination, you
> can define it in Perl syntax like this:
> state [ WARNING ] = website != OK || $website$=~/some evil output/
> state [ CRITICAL ] = website >= WARNING && map != OK
>
> Cheers,
> -Matthias
>
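The combined check discussed above (the single script testing both URLs that Jarrod considers, and check_multi's default behaviour that Matthias describes) comes down to running every child check and reporting the most severe result. The following is a minimal Python sketch of that state logic, plus one possible reading of Matthias's example `state [ ... ]` rules. It is a model for illustration, not check_multi's implementation: the function names, the severity ranking and the sample outputs are assumptions, and child results are passed in as (return code, output) pairs instead of being fetched, so it runs without network access.

```python
import re

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Severity ranking assumed by this sketch, least to most severe.
SEVERITY = [OK, WARNING, UNKNOWN, CRITICAL]

# Each child check result is a (return code, plugin output) pair.
Results = dict[str, tuple[int, str]]


def worst_state(results: Results) -> int:
    """Default rule: the combined result is the most severe child state."""
    return max((rc for rc, _ in results.values()), key=SEVERITY.index, default=OK)


def custom_state(results: Results) -> int:
    """Rough Python reading of the example state [ ... ] rules.

    Assumption: rules are tried from most to least severe, the first match
    wins, and OK is returned when no rule matches.
    """
    website, website_output = results["website"]
    map_rc, _ = results["map"]
    if website >= WARNING and map_rc != OK:
        return CRITICAL
    if website != OK or re.search(r"some evil output", website_output):
        return WARNING
    return OK


# Hypothetical child results: website reachable, map service refusing connections.
checks: Results = {
    "website": (OK, "HTTP OK: HTTP/1.1 200 OK"),
    "map": (CRITICAL, "Connection refused"),
}
print(NAMES[worst_state(checks)])   # -> CRITICAL
print(NAMES[custom_state(checks)])  # -> OK
```

With the default worst-state rule, a failing map alone makes the combined check CRITICAL, which is what the original question asks for; under this reading, the example custom rules only escalate when the website check fails as well.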

Hi Matthias,

Thanks for the link. I've been checking (no pun intended) out
check_multi over the last day or two and I like it. My main concern
with this, though, is that if I had 10 websites that were dependent on
the map service, then I'd be running the same check 10 times per check
interval, and it just seems kinda wasteful when you already have one
check set up to do that job. Of course, in my current
situation it isn't a huge issue and may not be in the future but I
just wanted to know what the most efficient solution for my problem
was. In any case, I like the plugin so I'll use it if there aren't any
better options available.

Thanks,
Jarrod
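To make Andreas's point earlier in the thread concrete: a Nagios service dependency only decides whether the dependent service's notification goes out; it never changes the dependent service's state. Below is a simplified Python model of that rule, not Nagios source code. It assumes the dependency's notification failure criteria are given as one-letter state codes (`w`, `u`, `c`, as in a `define servicedependency` block), and it ignores execution dependencies, the pending/none options and recovery notifications. The `Service` class and `should_notify` helper are hypothetical.

```python
from dataclasses import dataclass

# One-letter state codes, as used in the failure criteria options of a
# Nagios servicedependency definition.
STATE_CODES = {"OK": "o", "WARNING": "w", "UNKNOWN": "u", "CRITICAL": "c"}


@dataclass
class Service:
    name: str
    state: str  # "OK", "WARNING", "UNKNOWN" or "CRITICAL"


def should_notify(dependent: Service, master: Service, failure_criteria: str) -> bool:
    """Simplified model of a notification dependency (hypothetical helper).

    The dependent service's state is never modified: the dependency only
    decides whether its problem notification is sent, and suppresses it
    while the master service is in one of the listed failure states.
    """
    if dependent.state == "OK":
        return False  # no problem to notify about (recoveries ignored here)
    criteria = {code.strip() for code in failure_criteria.split(",")}
    return STATE_CODES[master.state] not in criteria


map_service = Service("B", "CRITICAL")
website = Service("A", "OK")

# B is down, but A's own check still passes: A stays OK and sends nothing.
print(website.state, should_notify(website, map_service, "w,u,c"))  # -> OK False

# A now fails on its own; its notification is suppressed because B is CRITICAL.
website.state = "CRITICAL"
print(website.state, should_notify(website, map_service, "w,u,c"))  # -> CRITICAL False

# In this model, once B recovers, A's ongoing problem is notified normally.
map_service.state = "OK"
print(website.state, should_notify(website, map_service, "w,u,c"))  # -> CRITICAL True
```

The first case is exactly the behaviour Jarrod ran into: because A's own check keeps passing, no dependency setting will turn it CRITICAL.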

------------------------------------------------------------------------------
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users