Monitoring cross-server services?
Matthieu Parisot
mat at avedya.com
Wed Jan 22 13:00:29 CET 2003
Hi,
Maybe you could have a look at a plugin I've written.
It allows you to create what I call metaservices, whose status
depends on the states of several other services.
By disabling notifications for the individual services and enabling
them only for the metaservices, you should be able to do what you want.
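The attached check_meta plugin is not reproduced in this archive, so its exact logic is unknown. As a rough illustration of the metaservice idea only, here is a minimal sketch of how a meta-check might fold several member states into one Nagios result. The standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are real; the severity ranking, the `meta_state` function and the `min_failures` knob are hypothetical choices for this example.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Assumed ranking used to pick the "worst" member state (hypothetical).
SEVERITY = {OK: 0, WARNING: 1, UNKNOWN: 2, CRITICAL: 3}


def meta_state(member_states, min_failures=1):
    """Combine member service states into one metaservice state.

    member_states: mapping of "host/service" -> plugin exit code.
    min_failures: hypothetical knob; how many non-OK members are needed
    before the metaservice itself reports a non-OK state.
    """
    if not member_states:
        return UNKNOWN, "no member services"
    failing = {name: s for name, s in member_states.items() if s != OK}
    if len(failing) < min_failures:
        return OK, f"{len(failing)} of {len(member_states)} members non-OK"
    # Report the most severe state among the failing members.
    worst = max(failing.values(), key=SEVERITY.__getitem__)
    detail = ", ".join(f"{n}={NAMES[s]}" for n, s in sorted(failing.items()))
    return worst, detail


if __name__ == "__main__":
    code, text = meta_state({"A/p2p": OK, "B/p2p": CRITICAL, "C/p2p": OK})
    # A real plugin would print one status line and then sys.exit(code).
    print(f"META {NAMES[code]} - {text}")
```

With notifications turned off on the member services, only the metaservice built on something like this would page anyone.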
Best regards,
Matthieu
Steven Grimm wrote:
>Ran a quick test with service dependencies as you suggested, and I'm now
>convinced they aren't sufficient for monitoring a peer-to-peer application.
>And really there's no way they can be, because they don't take the details
>of error reports into account.
>
>Here's what happens with service dependencies, again using hosts A, B, and
>C which are all connected to each other. I kill the app on host B as you
>describe, and Nagios notifies me that it's died. Host A reports an error
>condition because it isn't connected to host B, and thanks to the service
>dependency, that error doesn't cause a second notification. So far so good.
>
>Now I tweak host A so the app there can't reach its counterpart on host C,
>a condition which *should* trigger a notification. (The monitoring host
>can still reach the app on host C.) Host A reports that it can't reach
>hosts B or C. And notification of that error gets suppressed by the
>dependency on host B's service, which is still down. Nagios knows that
>host A's service is broken and sees that a depended-on service isn't
>running, so it suppresses the notification without regard to
>*why* the failure is happening.
>
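The suppression behaviour Steve describes can be captured in a deliberately simplified model. This is not Nagios's actual code; the `Service` class and `should_notify` function are hypothetical, and treating any non-OK master state as "failing" is an assumption. It shows why host A's new problem with host C goes unreported: the rule looks only at the master service's state, never at the reason host A is failing.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    state: str = "OK"  # "OK", "WARNING", "CRITICAL" or "UNKNOWN"
    depends_on: list = field(default_factory=list)  # names of master services


def should_notify(svc, services):
    """Simplified model of dependency-based notification suppression.

    A failing service notifies only if none of its master services is
    failing. The cause of the dependent service's failure is never consulted.
    """
    if svc.state == "OK":
        return False
    return all(services[m].state == "OK" for m in svc.depends_on)


services = {s.name: s for s in [
    Service("B/p2p", "CRITICAL"),             # app killed on host B
    Service("C/p2p", "OK"),                   # C still reachable from the monitor
    Service("A/p2p", "CRITICAL", ["B/p2p"]),  # A can't reach B *or* C
]}
```

Here `should_notify(services["B/p2p"], services)` is true, while `should_notify(services["A/p2p"], services)` is false even though A's failure now also involves C.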
>Hope that example makes more sense than my previous ones.
>
>What I need here is a "service" that's really a comparison between Nagios'
>view of the current state of the world (whether the P2P app looks alive
>on all peers from the point of view of the monitoring host) and a plugin's
>view of the state of the world (whether the app looks alive on all peers
>from the point of view of the particular host being checked). If the
>plugin and Nagios agree about what's up and what's down, it's not a
>failure, but any discrepancy between those two views of the world *does*
>indicate a problem.
>
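The comparison Steve proposes can be sketched as a check that returns CRITICAL whenever the two views disagree. This is a minimal sketch under stated assumptions: both views are given as hypothetical `peer -> is_up` mappings, with the Nagios view presumably extracted from the status file (that parsing step is not shown), and `compare_views` is an illustrative name, not an existing plugin.

```python
# Standard Nagios plugin exit codes used by this sketch.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def _word(up):
    return "up" if up else "down"


def compare_views(nagios_view, local_view):
    """Compare two views of peer liveness and return (exit_code, message).

    nagios_view: peer -> bool, as seen from the monitoring host (assumed input).
    local_view:  peer -> bool, as seen from the host being checked (assumed input).
    """
    peers = set(nagios_view) | set(local_view)
    missing = sorted(p for p in peers if p not in nagios_view or p not in local_view)
    if missing:
        return UNKNOWN, "no data for: " + ", ".join(missing)
    # Agreement is fine even when peers are down; only discrepancies matter.
    mismatched = sorted(p for p in peers if nagios_view[p] != local_view[p])
    if not mismatched:
        return OK, f"views agree on {len(peers)} peers"
    detail = ", ".join(
        f"{p} (nagios={_word(nagios_view[p])}, local={_word(local_view[p])})"
        for p in mismatched
    )
    return CRITICAL, "views disagree: " + detail
```

In the scenario above, host A and Nagios agree that B is down, so the check stays OK after B is killed. Once A also loses C while the monitor still sees C, the result becomes CRITICAL with "views disagree: C (nagios=up, local=down)".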
>Even setting aside my particular setup, that ability would be of value on
>large networks with complex routing, anywhere it's possible for hosts to
>lose connectivity to each other while remaining reachable from the
>monitoring host.
>
>Like I said in my original message, I can work around this by parsing
>the status file myself, not a big problem. Once it's in a presentable
>state I'll post my workaround, which this discussion has convinced me
>to make a bit more general-purpose than I'd originally planned.
>
>-Steve
>
>On Tue, Jan 21, 2003 at 11:18:33AM -0600, Carroll, Jim P [Contractor] wrote:
>
>>I've got quite a number of service dependencies defined, and they add
>>absolutely nothing to the service detail page. Basically I set up a
>>rudimentary check for NRPE ('echo "NRPE is OK"' in nrpe.cfg), and made all
>>the other NRPE checks for that host dependent on the rudimentary check. If
>>NRPE is down, I want *one* page, not however many NRPE checks I'm doing.
>>Ordinarily I wouldn't expect NRPE to be down (since it's kicked off from
>>(x)inetd), but if an admin rebuilds a host or makes some other unfortunate
>>change to (x)inetd, we don't want to be flooded with notifications; a simple
>>"um... excuse me? NRPE doesn't seem to be up" is just fine.
>>
>>My case differs from your case in that my dependencies occur on the same
>>host, and yours occur on different hosts. Having said that, if I
>>acknowledge that NRPE is down (the depended-on service), that doesn't
>>automatically flag any other services, or the host itself for that matter,
>>as being down/acknowledged/ignored.
>>
>>If you're still uncertain, your best bet is to create a trivial case on 2 or
>>3 of your lesser hosts. Use netcat to listen on some arbitrary ports, and
>>have Nagios poke at those 'services'. Then kill netcat on the 'depended-on'
>>host. Wait for Nagios to notify you. Acknowledge it. Kill netcat on one
>>of the 'dependent' hosts. Wait for Nagios to notify you. And wait and
>>wait, because you shouldn't hear a peep. Bring netcat back up on the first
>>host. Eventually you should get a notification that the 'service' on the
>>second host is down.
>>
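For anyone without netcat handy, the same throwaway test can be approximated in Python. This swaps in a small Python listener in place of netcat, plus a probe that does roughly what a TCP connect check such as check_tcp does (the real plugin has many more options). `DummyService`, `kill` and `probe` are hypothetical names for this sketch, and the default address is an assumption.

```python
import socket
import threading

OK, CRITICAL = 0, 2  # Nagios plugin exit codes


class DummyService:
    """Stand-in for a netcat listener: accepts and immediately closes TCP connections."""

    def __init__(self, host="127.0.0.1", port=0):  # port 0 = pick a free port
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self._sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self._sock.bind((host, port))
        self._sock.listen(5)
        self._sock.settimeout(0.2)  # wake up periodically to notice kill()
        self.address = self._sock.getsockname()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        while not self._stop.is_set():
            try:
                conn, _ = self._sock.accept()
            except socket.timeout:
                continue
            conn.close()
        self._sock.close()  # port is no longer listening

    def kill(self):
        """Equivalent of killing netcat: stop accepting and release the port."""
        self._stop.set()
        self._thread.join()


def probe(host, port, timeout=2.0):
    """Minimal TCP connect check returning (exit_code, message)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} accepted a connection"
    except OSError as exc:
        return CRITICAL, f"TCP CRITICAL - {host}:{port}: {exc}"
```

Run one `DummyService` per "service", point the checks at their addresses, and call `kill()` in the same order as the netcat steps above.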
>>In this scenario, any other services should be completely independent;
>>Nagios should still notify you if one of those goes down. (Feel free to
>>test this in whatever permutations/combinations with this scenario, as
>>well.)
>>
>>HTH.
>>
>>jc
>>
>_______________________________________________
>Nagios-users mailing list
>Nagios-users at lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/nagios-users
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: check_meta-0.1.tgz
Type: application/x-compressed
Size: 3712 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/users/attachments/20030122/15413a84/attachment.bin>