multiple nagios monitoring that have to agree?
John P. Rouillard
rouilj at cs.umb.edu
Thu Mar 30 05:16:36 CEST 2006
In message <A7B0A9F02975A74A845FE85D0B95B8FA03468E18 at misex01.ena.com>,
"Marc Powell" writes:
>> From: On Behalf Of John P. Rouillard
>> Sent: Wednesday, March 29, 2006 4:45 PM
>> In message <A7B0A9F02975A74A845FE85D0B95B8FA03468E0E at misex01.ena.com>,
>> "Marc Powell" writes:
>> >> -----Original Message-----
>> >> From: On Behalf Of Philip Hallstrom
>> >> Sent: Wednesday, March 29, 2006 3:54 PM
>> >> I'm wondering if two nagios instances can be set up to monitor the
>> >> same hosts/services and have to agree with each other before
>> >> sending a notification?
>
>[chop]
>
>>
>> >For an off-the-cuff suggestion, if you used multiple retries and didn't
>> >specifically require that both servers see the state as HARD you could
>> >embed that logic in your notification script.
>> >
>> >- NagiosA always sends notifications.
>>
>> If you have a redundant setup, only one server A or B would have to
>> send notifications for the service B.
>>
>I presume that you're referring to this from your previous e-mail --
>"On both nagios 1 and 2 create service B that does notify (and poll)
>that uses check_cluster to require that both be in error condition to
>generate an error notification."
Correct.
>How would you prevent duplicate notifications? Nagios 1 wouldn't know
>that Nagios 2 had already sent a notification and vice-versa unless you
>kept track of that externally.
The site where it was set up originally had the second server as a
backup notifier. If it lost connectivity to the primary server it
switched on notifications.
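As a rough illustration of that failover switch (a sketch only, not the
script that site used): Nagios accepts ENABLE_NOTIFICATIONS and
DISABLE_NOTIFICATIONS as external commands, so the backup server only has
to pick one based on whether it can still reach the primary. The function
name and the primary_reachable flag are hypothetical; how the heartbeat is
measured and where the command file lives are left to the site.

```python
def notification_command(primary_reachable, now):
    """Build the external command line the backup Nagios should submit.

    primary_reachable is assumed to come from some site-specific
    heartbeat check; now is a Unix timestamp.
    """
    if primary_reachable:
        action = "DISABLE_NOTIFICATIONS"  # primary is alive, stay quiet
    else:
        action = "ENABLE_NOTIFICATIONS"   # primary lost, take over paging
    # External commands are written as "[timestamp] COMMAND" lines.
    return "[%d] %s\n" % (int(now), action)
```

The returned line would be appended to the backup server's external
command file.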
Later a separate SEC process on the second server monitored the
primary's notifications and would release notifications queued up by
the second nagios process (keyed by host, service, severity) if the
notifications from the first and second didn't come through within 5
minutes of each other. It worked and made sure that alerts weren't
delayed more than 5 minutes, but frankly the original setup with the
second server not notifying unless it lost heartbeat on the original
server (or the original server detected it couldn't get pages out) had
a lot fewer issues. Then again I didn't have to work there.
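The queue-and-release behaviour described above can be sketched roughly
as follows. This is not the SEC ruleset from that site, just a minimal
Python model of the idea; the class and method names are made up for
illustration, and the 300 second window is the 5 minutes mentioned above.

```python
class NotificationQueue:
    """Hold the secondary's notifications until the primary is proven silent."""

    def __init__(self, window=300):
        self.window = window      # seconds to wait for the primary
        self.pending = {}         # (host, service, severity) -> time queued
        self.primary_seen = {}    # (host, service, severity) -> time seen

    def seen_primary(self, host, service, severity, now):
        # The primary got its notification out: drop any queued duplicate.
        key = (host, service, severity)
        self.primary_seen[key] = now
        self.pending.pop(key, None)

    def queue_secondary(self, host, service, severity, now):
        key = (host, service, severity)
        seen = self.primary_seen.get(key)
        if seen is not None and now - seen <= self.window:
            return                # primary already covered this one
        self.pending.setdefault(key, now)

    def release_due(self, now):
        # Anything the primary has not matched within the window goes out.
        due = [k for k, t in self.pending.items() if now - t >= self.window]
        for k in due:
            del self.pending[k]
        return due
```

Whatever release_due() returns is what the second server would actually
send; everything else is treated as already handled by the primary.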
>> >- ServiceX on HostY reaches hard state.
>> >- NagiosA initiates notification for ServiceX on HostY
>> >- Notification script searches status.log on NagiosB or performs HTTP
>> >screen scrape on NagiosB to determine state of ServiceX on HostY as
>> >seen from there.
>> >- If NagiosB shows CRITICAL, send notification
>> >- If only one shows critical do nothing(?)
>> >- repeat at regular intervals in case NagiosB was slow to pick up the
>> >state (or use the vice-versa logic to also send notifications from
>> >NagiosB)
>>
>> Neat idea, however you would need to handle the case where nagios B
>> isn't properly updating the service (and therefore isn't providing
>> valid data).
>
>Looking at Last Update should cover that scenario.
True.
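Putting Marc's steps and the Last Update check together, the decision
inside the notification script might look something like this. It is a
sketch under assumptions: how the peer's state and last update time are
fetched (status.log, screen scrape) is left out, the function name and
the 300 second max_age are hypothetical, and what to do when the peer's
data is stale is left to the caller since the thread doesn't settle it.

```python
import time

CRITICAL = 2  # standard Nagios plugin return code for CRITICAL


def notification_decision(local_state, peer_state, peer_last_update,
                          now=None, max_age=300):
    """Return 'notify', 'suppress', or 'peer_stale'.

    peer_last_update is the Unix time of the peer's last check of the
    service; max_age is an assumed staleness limit in seconds.
    """
    if now is None:
        now = time.time()
    if now - peer_last_update > max_age:
        # The peer isn't updating the service, so its state proves nothing.
        return "peer_stale"
    if local_state == CRITICAL and peer_state == CRITICAL:
        return "notify"
    # Only one side sees CRITICAL: do nothing and retry later.
    return "suppress"
```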
>> >There are probably pitfalls but I think that's how I would approach it
>> >at first.
>>
>> Yeah. It's a bit dicey regardless of how you slice it.
>
>
>Agreed. Interesting problem though.
Yup, then again so is automatically rewriting the nagios config files
and correcting the parent links so they can be used on a redundant
host.
-- rouilj
John Rouillard
===========================================================================
My employers don't acknowledge my existence much less my opinions.