Distributed monitoring Freshness checking failing then recovering
Ivan Fetch
ifetch at du.edu
Tue Oct 16 21:32:50 CEST 2007
Hi Sean,
On Mon, 15 Oct 2007, Sean McAvoy wrote:
> On further investigation, it looks as though the problem is with the
> time taken to submit the results back to nagios via send_nsca.
> I have read about a couple of different options for getting results back
> quickly. One is a bulk transfer: a file containing the results is sent
> by a single send_nsca run executed via cron. The other uses the
> performance data output option on the remote nagios systems and submits
> the results using a custom daemon on both ends.
> Does anybody know of any other options? Also, are there any guides to
> setting up either of these options? Most of what I have read is email
> threads.
> Thanks.
>
> On 12-Oct-07, at 12:40 PM, Sean McAvoy wrote:
>
>> Hello,
>> I have 1 central nagios system with 5 distributed servers. I have
>> enabled freshness checking on both central and remote systems. I am
>> constantly seeing services go to unknown status for 1-3 minutes and
>> then recover.
>> on the remotes I have:
>> check_service_freshness=1
>> service_freshness_check_interval=10
>> check_host_freshness=1
>> host_freshness_check_interval=60
>> service_inter_check_delay_method=s
>> max_service_check_spread=10
>> service_interleave_factor=1
>> host_inter_check_delay_method=s
>> max_host_check_spread=30
>> max_concurrent_checks=0
>>
>> It does appear as though checks are being run in parallel. I'm wondering
>> how I can best determine where the problem is: with the execution of
>> checks, submittal to the central system, or elsewhere.
>> Thanks.
>>
>>
>> _sean
>>
>> ----------------------------------------------------------------------
>> ---
>> This SF.net email is sponsored by: Splunk Inc.
>> Still grepping through log files to find problems? Stop.
>> Now Search log events and configuration files using AJAX and a
>> browser.
>> Download your FREE copy of Splunk now >> http://get.splunk.com/
>> _______________________________________________
>> Nagios-users mailing list
>> Nagios-users at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>> ::: Please include Nagios version, plugin version (-v) and OS when
>> reporting any issue.
>> ::: Messages without supporting info will risk being sent to /dev/null
>
> Sean McAvoy
> NOC Acting Team Lead
> Afilias Canada
>
> P. 416.673.4194
>
>
>
>
>
This may be the caching possibility you have already mentioned, but
here is a blog posting about caching send_nsca:
http://altinity.blogs.com/dotorg/2006/11/caching_nsca_da.html
This is in the back of my mind for us down the road, but I have not
looked into it personally, just seen the post. I have just started
looking at what Opsview has to offer.
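
For what it's worth, the bulk idea you mentioned can be sketched roughly
as below. This is only a minimal sketch, not something I have run in
production: it assumes send_nsca is on the PATH, that it is used with its
default tab delimiter (one result per line on stdin), and that your
checks already leave their results somewhere you can collect them. The
names Result, format_batch and send_batch, the host name
"central.example.org" and the config path are made up for illustration.

```python
import subprocess
from typing import Iterable, NamedTuple


class Result(NamedTuple):
    """One passive service check result (hypothetical container)."""
    host: str
    service: str
    code: int      # 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
    output: str


def format_batch(results: Iterable[Result]) -> str:
    """Build one send_nsca payload: a tab-delimited line per result."""
    lines = []
    for r in results:
        # Tabs and newlines in plugin output would break the line format.
        text = r.output.replace("\t", " ").replace("\n", " ")
        lines.append(f"{r.host}\t{r.service}\t{r.code}\t{text}\n")
    return "".join(lines)


def send_batch(results: Iterable[Result], central: str,
               config: str = "/etc/nagios/send_nsca.cfg") -> int:
    """Send all results over a single send_nsca connection.

    The config path is an assumption; adjust it for your install.
    """
    payload = format_batch(results)
    if not payload:
        return 0  # nothing to send
    proc = subprocess.run(
        ["send_nsca", "-H", central, "-c", config],
        input=payload, text=True, capture_output=True,
    )
    return proc.returncode


if __name__ == "__main__":
    batch = [
        Result("web01", "HTTP", 0, "HTTP OK"),
        Result("web01", "Disk /", 1, "DISK WARNING - 12% free"),
    ]
    # Hypothetical central server name.
    print(send_batch(batch, "central.example.org"))
```

Run from cron every minute or so, something like this opens one
connection per batch rather than one per check result, which is the
point of the bulk approach.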
Thanks,
Ivan.