nsca / distributed monitoring result problem

Chris Goosen cgoosen at jhb.artec.co.za
Thu Jan 19 09:33:07 CET 2006


Thanks for the tip. That seems like a logical place to look.

I will publish my findings.

-----Original Message-----
From: Frederik Vanhee [mailto:frederik.vanhee at perso.be] 
Sent: Thursday, January 19, 2006 8:30 AM
To: Chris Goosen
Cc: basile au siris; nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] nsca / distributed monitoring result problem

Well,

I advise you to check the firewall logs. I had similar problems
before. Because I have a lot of distributed servers (23) that manage
5000 services in total, there are a lot of connections to port 5667 on
the central server, and sometimes the firewall thinks that this is an
attack and blocks the traffic. The nsca communication is then left
unterminated and nsca is 'hanging' on your central server.

Frederik
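
One rough way to look for the hanging connections described above is to
count the TCP states of everything connected to port 5667 on the central
server. The following is only a minimal sketch, assuming a Linux host
where `netstat -tn` prints its usual six-column layout. The function name
`count_states` is made up for illustration and is not part of nsca or
Nagios.

```python
from collections import Counter

NSCA_PORT = 5667  # port named in this thread; adjust if your setup differs


def count_states(netstat_output, port=NSCA_PORT):
    """Count TCP states for connections whose local port equals `port`.

    Assumes the column layout printed by `netstat -tn` on Linux:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip the banner, the header row, and anything that is not TCP.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_address, state = fields[3], fields[5]
        # The port is whatever follows the last colon (works for IPv6 too).
        if local_address.rsplit(":", 1)[-1] == str(port):
            states[state] += 1
    return states
```

Feed it the text captured from `netstat -tn` on the central server. If the
count of connections that are not ESTABLISHED keeps growing between runs,
that would be consistent with sessions being cut off in transit, although
it does not prove the firewall is the cause.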


Chris Goosen wrote:

>Yes, I have an ISA 2004 (if you can call ISA a firewall!!) between the
>2 servers
>
>-----Original Message-----
>From: Frederik Vanhee [mailto:frederik.vanhee at perso.be] 
>Sent: Wednesday, January 18, 2006 10:37 PM
>To: basile au siris
>Cc: Chris Goosen; nagios-users at lists.sourceforge.net
>Subject: Re: [Nagios-users] nsca / distributed monitoring result problem
>
>
>Hello,
>
>Is there a firewall between the central server and the distributed
>server?
>
>Frederik
>
>basile au siris wrote:
>
>>Hi,
>>
>>Maybe I have the same problem. I have distributed monitoring, and the
>>central server sometimes freezes, so that I have to power-cycle it.
>>I suspect nsca (or a hardware problem), because I sometimes notice
>>that there are many (50) nsca processes, and if I restart it
>>everything becomes normal again.
>>
>>I hope we solve our problem.
>>
>>basile
>>
>>
>>Chris Goosen wrote:
>>
>>>Hello all..
>>>
>>>I am running my Nagios central server on an HP 2.4 GHz machine with
>>>512 MB RAM.
>>>
>>>At present, I am monitoring 65 hosts with approx. 400 services.
>>>
>>>After a reboot, everything works perfectly, but the longer my server
>>>runs, the more sluggish it gets, and eventually the nsca processes
>>>consume all the memory and the server stops responding. I also start
>>>getting hosts that are reported as down even though they respond
>>>correctly to ping. The error says "PLUGIN TIMED OUT after 10 seconds".
>>>
>>>Here is an example of what I mean:
>>>
>>>Host State Information
>>>
>>>Host Status:                  DOWN
>>>Status Information:           CRITICAL - Plugin timed out after 10 seconds
>>>Last Status Check:            01-16-2006 12:06:28
>>>Status Data Age:              0d 0h 2m 57s
>>>Last State Change:            01-16-2006 10:20:44
>>>Current State Duration:       0d 1h 48m 41s
>>>Last Host Notification:       01-16-2006 10:20:44
>>>Current Notification Number:  2
>>>Is This Host Flapping?        N/A
>>>
>>>OK 01-16-2006 12:05:47 63d 19h 30m 59s 1/3 PING OK - Packet loss = 0%, RTA = 0.42 ms
>>>
>>>I assume that these are related and that the lack of memory caused
>>>this problem. Would an upgrade from Nagios 1.2 to Nagios 1.3 fix
>>>this? If so, what is the best way to perform that upgrade?
>>>
>>>My /etc/xinetd.d/nsca file:
>>>
>>># default: on
>>># description: NSCA
>>>service nsca
>>>{
>>>    flags          = REUSE
>>>    socket_type    = stream
>>>    wait           = no
>>>    user           = nagios
>>>    group          = nagios
>>>    server         = /usr/sbin/nsca
>>>    server_args    = -c /home/e-smith/nagios/nsca.cfg --inetd
>>>    cps            = 9000 30
>>>    instances      = UNLIMITED
>>>    log_on_failure += USERID
>>>    disable        = no
>>>    only_from      = ip1, ip2, ip3, etc..
>>>}
>>>
>>>command_check_interval= -1
>>>
>>>System info:
>>>
>>>SME server 6.01 (2.4.20-18.7, i686)
>>>
>>>Perl v5.6.1
>>>
>>>Apache/1.3.27
>>>
>>>Nagios 1.2
>>>
>>>Any advice would be great... thanks.
>>>
>>>Chris



_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




