nsca / distributed monitoring result problem
basile au siris
basile.mathieu at siris.sorbonne.fr
Mon Jan 16 16:25:03 CET 2006
Hi,

I may have the same problem. I run distributed monitoring, and the
central server sometimes freezes, so I have to power-cycle it.

I suspect nsca (or a hardware problem), because I sometimes notice that
there are many (50) nsca processes, and if I restart it everything
returns to normal.

I hope we can solve our problem.
basile
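
To catch the pile-up of nsca processes described above before the server freezes, something along these lines could be run periodically. This is only a minimal sketch: the threshold of 50 is taken from the count mentioned in this message, the names `NSCA_THRESHOLD`, `count_processes` and `current_nsca_count` are made up for illustration, and it assumes a `ps` that supports `-eo comm=`.

```python
import subprocess

# Assumption: 50 is the process count mentioned in the message above,
# used here only as an illustrative alert threshold.
NSCA_THRESHOLD = 50


def count_processes(ps_output: str, name: str = "nsca") -> int:
    """Count lines of `ps -eo comm=` output that exactly match a command name."""
    return sum(1 for line in ps_output.splitlines() if line.strip() == name)


def current_nsca_count() -> int:
    """Run ps and count the nsca processes currently running."""
    result = subprocess.run(
        ["ps", "-eo", "comm="], capture_output=True, text=True, check=True
    )
    return count_processes(result.stdout)


if __name__ == "__main__":
    n = current_nsca_count()
    status = "WARNING" if n >= NSCA_THRESHOLD else "OK"
    print(f"{status}: {n} nsca processes running")
```

Keeping the counting separate from the `ps` call makes it easy to check the logic against captured output.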
Chris Goosen wrote:
> Hello all..
>
> I am running my Nagios central server on an HP 2.4 GHz with 512 MB RAM.
>
> At present, I am monitoring 65 hosts with approx. 400 services.
>
> After a reboot, everything works perfectly, but the longer my server
> runs, the more sluggish it gets, and eventually the nsca processes
> consume all the memory and the server stops responding. I also
> start getting hosts that are reported as down even though they
> respond to ping correctly. The error says “PLUGIN TIMED OUT after
> 10 seconds”.
>
> Here is an example of what I mean:
>
> Host State Information
>
> Host Status:                  DOWN
> Status Information:           CRITICAL - Plugin timed out after 10 seconds
> Last Status Check:            01-16-2006 12:06:28
> Status Data Age:              0d 0h 2m 57s
> Last State Change:            01-16-2006 10:20:44
> Current State Duration:       0d 1h 48m 41s
> Last Host Notification:       01-16-2006 10:20:44
> Current Notification Number:  2
> Is This Host Flapping?        N/A
>
> OK 01-16-2006 12:05:47 63d 19h 30m 59s 1/3 PING OK - Packet loss = 0%,
> RTA = 0.42 ms
>
> I assume that these are related and that the lack of memory caused
> this problem. Would an upgrade from Nagios 1.2 to Nagios 1.3 fix
> this? If so, what is the best way to perform that upgrade?
>
> my /etc/xinetd.d/nsca file:
>
> # default: on
> # description: NSCA
> service nsca
> {
>     flags           = REUSE
>     socket_type     = stream
>     wait            = no
>     user            = nagios
>     group           = nagios
>     server          = /usr/sbin/nsca
>     server_args     = -c /home/e-smith/nagios/nsca.cfg --inetd
>     cps             = 9000 30
>     instances       = UNLIMITED
>     log_on_failure  += USERID
>     disable         = no
>     only_from       = ip1, ip2, ip3, etc..
> }
>
> command_check_interval= -1
>
> System info:
>
> SME server 6.01 (2.4.20-18.7, i686)
>
> Perl v5.6.1
>
> Apache/1.3.27
>
> Nagios 1.2
>
> Any advice would be great… thanks.
>
> Chris
>
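
The xinetd configuration quoted above sets `instances = UNLIMITED`, which means xinetd places no cap on how many nsca servers may be active at once. That may be worth checking when nsca processes accumulate. The following is a minimal sketch, not an official tool, that reads one xinetd service block and flags that setting. The function names and the `SAMPLE` text are hypothetical, and the parser only handles simple `key = value` lines.

```python
def parse_xinetd_service(text: str) -> dict:
    """Collect `key = value` and `key += value` lines from one service block."""
    attrs = {}
    inside = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "{":
            inside = True
            continue
        if line == "}":
            break  # end of the first service block
        if inside and "=" in line:
            key, _, value = line.partition("=")
            # Drop the "+" or "-" left over from "+=" / "-=" operators.
            attrs[key.strip().rstrip("+-").strip()] = value.strip()
    return attrs


def instances_unbounded(attrs: dict) -> bool:
    """True when the block explicitly sets instances to UNLIMITED."""
    return attrs.get("instances", "").upper() == "UNLIMITED"


# Hypothetical sample, shortened from the configuration quoted above.
SAMPLE = """\
service nsca
{
    flags = REUSE
    wait = no
    instances = UNLIMITED
    log_on_failure += USERID
}
"""

if __name__ == "__main__":
    attrs = parse_xinetd_service(SAMPLE)
    if instances_unbounded(attrs):
        print("instances is UNLIMITED: no cap on concurrent nsca servers")
```

Only an explicit `UNLIMITED` is flagged here; a missing `instances` line is left alone because a `defaults` section elsewhere could set it.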