Ways and tweaks to make nagios more efficient. load average on monitoring host edging up.
Mathieu Gagné
mgagne at iweb.com
Wed Jan 28 01:04:05 CET 2009
Hi,
Rahul Nabar wrote:
> I set up my nagios system to monitor 256 odd nodes each with about 6
> services (direct and NRPE). It is working fine but my load averages have
> started edging upwards. Not critical yet but I wanted some tips to make
> things more efficient and see if there are things I might have done
> inefficiently.
We have over 2000 hosts and over 4700 services configured on one of our
Nagios instances. Load average is between 1.3 and 2.0, which I find acceptable.
Our hardware is the following: Core2 Duo 4300 @ 1.80GHz with 2GB of RAM.
> One of the points I identified is this: I am doing a ping and ssh check
> on each server. This seems redundant. Is there a way to set it up so that:
> Do a ssh check; if this succeeds obviously ping is ok. If it fails do a
> ping check and report on that.
"check-host-alive" is only triggered when a service associated with the
host changes state.
However, I personally consider PING to be a service in itself, one that
monitors network performance and quality.
A host can still answer PING while performance is degraded (packet loss,
poor response time). You probably want to be informed about such problems
(e.g. in case of a (D)DoS where your network port is maxed out).
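To illustrate why a PING that still answers can deserve an alert, here is a rough Python sketch of threshold-based classification on round-trip time and packet loss. The function name and the threshold values are assumptions chosen for illustration; they are not taken from any particular Nagios configuration.

```python
# Hypothetical helper: classify one ping result by round-trip average (ms)
# and packet loss (%). Threshold values are illustrative assumptions only.
def classify_ping(rta_ms, loss_pct, warn=(100.0, 20.0), crit=(500.0, 60.0)):
    warn_rta, warn_loss = warn
    crit_rta, crit_loss = crit
    # Check the critical thresholds first so the worse state wins.
    if rta_ms >= crit_rta or loss_pct >= crit_loss:
        return "CRITICAL"
    if rta_ms >= warn_rta or loss_pct >= warn_loss:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # The host answers in all three cases, but only the first is healthy.
    for rta, loss in [(20.0, 0.0), (250.0, 0.0), (30.0, 80.0)]:
        print(f"rta={rta} ms, loss={loss}% -> {classify_ping(rta, loss)}")
```

In the second and third cases the host is "up", yet the link is clearly degraded, which is the situation a separate PING service would catch.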
> How about the other way around too? I have a bunch of NRPE checks:
> load_average, total-processes, scratch and home dir usage, pbs_mom,
> ntp_time. If ssh fails then there is obviously no reason to try these
> other checks right? But I think the monitoring_host wastes its cycles
> still trying them (based on the "Last Check" time)
The SSH service state can be CRITICAL while all the other services are
still OK (e.g. because of an SSH server misconfiguration). You probably
want to be informed about that too.
> Any tips how I can achieve these efficiency tweaks? Or is there a
> problem in my strategy? Any other performance tweaks so that I can
> squeeze every ounce of Nagios performance?
>
> Already I am using NRPE rather than check_by_ssh since I was told the
> latter might be inefficient for the monitoring host load usage.
What kind of server are you using?
Also, what's the check_interval? A 1-minute interval might bring the
server to its knees, since it would be scheduling and executing 1536
checks per minute (256 hosts x 6 services, going by your numbers).
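The 1536 figure is simply hosts times services per host divided by the interval in minutes. A small Python sketch of that arithmetic follows; the function name is hypothetical, and it assumes every service shares the same interval and that checks are spread evenly over it.

```python
# Rough estimate of the check rate a monitoring host has to sustain.
# Assumes all services use the same interval and are evenly spread.
def checks_per_minute(hosts, services_per_host, check_interval_min):
    if check_interval_min <= 0:
        raise ValueError("check interval must be positive")
    return hosts * services_per_host / check_interval_min


if __name__ == "__main__":
    # 256 hosts with 6 services each, as described in the quoted message.
    for interval in (1, 5, 10):
        rate = checks_per_minute(256, 6, interval)
        print(f"{interval:>2} min interval: {rate:7.1f} checks/min "
              f"({rate / 60:.2f} checks/s)")
```

Going from a 1-minute to a 5-minute interval cuts the rate from 1536 to about 307 checks per minute, which is often the cheapest way to lower load.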
There are many factors that can impact Nagios performance, and you
should be aware of all of them. Reading the documentation and
understanding the impact of each configuration option would be a good start.
--
Mathieu