Nagios 3.0.4 performance issue
Alloo, Vincent
v-alloo at ti.com
Thu Nov 20 11:37:19 CET 2008
Andreas,
I have changed my configuration slightly, and I can confirm that the CPU load is NOT coming from the servicedependency but only from the big servicegroup definition.
I removed the servicedependency definition from the configuration, keeping only the servicegroup definition and its associations, and the CPU load was still huge. After removing the servicegroup as well, the CPU load went back to normal.
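A sketch of the reduced configuration described above, assuming the object names from the extract quoted further down in this message. Only the servicedependency block is dropped; the servicegroup and the services attached to it stay:

```cfg
# Sketch only: servicedependency removed, servicegroup kept.
define servicegroup{
    servicegroup_name    nrpe_services
    alias                NRPE Services
    }

# One of the many service definitions still attached to the group.
define service{
    use                  unix_24_7
    hostgroup_name       sol-servers,linux-servers,sol-zone-servers,sol-servers-with_hotspare
    service_description  CPU load
    check_command        check_nrpe_ssl!check_load!5,4,3!6,5,4
    servicegroups        nrpe_services
    }
```

With this variant the load stays high; taking out the servicegroup definition and the `servicegroups` lines is what brings it back to normal.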
Regards,
Vincent Alloo
TI France Design Systems Operations Manager
Europe and Middle East IT Services
Texas Instruments France
E-Mail: v-alloo at ti.com
Phone: +33 4 93 22 26 97
Mobile: +33 6 82 13 00 80
-----Original Message-----
From: Andreas Ericsson [mailto:ae at op5.se]
Sent: Thursday, November 20, 2008 10:44 AM
To: Alloo, Vincent
Cc: nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] Nagios 3.0.4 performance issue
Alloo, Vincent wrote:
> Andreas,
> Here is an extract of my setup:
>
> define servicegroup{
>     servicegroup_name nrpe_services
>     alias NRPE Services
> }
>
> define servicedependency{
>     host_name svxnagios02
>     service_description check_uname
>     dependent_servicegroup_name nrpe_services
>     notification_failure_criteria w,u,c
> }
>
> define service {
>     use unix_24_7
>     host_name svxnagios02
>     service_description check_uname
>     check_command check_nrpe_ssl!uname!0
>     notification_options c,r
>     process_perf_data 0
> }
>
> And a bunch of:
> define service {
>     use unix_24_7
>     hostgroup_name sol-servers,linux-servers,sol-zone-servers,sol-servers-with_hotspare
>     service_description CPU load
>     check_command check_nrpe_ssl!check_load!5,4,3!6,5,4
>     servicegroups nrpe_services
> }
> .....(3600 services within the nrpe_services service group)
>
Oh. Are you proxying all your NRPE checks through some other system? I
can't imagine why this would be a good idea, but to each his own, I suppose.
With this configuration, each of the 3600 services should depend on
exactly one other service, so the problem I initially foresaw does not apply.
However, like Sascha mentioned, Nagios instead seems to run that extra check
before any of the other 3600 service checks.
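To illustrate "each service depends on exactly one other service": the single servicedependency using dependent_servicegroup_name should be equivalent to one explicit dependency per group member, roughly like the sketch below. The host name linux-host01 is hypothetical; it stands in for any host whose "CPU load" service is a member of nrpe_services.

```cfg
# Sketch: what the group-based dependency amounts to for ONE member.
# "linux-host01" is a made-up host name used only for illustration.
define servicedependency{
    host_name                      svxnagios02
    service_description            check_uname
    dependent_host_name            linux-host01
    dependent_service_description  CPU load
    notification_failure_criteria  w,u,c
    }
```

Repeated for every member, that gives about 3600 dependencies that all point at the same master service (check_uname on svxnagios02).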
I'll need to run some manual testing on this. Since you've only specified
"notification_failure_criteria", Nagios should be able to avoid checking
the service being depended on until it's trying to send a notification. In
fact, it should probably switch the checking order around so that the service
being depended upon is checked *after* the dependent service. That would
solve your problem until NRPE starts failing. After that, there's no help
for it, but then you should definitely see some service check cache hits
which will at least make the load on the system bearable. I'll try to find
some time to look into this next week at the latest.
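For reference, a sketch of the same dependency with the execution side spelled out. As far as I recall the Nagios 3 object documentation, `n` for execution_failure_criteria means the execution dependency never fails, so the dependent services are always actively checked. Whether writing it out explicitly changes the check ordering discussed above is an assumption that has not been tested in this thread.

```cfg
# Sketch: notification-only dependency with the execution criteria made explicit.
# Untested assumption: this may not change how often check_uname is run.
define servicedependency{
    host_name                      svxnagios02
    service_description            check_uname
    dependent_servicegroup_name    nrpe_services
    execution_failure_criteria     n
    notification_failure_criteria  w,u,c
    }
```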
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231