Optimal Config for hundreds of passive checks && Hundreds of Nagios procs

Jason Lancaster jlancaster at affinity.com
Thu Jun 19 21:16:39 CEST 2003


With regard to this thread and the thread titled "Hundreds of Nagios
procs," I thought I'd share the configuration file I use in my
implementation. This (complete) config is similar on all systems, with
various tweaks for each one. The monitoring servers are all at least 1 GHz
machines with around 3000 services. Every server has a ramdisk, and all
monitoring servers run a custom "ocsp sweeper" application to send NSCA
stats in bulk to the central server. This lightened the load on the
monitoring servers quite a bit, since each ocsp command takes execution time.
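
The sweeper application itself is not included in this post. As a rough
sketch of the idea only: assume the ocsp command just appends one
tab-separated line (host, service, return code, plugin output) to a spool
file, and a separate job periodically hands the whole file to a single
send_nsca run. The spool path, install paths, central host name, and
function names below are all hypothetical, and there is no retry handling.

```python
import os
import subprocess

# Hypothetical paths and host name; adjust for the local install.
SPOOL_FILE = "/usr/local/nagios/ramdisk/ocsp.spool"
SEND_NSCA = "/usr/local/nagios/bin/send_nsca"
SEND_NSCA_CFG = "/usr/local/nagios/etc/send_nsca.cfg"
CENTRAL_HOST = "central.example.com"


def format_result(host, service, return_code, output):
    """Build one send_nsca input line: host<TAB>service<TAB>code<TAB>output."""
    # Tabs and newlines inside plugin output would break the record format.
    clean = output.replace("\t", " ").replace("\n", " ")
    return "%s\t%s\t%d\t%s\n" % (host, service, return_code, clean)


def claim_spool(path):
    """Rename the spool file so new results land in a fresh file.

    Returns the claimed path, or None if nothing has been spooled.
    """
    claimed = path + ".sending"
    try:
        os.rename(path, claimed)
    except FileNotFoundError:
        return None
    return claimed


def sweep(spool=SPOOL_FILE):
    """Send every spooled result with a single send_nsca invocation."""
    claimed = claim_spool(spool)
    if claimed is None:
        return 0
    with open(claimed) as fh:
        batch = fh.read()
    # send_nsca reads tab-delimited results from stdin, one per line.
    subprocess.run(
        [SEND_NSCA, "-H", CENTRAL_HOST, "-c", SEND_NSCA_CFG],
        input=batch, text=True, check=True,
    )
    os.remove(claimed)  # only reached if send_nsca exited with status 0
    return batch.count("\n")
```

Calling sweep() from cron or a small loop every few seconds replaces one
send_nsca process per check result with one process per batch.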

It seems like there are a lot of threads on this mailing list right now
asking why implementations of Nagios have a huge queue of results to
process. You can fix it... it just needs to be tweaked.

Let me know if you have any questions.

-Jason

log_file=/usr/local/nagios/var/nagios.log
cfg_file=/usr/local/nagios/etc/checkcommands.cfg
cfg_file=/usr/local/nagios/etc/misccommands.cfg
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/dependencies.cfg
cfg_file=/usr/local/nagios/etc/escalations.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/hosts.cfg
cfg_file=/usr/local/nagios/etc/services.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
resource_file=/usr/local/nagios/etc/resource.cfg
status_file=/usr/local/nagios/ramdisk/status.log
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_check_interval=-1
command_file=/usr/local/nagios/ramdisk/nagios.cmd
comment_file=/usr/local/nagios/var/comment.log
downtime_file=/usr/local/nagios/var/downtime.log
lock_file=/usr/local/nagios/var/nagios.lock
temp_file=/usr/local/nagios/ramdisk/nagios.tmp
log_rotation_method=d
log_archive_path=/usr/local/nagios/var/archives
use_syslog=0
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=1
log_external_commands=1
log_passive_service_checks=1
inter_check_delay_method=s
service_interleave_factor=s
max_concurrent_checks=600
service_reaper_frequency=1
sleep_time=1
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=30
perfdata_timeout=5
retain_state_information=1
state_retention_file=/usr/local/nagios/var/status.sav
retention_update_interval=0
use_retained_program_state=0
interval_length=60
use_agressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
enable_notifications=1
enable_event_handlers=1
process_performance_data=0
obsess_over_services=0
check_for_orphaned_services=0
check_service_freshness=1
freshness_check_interval=1200
aggregate_status_updates=1
status_update_interval=5
enable_flap_detection=0
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=us
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
illegal_macro_output_chars=`~$&|'"<>
admin_email=nagios
admin_pager=pagenagios
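
To see at a glance where this config departs from the one in the quoted
message below, the scheduling-related settings can be pulled out and
compared with a short script. This is only a sketch: parse_main_config,
tuning_diff, and TUNING_KEYS are names made up for illustration, and the
parser handles nothing beyond simple key=value lines.

```python
# Directives that appear in both this config and the quoted one.
TUNING_KEYS = (
    "inter_check_delay_method",
    "service_interleave_factor",
    "max_concurrent_checks",
    "service_reaper_frequency",
    "sleep_time",
)


def parse_main_config(text):
    """Parse key=value lines; repeated keys (e.g. cfg_file) become lists."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, since values may contain "=".
        key, value = line.split("=", 1)
        settings.setdefault(key.strip(), []).append(value.strip())
    return settings


def tuning_diff(mine, theirs):
    """Return {key: (mine, theirs)} for tuning keys whose values differ."""
    diff = {}
    for key in TUNING_KEYS:
        a = mine.get(key, [None])[-1]
        b = theirs.get(key, [None])[-1]
        if a != b:
            diff[key] = (a, b)
    return diff


tuned = parse_main_config("""\
inter_check_delay_method=s
service_interleave_factor=s
max_concurrent_checks=600
service_reaper_frequency=1
sleep_time=1
""")
quoted = parse_main_config("""\
inter_check_delay_method=d
service_interleave_factor=s
max_concurrent_checks=0
service_reaper_frequency=5
sleep_time=1
""")

for key, (a, b) in tuning_diff(tuned, quoted).items():
    print("%s: %s vs %s" % (key, a, b))
# inter_check_delay_method: s vs d
# max_concurrent_checks: 600 vs 0
# service_reaper_frequency: 1 vs 5
```

Of the five settings, three differ: the delay method, the cap on
concurrent checks, and how often the service reaper runs.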


----- Original Message ----- 
From: "solo molo" <solomolo90 at hotmail.com>
To: <nagios-users at lists.sourceforge.net>
Sent: Wednesday, June 18, 2003 18:41
Subject: [Nagios-users] Optimal Config for hundreds of passive checks


> I have Nagios running on Red Hat 8.0 on a Compaq DL360 with dual 800 MHz
> procs and 1 GB RAM.  Nagios receives 400 passive check results every 10
> minutes, and another 100+ active checks are performed every 5 minutes.
> My loads are never very high, but Nagios gets way behind on processing
> the passive checks.  The problem is especially bad when some of the
> passive checks return critical results.  I've seen the delay as bad as
> 20 hours.  That is, when I check the log, Nagios is receiving current
> passive results but displaying results from 20 hours ago in the UI.  I'd
> appreciate any suggestions as to how I can configure Nagios to process
> the passive results more quickly.  I'm using the following config:
>
> # I can't use smart because I have a few checks that only run once every
> # 24 hours and throw off the average.
> inter_check_delay_method=d
>
> service_interleave_factor=s
> max_concurrent_checks=0
> service_reaper_frequency=5
> sleep_time=1
>
> -------------------------------------------------------
> This SF.Net email is sponsored by: INetU
> Attention Web Developers & Consultants: Become An INetU Hosting Partner.
> Refer Dedicated Servers. We Manage Them. You Get 10% Monthly Commission!
> INetU Dedicated Managed Hosting http://www.inetu.net/partner/index.php
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>







