Problems with initial install of Nagios
Sean R. Clark
sclark at nyroc.rr.com
Mon Aug 16 16:01:35 CEST 2004
I was tasked with converting our Big Brother installation over to Nagios.
I am running Nagios v1.2 on Gentoo with 670 hosts in hosts.cfg.
Currently I am only running check_ping on them to see how well it scales.
Right now, I have 57 Down, 0 Unreachable, 55 Up, 558 Pending.
The "pending list" seems to go down at the rate of 1 host per hour, which makes
it seem like the testing is very serial in its nature.
The hosts themselves say things like "Service check scheduled for Mon Aug
16 09:16:38 2004", but it's 9:54 and still no check.
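A quick sanity check on those figures, as a rough sketch only: the status counts should add up to the 670 hosts in hosts.cfg, and the one-host-per-hour drain rate (an approximate observation, not a measurement) gives an idea of how long the pending list would take to clear.

```python
# Status counts as reported above.
down, unreachable, up, pending = 57, 0, 55, 558

# Should match the number of hosts in hosts.cfg.
total_hosts = down + unreachable + up + pending

# Approximate drain rate of the pending list, as observed in the post.
hosts_per_hour = 1

hours_to_clear = pending / hosts_per_hour
days_to_clear = hours_to_clear / 24

print(f"total hosts: {total_hosts}")
print(f"time to clear pending list: {hours_to_clear:.0f} h "
      f"(~{days_to_clear:.1f} days)")
```

At that rate the pending list would take a little over three weeks to clear.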
I tried using parallelization by setting max_concurrent_checks to 0; this did
not make the list go down at all.
I set max_concurrent_checks to 700, and this did not help either.
Here are my timeout values:
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
Nagios -s gives me
SERVICE SCHEDULING INFORMATION
-------------------------------
Total services: 672
Total hosts: 670
Command check interval: -1 sec
Check reaper interval: 5 sec
Inter-check delay method: SMART
Average check interval: 61.607 sec
Inter-check delay: 0.092 sec
Interleave factor method: SMART
Average services per host: 1.003
Service interleave factor: 2
Initial service check scheduling info:
--------------------------------------
First scheduled check: 1092664412 -> Mon Aug 16 09:53:32 2004
Last scheduled check: 1092664473 -> Mon Aug 16 09:54:33 2004
Rough guidelines for max_concurrent_checks value:
-------------------------------------------------
Absolute minimum value: 55
Recommend value: 165
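The figures in this report appear to be consistent with each other. The sketch below reproduces them under two assumptions that are not confirmed by the output itself: that the SMART inter-check delay is the average check interval divided by the number of services, and that the minimum concurrency is the number of checks started during one reaper interval, with the recommended value being three times that.

```python
import math

# Values copied from the scheduling report.
total_services = 672
average_check_interval = 61.607   # seconds
check_reaper_interval = 5         # seconds
first_check = 1092664412          # epoch seconds
last_check = 1092664473           # epoch seconds

# Assumption: all services are spread evenly over one average interval.
inter_check_delay = average_check_interval / total_services

# Assumption: every check started during one reaper interval may still
# be outstanding when results are collected.
minimum_concurrent = math.ceil(check_reaper_interval / inter_check_delay)
recommended_concurrent = minimum_concurrent * 3

# Time between the first and last initially scheduled checks.
initial_spread = last_check - first_check

print(f"inter-check delay: {inter_check_delay:.3f} sec")
print(f"absolute minimum:  {minimum_concurrent}")
print(f"recommended:       {recommended_concurrent}")
print(f"initial spread:    {initial_spread} sec")
```

If these relationships hold, the scheduler intends to start every service check within about a minute, which is hard to reconcile with a pending list that shrinks by one host per hour.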
All the hosts fall under this service:
define service {
    use                     generic-service
    host_name               *
    service_description     PING
    contact_groups          rdc-staff
    check_period            24x7
    notification_interval   480
    notification_options    w,u,c,r
    notification_period     24x7
    check_command           check_ping!100.0,20%!500.0,60%
    max_check_attempts      1
    normal_check_interval   1
    retry_check_interval    1
}
I have fping installed also, which is what I was using with Big Brother, and
that took at most 300 seconds to give me the status for all the hosts, on
the same hardware.
The box does not seem taxed at all, either:
top - 09:56:56 up 18 days, 19:00, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 47 total, 1 running, 46 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.2% user, 0.4% system, 0.0% nice, 98.4% idle
Mem: 1292264k total, 687864k used, 604400k free, 288856k buffers
Swap: 999576k total, 8016k used, 991560k free, 150448k cached
And it seems like only one ping process is running:
ps aux | grep ping
nagios 12453 0.1 0.0 1688 712 ? S 09:57 0:00 /usr/nagios/libexec/check_ping -H 172.16.20.139 -w 3000.0,80% -c 5000.0,100% -p 1
nagios 12454 0.1 0.0 1840 656 ? S 09:57 0:00 /bin/ping -n -U -c 1 172.16.20.139
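One detail in this listing may be worth checking: the thresholds on the running check_ping do not look like the ones in the PING service definition. The sketch below compares the two. It assumes that the first "!" argument of check_command is passed to -w and the second to -c; the check_ping command definition is not shown in this post, so that mapping is a guess.

```python
import shlex

# check_command from the service definition; Nagios separates the
# command name and its arguments with "!".
check_command = "check_ping!100.0,20%!500.0,60%"

# Command line of the check_ping process seen in the ps output.
running = ("/usr/nagios/libexec/check_ping -H 172.16.20.139 "
           "-w 3000.0,80% -c 5000.0,100% -p 1")


def service_thresholds(command):
    """Split 'name!ARG1!ARG2' into (warning, critical)."""
    _name, warning, critical = command.split("!")
    return warning, critical


def running_thresholds(command_line):
    """Pull the values that follow -w and -c out of a command line."""
    args = shlex.split(command_line)
    return args[args.index("-w") + 1], args[args.index("-c") + 1]


expected = service_thresholds(check_command)
actual = running_thresholds(running)

print("service definition:", expected)
print("running process:   ", actual)
print("match:", expected == actual)
```

The thresholds differ, so the process in the ps listing may not be the PING service check at all. It could come from another command definition (a host check, for example), though without the command definitions that is only a guess.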
Sorry for the lengthy first post, but I am at my wits' end with this. Also, even
though the plug-in asked me where fping was, it seems to be using just ping
to do the pings. Can anyone point me in the right direction?
-Sean
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users