Problems with initial install of Nagios

Lawrence, Lynne LLawrence at osc.uscg.mil
Mon Aug 16 18:07:17 CEST 2004


Sean,

I had a similar problem (warning: I am new to Nagios myself, so you
might want to take what I have to say with a grain of salt).  I noticed the
following when I looked at what was going on:

- By default, check_ping sends 5 packets with a timeout of 10 seconds
each, so a failing check_ping could take up to 50 seconds to return.
- When I looked at the running processes with ps, I typically saw either
several service-level checks going on or _one_ host check going on, which
makes me wonder whether host checks are run serially rather than in
parallel.  Note that the check_ping in your ps output below looks like a
host check command, because its parameters do not match your service
definition.

My suspicion is that, when lots of hosts are down, the scheduling falls
behind because of the relatively long-running host check commands that
appear to run serially (maybe someone more familiar with the ins and outs
of scheduling could confirm).

Anyway, my goal with check_ping is simply to verify, pass/fail, whether my
systems are up, and all of my systems are on what should be a fast
intranet.  So I modified my check_ping service to send only 3 packets with
a three-second timeout, to alarm only on 100% packet loss, and to use
max_check_attempts = 1.  I made the same change to the host check command.
The packet count and timeout are set with the -p and -t options to the
check_ping plugin.
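
For illustration, the change described above might look roughly like the
following.  This is only a sketch, not a copy of my actual config: the
command name check_ping_fast is made up, $USER1$ is assumed to point at
your plugin directory, and the round-trip thresholds (3000.0 and 5000.0 ms)
are placeholders chosen so that only 100% packet loss trips an alarm.
Directives not shown would stay as in your existing service definition.

```
# Hypothetical command definition (the name check_ping_fast is invented
# for this example): 3 packets (-p 3) and a 3-second timeout (-t 3).
define command {
    command_name    check_ping_fast
    command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 3 -t 3
}

# Service using the command above; warning and critical both require
# 100% packet loss, and a single failed attempt is enough to alarm.
define service {
    use                    generic-service
    host_name              *
    service_description    PING
    check_command          check_ping_fast!3000.0,100%!5000.0,100%
    max_check_attempts     1
}
```

The host check command would get the same -p and -t values.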

This helped me get my system working - maybe it will help you.

Regards,

Lynne Lawrence
QSS/USCG

> -----Original Message-----
> From: nagios-users-admin at lists.sourceforge.net
> [mailto:nagios-users-admin at lists.sourceforge.net]On Behalf Of Sean R.
> Clark
> Sent: Monday, August 16, 2004 10:02 AM
> To: Nagios-users at lists.sourceforge.net
> Subject: [Nagios-users] Problems with initial install of Nagios
> 
> 
> 
> 
> I was tasked with converting our Big Brother over to nagios
> 
> I am running nagios v 1.2 on gentoo with 670 hosts in the hosts.cfg
> 
> Currently I am only running check_ping on them to see how well it scales
> 
> 
> Right now, I have 57 Down 	0 Unreachable 	55 Up 	558 Pending
> 
> The "pending list" seems to go down at the rate of 1 host per hour,
> making it seem like the testing is very serial in its nature
> 
> The hosts themselves say things like "Service check scheduled for Mon Aug
> 16 09:16:38 2004" but it's 9:54 and still no check.
> 
> I tried using parallelization, setting max_concurrent_checks to 0; this
> did not make the list go down at all
> 
> I set max_concurrent_checks to 700, and this did not help either
> 
> Here are my timeout values: 
> 
> service_check_timeout=60
> host_check_timeout=30
> event_handler_timeout=30
> notification_timeout=30
> ocsp_timeout=5
> perfdata_timeout=5
> 
> 
> Nagios -s gives me
> 
>         SERVICE SCHEDULING INFORMATION
>         -------------------------------
>         Total services:             672
>         Total hosts:                670
> 
>         Command check interval:     -1 sec
>         Check reaper interval:      5 sec
> 
>         Inter-check delay method:   SMART
>         Average check interval:     61.607 sec
>         Inter-check delay:          0.092 sec
> 
>         Interleave factor method:   SMART
>         Average services per host:  1.003
>         Service interleave factor:  2
> 
>         Initial service check scheduling info:
>         --------------------------------------
>         First scheduled check:      1092664412 -> Mon Aug 16 09:53:32 2004
>         Last scheduled check:       1092664473 -> Mon Aug 16 09:54:33 2004
> 
>         Rough guidelines for max_concurrent_checks value:
>         -------------------------------------------------
>         Absolute minimum value:     55
>         Recommend value:            165
> 
> 
> 
> All the hosts fall under this service
> 
> define service {
>     use    generic-service
>     host_name    *
>     service_description    PING
>     contact_groups    rdc-staff
>     check_period    24x7
>     notification_interval    480
>     notification_options    w,u,c,r
>     notification_period    24x7
>     check_command    check_ping!100.0,20%!500.0,60%
>     max_check_attempts    1
>     normal_check_interval    1
>     retry_check_interval    1
> }
> 
> 
> I have fping installed also, which is what I was using with Big Brother,
> and that took at most 300 seconds to give me the status for all the hosts,
> on the same hardware.
> 
> The box does not seem taxed at all, either
> 
> top - 09:56:56 up 18 days, 19:00,  1 user,  load average: 0.00, 0.00, 0.00
> Tasks:  47 total,   1 running,  46 sleeping,   0 stopped,   0 zombie
> Cpu(s):   1.2% user,   0.4% system,   0.0% nice,  98.4% idle
> Mem:   1292264k total,   687864k used,   604400k free,   288856k buffers
> Swap:   999576k total,     8016k used,   991560k free,   150448k cached
> 
> 
> 
> 
> And it seems like only one ping process is running
> 
>  ps aux  | grep ping
> nagios   12453  0.1  0.0  1688  712 ?        S    09:57   0:00 /usr/nagios/libexec/check_ping -H 172.16.20.139 -w 3000.0,80% -c 5000.0,100% -p 1
> nagios   12454  0.1  0.0  1840  656 ?        S    09:57   0:00 /bin/ping -n -U -c 1 172.16.20.139
> 
> 
> 
> Sorry for the lengthy first post, but I am at my wits' end with this.
> Also, even though the plug-in asked me where fping was, it seems it's
> using just ping to do the pings. Can anyone point me in the right
> direction?
> 
> 
> 
> -Sean
> 
> 
> 
> 
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS 
> when reporting any issue. 
> ::: Messages without supporting info will risk being sent to /dev/null
> 

