Pinging every host at a specific interval?
Marc Powell
marc at ena.com
Fri Jan 27 20:04:58 CET 2006
> -----Original Message-----
> From: nagios-users-admin at lists.sourceforge.net [mailto:nagios-users-
> admin at lists.sourceforge.net] On Behalf Of Ryan Whalen
> Sent: Friday, January 27, 2006 12:39 PM
> To: nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] Pinging every host at a specific interval?
>
> Here is an example of a Host. I have ~40-50 of these one after another in
> the minimal.cfg file.
>
> define host{
>     use                 nn-ping           ; Name of host template to use
>     host_name           Ryan Whalen
>     alias               Ryan Whalen
>     address             192.168.1.15
>     check_command       check-host-alive
>     max_check_attempts  10
I'd recommend changing this to 1 and verifying that check-host-alive only
sends 1 or 2 pings. You really don't need more than that under normal
circumstances to determine whether the host is down. Adjust that if you
have a lossy network that you can't fix for some reason.
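As a rough sketch of what I mean (the warning/critical thresholds here are
assumptions modelled on the stock sample configuration, not taken from
your files; the -p option is what sets how many ICMP packets check_ping
sends):

```
# Sketch only: thresholds are illustrative assumptions, adjust to taste.
define command{
        command_name    check-host-alive
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1
        }

define host{
        use                     nn-ping
        host_name               Ryan Whalen
        alias                   Ryan Whalen
        address                 192.168.1.15
        check_command           check-host-alive
        max_check_attempts      1       ; was 10
        }
```

With 10 attempts and several pings per attempt, a single down host can
hold things up for a long time before Nagios decides it is really down.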
Also, now that I've re-read your original mail, this might be a
misunderstanding about how Nagios operates. You say:

"What I would like to happen is for Nagios to ping them all every 5
minutes, and display their status. However, some hosts have not been
polled for over 2 hours."

Nagios checks the status of a _host_ only when a service on that host
returns a non-OK state. The logic is that if a service is OK then the
host must be OK, so why check it? Are the delayed checks that you are
seeing truly host checks, or service checks as well?
> Here is the service template:
>
> define service{
>     name                    nn-service  ; The 'name' of this service template
>     active_checks_enabled   1           ; Active service checks are enabled
>     passive_checks_enabled  1           ; Passive service checks are enabled/accepted
>     parallelize_check       1           ; Active service checks should be parallelized (disabling this can lead t
>     obsess_over_service     1           ; We should obsess over this service (if necessary)
What is your OCSP command doing? Could that be causing a delay?
>     process_perf_data       1           ; Process performance data
Same question here. How are you processing the perfdata?
> define service{
>     use                    nn-service   ; Name of service template to use
>     host_name              Ryan Whalen,(other hosts go here)
>     service_description    PING
>     is_volatile            0
>     check_period           24x7
>     max_check_attempts     4
>     normal_check_interval  5
>     retry_check_interval   1
>     contact_groups         admins
>     notification_interval  960
>     notification_period    24x7
>     check_command          check_ping!100.0,20%!500.0,60%
> }
Looks good. What do you use for your timeouts in nagios.cfg? We use --
service_check_timeout=45
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
In your ps list, do you see any orphaned check commands or nagios
processes that have been hanging around for a while? What does the
scheduling queue show for the services that haven't been checked in a
while?
--
Marc