Host and Service checks (was: Fail error message - more interesting)
Rob Nelson
rob at capband.net
Tue Jun 24 15:27:29 CEST 2003
>Look at the service/host check logic carefully. If you have a service,
>there must be a host check behind it. Conversely, (IIRC) if you have a
>host check, it will always be in an assumed up state until a service
>check goes down, triggering the host check.
>
>Long and short of it is, each host needs both a host check and one or
>more service checks, otherwise you will find you have hosts that do not
>come back up even after the service check has been fixed.
Can you expand on this some? I think I understand what you're saying, but I
want to be clear. Let's assume the following checks on "node1".
From hosts.cfg:
===========
define host{
    use                     generic-host
    host_name               node1.sitename.com
    alias                   Node 1
    address                 10.10.12.1
    check_command           check-host-alive
    max_check_attempts      10
    notification_interval   120
    notification_period     24x7
    notification_options    d,u,r
    }
===========
From services.cfg:
============
define service{
    use                     generic-service
    host_name               node1.sitename.com
    service_description     ping
    is_volatile             0
    check_period            24x7
    max_check_attempts      3
    normal_check_interval   5
    retry_check_interval    2
    contact_groups          contact-group1
    notification_interval   120
    notification_period     24x7
    notification_options    w,u,c,r
    check_command           check_ping!1000.0,40%!2000.0,80%
    }
============
And of course, from checkcommands.cfg:
=============
# 'check_ping' command definition
define command{
    command_name    check_ping
    command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 10
    }
# 'check-host-alive' command definition
define command{
    command_name    check-host-alive
    command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5 -t 30
    }
=============
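To make the thresholds above concrete, here is a minimal Python sketch of how I understand check_ping to interpret its "-w" and "-c" arguments: each is a pair of round-trip average (ms) and packet loss (%), and a result is non-OK if either value reaches its threshold. The function names (parse_threshold, classify) are made up for illustration, and the ">=" comparison is my assumption about the plugin's behavior, not something taken from its source.

```python
from typing import NamedTuple


class Threshold(NamedTuple):
    rta_ms: float    # round-trip average, in milliseconds
    loss_pct: float  # packet loss, in percent


def parse_threshold(text: str) -> Threshold:
    """Parse a check_ping style threshold such as '1000.0,40%'."""
    rta, loss = text.split(",")
    return Threshold(float(rta), float(loss.strip().rstrip("%")))


def classify(rta_ms: float, loss_pct: float,
             warn: Threshold, crit: Threshold) -> str:
    """Return OK, WARNING or CRITICAL (assumes '>=' comparisons)."""
    if loss_pct >= crit.loss_pct or rta_ms >= crit.rta_ms:
        return "CRITICAL"
    if loss_pct >= warn.loss_pct or rta_ms >= warn.rta_ms:
        return "WARNING"
    return "OK"


# Thresholds from the check-host-alive definition above (5 packets sent).
host_warn = parse_threshold("3000.0,80%")
host_crit = parse_threshold("5000.0,100%")

for lost in (1, 4, 5):
    loss = 100.0 * lost / 5
    print(lost, "of 5 lost:", classify(200.0, loss, host_warn, host_crit))
```

Under these assumptions, losing only the first packet out of five (20% loss) still counts as OK for the modified check-host-alive, which is the point of the change.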
I modified check-host-alive because we're monitoring wireless APs (which
deprioritize and drop ICMP packets if they get even slightly busy) over a
VPN. If I leave it at the default, the first ICMP packet from a host often
gets dropped.
Could you or someone else elaborate on what exactly Nagios will do when it
turns on, as in a timeline of host and service checks? Assume the host is
up initially, drops to critical after an hour, rises to warning 30 minutes
after that, and 30 minutes after that (2 hours from start), goes back to
normal. I think I had some misconceptions about when host/service checks
were performed but I just want to make sure I don't read you wrong and pick
up more misconceptions :)
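A rough Python sketch of the service side of that timeline, as I currently understand it, may help pin down the question. It assumes the default time unit of one minute per interval, that a failing service is rechecked every retry_check_interval until max_check_attempts is reached, and that only then does it enter a hard state and notify. It deliberately does not model the host check that a failing service check triggers, since that is the part I'm unsure about. The function name is made up for illustration.

```python
def service_failure_timeline(first_failure_min, retry_interval_min, max_attempts):
    """List (minute, attempt, state_type) for a service that keeps failing.

    Assumption: attempts before max_attempts are SOFT states; the attempt
    that reaches max_attempts is the HARD state that triggers notifications.
    """
    timeline = []
    for attempt in range(1, max_attempts + 1):
        minute = first_failure_min + (attempt - 1) * retry_interval_min
        state_type = "HARD" if attempt == max_attempts else "SOFT"
        timeline.append((minute, attempt, state_type))
    return timeline


# Values from the "ping" service above: retry_check_interval 2,
# max_check_attempts 3, with the first failed check landing at minute 60.
for minute, attempt, state_type in service_failure_timeline(60, 2, 3):
    print(f"t+{minute} min: attempt {attempt}/3, {state_type} state")
```

If that is right, the service would go hard critical about four minutes after the first failed check; what happens to the host check in that window is what I'd like confirmed.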
Rob Nelson
Network Administrator, Capitol Broadband
C: 919-369-1874
rob at capband.net