service notification when host is down
Morris, Patrick
patrick.morris at hp.com
Wed Feb 17 17:52:53 CET 2010
Samuel Bancal wrote:
> Nagios Core 3.2.0
> nagios-plugins-1.4.14
> Ubuntu server 8.04.3 LTS
>
> Hi,
>
> I'm having trouble configuring notifications for the case where a
> server no longer responds to PING (ICMP).
> I don't understand why Nagios skips steps when it runs the
> "icmp" service check.
> Here is the config:
>
> define host{
> use generic-server
> host_name server1
> alias server1
> address the.ip.the.ip
> hostgroups prod-servers
> contact_groups group1
> check_command check-host-alive
> check_period 24x7
> check_interval 5
> retry_interval 1
> max_check_attempts 4
> notification_period 24x7
> notification_interval 60
> notification_options d,u,r
> }
>
> define service{
> use generic-service
> host_name server1
> service_description ICMP
> check_command check_icmp!100.0,20%!500.0,60%
> max_check_attempts 4
> normal_check_interval 5
> retry_check_interval 1
> notification_options w,u,c,r
> notification_interval 60
> notification_period 24x7
> }
> [...]
> define command{
> command_name check-host-alive
> command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
> }
> define command{
> command_name check_icmp
> command_line $USER1$/check_icmp -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
> }
> [...]
>
> Here is an example of the history I get:
> Service Critical[2010-02-16 11:33:13] SERVICE ALERT: server1;ICMP;CRITICAL;SOFT;1;CRITICAL - the.ip.the.ip: rta nan, lost 100%
> Host Down[2010-02-16 11:33:43] HOST ALERT: server1;DOWN;SOFT;1;(Host Check Timed Out)
> Service Critical[2010-02-16 11:34:13] SERVICE ALERT: server1;ICMP;CRITICAL;HARD;1;CRITICAL - the.ip.the.ip: rta nan, lost 100%
> Host Down[2010-02-16 11:34:43] HOST ALERT: server1;DOWN;SOFT;2;(Host Check Timed Out)
> Host Down[2010-02-16 11:35:23] HOST ALERT: server1;DOWN;SOFT;3;(Host Check Timed Out)
> Host Down[2010-02-16 11:36:33] HOST ALERT: server1;DOWN;HARD;4;(Host Check Timed Out)
> Host Up[2010-02-16 11:37:43] HOST ALERT: server1;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.67 ms
> Service Ok[2010-02-16 11:39:13] SERVICE ALERT: server1;ICMP;OK;HARD;1;OK - the.ip.the.ip: rta 0.943ms, lost 0%
>
> Or later:
> Host Down[2010-02-16 11:42:03] HOST ALERT: server1;DOWN;SOFT;1;(Host Check Timed Out)
> Host Down[2010-02-16 11:43:13] HOST ALERT: server1;DOWN;SOFT;2;(Host Check Timed Out)
> Service Critical[2010-02-16 11:44:13] SERVICE ALERT: server1;ICMP;CRITICAL;HARD;1;CRITICAL - the.ip.the.ip: rta nan, lost 100%
> Host Down[2010-02-16 11:44:43] HOST ALERT: server1;DOWN;SOFT;3;(Host Check Timed Out)
> Host Up[2010-02-16 11:45:53] HOST ALERT: server1;UP;SOFT;4;PING OK - Packet loss = 0%, RTA = 0.64 ms
> Service Ok[2010-02-16 11:49:13] SERVICE ALERT: server1;ICMP;OK;HARD;1;OK - the.ip.the.ip: rta 0.948ms, lost 0%
If you're asking why Nagios runs a host check when it sees the service
fail a check, that's normal behavior.
When a service check fails, the first thing Nagios will do is look to
see if the service failed because the host is down.
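
To make that concrete, here is a rough sketch in Python of the logic as I
understand it. It is a simplified model, not Nagios source code: the
Service class and process_service_result() are names made up for this
example, and soft recoveries and notifications are not modelled. It
assumes the rule from the Nagios "State Types" documentation that a
non-OK service result goes straight to a HARD state when the host is not
UP, which would explain the CRITICAL;HARD;1 entries in the history.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical, minimal stand-in for a Nagios service's check state."""
    max_check_attempts: int = 4
    attempt: int = 1
    state: str = "OK"
    state_type: str = "HARD"


def process_service_result(svc: Service, result: str, host_is_up: bool) -> str:
    """Apply one service check result and return the resulting state type.

    host_is_up stands for the outcome of the host check that Nagios
    performs when a service check fails (assumed to be known here).
    """
    if result == "OK":
        # Simplification: every recovery is treated as a HARD OK.
        svc.state, svc.state_type, svc.attempt = "OK", "HARD", 1
        return svc.state_type

    was_ok = svc.state == "OK"
    svc.state = result

    if not was_ok and svc.state_type == "HARD":
        # Already a confirmed problem; nothing left to retry.
        return svc.state_type

    if not host_is_up:
        # The host itself is in trouble, so service retries are skipped
        # and the service problem becomes HARD right away.
        svc.state_type = "HARD"
        return svc.state_type

    # Host is fine: keep retrying until max_check_attempts is reached.
    svc.attempt = 1 if was_ok else svc.attempt + 1
    svc.state_type = "HARD" if svc.attempt >= svc.max_check_attempts else "SOFT"
    return svc.state_type


if __name__ == "__main__":
    svc = Service()
    # First failure, host not yet known to be down.
    print(process_service_result(svc, "CRITICAL", host_is_up=True))
    # Next failure, after the host check has reported DOWN.
    print(process_service_result(svc, "CRITICAL", host_is_up=False))
```

With the host reachable, the same function walks through attempts 1 to 4
before reporting HARD, which is the sequence the original poster expected
to see for the ICMP service.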