Service Alerts and Notifications
wnorth
wnorth at verizon.net
Sat Jan 6 01:19:23 CET 2007
That is actually interesting. When the host goes down I see a HARD host
alert as follows:
HOST ALERT: ebro;DOWN;HARD;5;CRITICAL - Host Unreachable (10.0.33.8)
But for the check_http I only see the following:
SERVICE ALERT: ebro;Website App Server MS2;CRITICAL;SOFT;3;Connection refused
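For reference, the fields in a SERVICE ALERT line are host, service description, state, state type, attempt number and plugin output. Below is a minimal Python sketch that pulls those fields out of the nagios.log lines quoted in the original message further down and checks whether any of them reached HARD. The names ServiceAlert and parse_service_alert are made up for illustration and are not part of Nagios.

```python
from typing import NamedTuple, Optional


class ServiceAlert(NamedTuple):
    timestamp: int      # epoch seconds from the [..] prefix
    host: str
    service: str
    state: str          # OK, WARNING, UNKNOWN or CRITICAL
    state_type: str     # SOFT or HARD
    attempt: int
    output: str


def parse_service_alert(line: str) -> Optional[ServiceAlert]:
    """Parse one '[epoch] SERVICE ALERT: ...' line; return None for other lines."""
    marker = "] SERVICE ALERT: "
    if not line.startswith("[") or marker not in line:
        return None
    stamp, rest = line[1:].split(marker, 1)
    # Split at most 5 times so semicolons inside the plugin output survive.
    host, service, state, state_type, attempt, output = rest.split(";", 5)
    return ServiceAlert(int(stamp), host, service, state, state_type,
                        int(attempt), output)


# Log lines copied from the original message in this thread.
log = [
    "[1168038639] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;ebro;Website App Server MS2;1168038636",
    "[1168038644] SERVICE ALERT: ebro;Website App Server MS2;CRITICAL;SOFT;1;Connection refused",
    "[1168038824] SERVICE ALERT: ebro;Website App Server MS2;CRITICAL;SOFT;2;Connection refused",
    "[1168039004] SERVICE ALERT: ebro;Website App Server MS2;CRITICAL;SOFT;3;Connection refused",
]

alerts = [a for a in map(parse_service_alert, log) if a is not None]
gaps = [b.timestamp - a.timestamp for a, b in zip(alerts, alerts[1:])]
reached_hard = any(a.state_type == "HARD" for a in alerts)

print(gaps)          # [180, 180] -> 3 minutes between attempts
print(reached_hard)  # False -> every logged attempt is still SOFT
```

Since every logged attempt is still SOFT, no service notification would be expected at that point.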
Once I changed the retry interval to 1 and the max attempts to 1 I saw the
email, so I just wasn't waiting long enough for the alert to go HARD...makes
sense. Ideally I would want it to try 3 times in a row, send an email if all
3 tries fail, then wait 5 minutes and retry again.
For that to work I tried the following:
max_check_attempts 3
retry_check_interval 5
normal_check_interval 5
This should force it to try 3 times before setting a HARD alert and wait 5
minutes between normal checks. However, that is not what it does. It appears
that retry_check_interval is the wait between checks while the service is in
a non-OK state, so if I tell it to try 3 times, it will try 3 times and wait
5 minutes between tries. If I set the retry interval to 2, it will wait 2
minutes between tries, which comes out to a total of about 6 minutes. I'd
rather it fail after a minute or so. If I set it to 0 it seems to inherit a
standard minute, so the only way to solve this is to set it to a 1 minute
interval and just wait.
Sound about right?
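To put numbers on this, here is a rough Python sketch of the timing. It assumes the default interval_length of 60 seconds (so the intervals are in minutes), a service that keeps failing, and that the first failed check counts as attempt 1. The function names are hypothetical and the script only does arithmetic; it does not talk to Nagios.

```python
def attempt_offsets(max_check_attempts, retry_check_interval):
    """Minutes after the first failed check at which each attempt runs."""
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    return [n * retry_check_interval for n in range(max_check_attempts)]


def minutes_until_hard(max_check_attempts, retry_check_interval):
    """Minutes from the first failed check to the attempt that goes HARD."""
    return attempt_offsets(max_check_attempts, retry_check_interval)[-1]


# (max_check_attempts, retry_check_interval) pairs mentioned in this thread
for attempts, retry in [(5, 3), (3, 5), (3, 2), (3, 1), (1, 1)]:
    print(attempts, retry, attempt_offsets(attempts, retry),
          minutes_until_hard(attempts, retry))
```

Under these assumptions the original 5 attempts with a 3 minute retry go HARD 12 minutes after the first failed check, and 3 attempts with a 1 minute retry go HARD after 2 minutes. The failure also has to be picked up by a regular check first, which can add up to normal_check_interval minutes on top.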
-----Original Message-----
From: Josh Yost [mailto:Josh.Yost at epsiia.com]
Sent: Friday, January 05, 2007 3:56 PM
To: wnorth at verizon.net
Cc: nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] Service Alerts and Notifications
Hi,
This is kind of stupid/obvious, but
a) I don't see a HARD service alert in your log snip for the service.
Did it actually get to that state? Your retry interval is 3 min, so it
would take you 15 min or so to get an alert.
b) If it did get to HARD, what was the cmd it tried to run & is that a
valid cmd?
c) Did you kill all the old processes and restart Nagios w/ the new config?
I don't see anything obvious in your cfgs that wouldn't be working.
- Josh
wnorth at verizon.net wrote:
> I have set up a few host and HTTP service checks and alerts. When a host
> goes down I receive an email, but when the check_http service fails (e.g.
> the TCP port is shut down on the web server) I see the service alert in the
> nagios.log as follows:
>
> [1168038639] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;ebro;Website App Server MS2;1168038636
> [1168038644] SERVICE ALERT: ebro;Website App Server MS2;CRITICAL;SOFT;1;Connection refused
> [1168038824] SERVICE ALERT: ebro;Website App Server MS2;CRITICAL;SOFT;2;Connection refused
> [1168039004] SERVICE ALERT: ebro;Website App Server MS2;CRITICAL;SOFT;3;Connection refused
>
> But I do not receive an email. The following service is defined:
>
> define service{
> host_name ebro
> service_description Website App Server MS2
> check_command check_http_fitness_app
> max_check_attempts 5
> normal_check_interval 5
> retry_check_interval 3
> check_period 24x7
> contact_groups jboss-admins
> notification_interval 30
> notification_period 24x7
> notification_options w,u,c,r,f
> }
>
> The following contact group is set up for jboss-admins:
>
> define contactgroup{
> contactgroup_name jboss-admins
> alias JBoss Administrators
> members wnorth
> }
>
> The following contact is set up for wnorth:
> define contact{
> contact_name wnorth
> alias Wes North
> service_notification_period 24x7
> host_notification_period 24x7
> service_notification_options w,u,c,r,f
> host_notification_options d,u,r,f
> service_notification_commands notify-by-email
> host_notification_commands host-notify-by-email
> email wnorth at verizon.net
> }
>
> If I bring a host offline I see the following alert in the nagios.log:
>
> [1168037707] HOST NOTIFICATION: wnorth;ebro;DOWN;host-notify-by-email;CRITICAL - Host Unreachable (10.0.33.8)
> [1168037767] HOST ALERT: ebro;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.40 ms
> [1168037767] HOST NOTIFICATION: wnorth;ebro;UP;host-notify-by-email;PING OK - Packet loss = 0%, RTA = 0.40 ms
>
> But if I bring a web service offline it fails to email me. I don't know
> why, I have specified everything correctly. Any insight would be much
> appreciated.
>
> -Wes
>
>
> -------------------------------------------------------------------------
> Take Surveys. Earn Cash. Influence the Future of IT
> Join SourceForge.net's Techsay panel and you'll get the chance to share
your
> opinions on IT & business topics through brief surveys - and earn cash
> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null