Service Alerts and Notifications
wnorth
wnorth at verizon.net
Sat Jan 6 19:13:08 CET 2007
Andy,
I tried setting the retry value to 30s, and it interpreted it as 30 minutes:
max_check_attempts 3
retry_check_interval 30s
normal_check_interval 5
I would have thought the above would set a HARD alert after 1.5 minutes, but
it checked, then scheduled the next check 30 minutes later. Is there a
global setting somewhere that I missed that needs to be changed from minutes
to seconds?
-Wes
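One possible reading of the behaviour above, offered as a sketch rather than a confirmed diagnosis: the Nagios main configuration file has an interval_length setting (60 seconds by default) that interval values are multiplied by, and a value such as 30s may simply be read as its leading number, 30. The leading_number helper and the atoi-style parsing it imitates are assumptions made for illustration, not Nagios code.

```python
import re

INTERVAL_LENGTH = 60  # seconds per interval unit; the Nagios default for interval_length


def leading_number(value: str) -> int:
    """Hypothetical helper: keep only the leading digits, as a C atoi() would."""
    match = re.match(r"\s*(\d+)", value)
    return int(match.group(1)) if match else 0


def interval_seconds(value: str, interval_length: int = INTERVAL_LENGTH) -> int:
    """Convert a config interval value to seconds under the assumptions above."""
    return leading_number(value) * interval_length


for raw in ("30s", "5", "1"):
    print(f"retry_check_interval {raw!r} -> {interval_seconds(raw)} seconds")
```

Under these assumptions 30s comes out as 1800 seconds, which is the 30 minutes observed. Lowering interval_length would change the unit for every interval in the configuration, not just this one.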
-----Original Message-----
From: nagios-users-bounces at lists.sourceforge.net
[mailto:nagios-users-bounces at lists.sourceforge.net] On Behalf Of wnorth
Sent: Friday, January 05, 2007 5:40 PM
To: 'Andy Shellam (Mailing Lists)'
Cc: 'Josh Yost'; nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] Service Alerts and Notifications
It does, thanks much. It makes perfect sense; I didn't even realize that one
can specify the interval in seconds. I am pretty impressed with Nagios as
is. Compared to things like Netcool or Topaz it has quite a ways to go, but
for the simple checks, and even advanced scripting, it is very powerful.
Thanks again,
-Wes
-----Original Message-----
From: Andy Shellam (Mailing Lists)
[mailto:andy.shellam-lists at mailnetwork.co.uk]
Sent: Friday, January 05, 2007 4:34 PM
To: wnorth
Cc: 'Josh Yost'; nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] Service Alerts and Notifications
Don't know if it helps, but on my services, I check services every 5
minutes.
If it fails, it retries 3 times every 30 seconds (so a max. of 1.5
minutes) then it sends me an e-mail/SMS (because it switches to HARD state.)
What this will do...
max_check_attempts 3
retry_check_interval 5
normal_check_interval 5
...is check your service every 5 minutes. If it goes off-line, it will set
it to a SOFT fail, wait 5 minutes, then check again; if it still fails (2nd
SOFT fail), it waits another 5 minutes and checks again; if it fails a third
time, you'll get a HARD fail. So in theory, if the service is down, you
won't find out for 15 minutes.
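The arithmetic behind that 15-minute figure can be sketched as follows. This assumes max_check_attempts counts the first failed check, so a HARD state needs max_check_attempts - 1 retries, that intervals are in minutes, and that the outage can start just after a successful check. The function name is made up for illustration.

```python
def minutes_until_hard(max_check_attempts: int,
                       retry_check_interval: int,
                       normal_check_interval: int) -> tuple[int, int]:
    """Return (minutes from first failed check to HARD,
    worst-case minutes from the start of the outage to HARD)."""
    # The first failed check counts as attempt 1; only the rest are retries.
    after_first_failure = (max_check_attempts - 1) * retry_check_interval
    # Worst case: the service dies right after an OK check, so the first
    # failure is only seen one full normal interval later.
    worst_case = normal_check_interval + after_first_failure
    return after_first_failure, worst_case


# The 3 / 5 / 5 example above.
print(minutes_until_hard(3, 5, 5))  # (10, 15)
# The original service definition in this thread: 5 attempts, retry 3, normal 5.
print(minutes_until_hard(5, 3, 5))  # (12, 17)
```

The second call lines up with the log excerpt quoted further down, where the SOFT alerts for that service arrive 180 seconds apart.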
What might be better is:
max_check_attempts 3
retry_check_interval 1
normal_check_interval 5
This will check your service every 5 minutes - if it fails, it'll re-try
3 times with a 1 minute interval between each, so you'll get notified if
it's still down after 3 minutes.
What you can also do is set retry_check_interval to a seconds interval,
like:
max_check_attempts 3
retry_check_interval 5s
normal_check_interval 5
This tells Nagios to wait 5 seconds between non-OK states, and 5 minutes
between active checks.
You could of course also set "max_check_attempts" to 2 and
"retry_check_interval" to 1 - so the first-time it fails, it waits a
minute then checks again - and if it still fails you get a notification,
so in theory you only get a minute's lag.
hope this random rambling works for you :)
Andy.
wnorth wrote:
> That is actually interesting, when the host goes down I see a HARD service
> alert as follows:
>
> HOST ALERT: ebro;DOWN;HARD;5;CRITICAL - Host Unreachable (10.0.33.8)
>
> But for the check_http I only see the following:
>
> SERVICE ALERT: ebro;Website App Server MS2;CRITICAL;SOFT;3;Connection refused
>
> Once I changed the retry interval to 1 and the max attempts to 1 I saw the
> email, so I just wasn't waiting long enough...makes sense. In theory I would
> want it to try 3 times in a row, if it fails send an email, then wait 5
> minutes and retry again.
>
> For that to work I tried the following:
> max_check_attempts 3
> retry_check_interval 5
> normal_check_interval 5
>
> This should force it to try 3 times before setting a HARD alert and wait 5
> minutes between normal intervals, however that is not what it does, it
> appears it sets the retry_check_interval to 5 minutes between non-OK service
> alerts, so if I tell it to try 3 times, it will try 3 times and wait
> in-between tries for 5 minutes, if I set it to 2 on the retry it will wait 2
> minutes in between tries, which comes out to a total of 6 minutes. I'd
> rather it fail after a minute or so, so if I set it to 0 it will inherit a
> standard minute...the only way to solve this is to set it at a 1 minute
> interval and just wait.
>
> Sound about right?
>
> -----Original Message-----
> From: Josh Yost [mailto:Josh.Yost at epsiia.com]
> Sent: Friday, January 05, 2007 3:56 PM
> To: wnorth at verizon.net
> Cc: nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] Service Alerts and Notifications
>
> Hi,
> This is kind of stupid/obvious, but
>
> a) I don't see a HARD service alert in your log snip for the service.
> Did it actually get to that state? Your retry interval is 3 min, so it
> would take you 15 min or so to get an alert.
>
> b) If it did get to HARD, what was the cmd it tried to run & is that a
> valid cmd?
>
> c) Did you kill all the old processes and restart Nagios w/ the new config?
>
> I don't see anything obvious in your cfgs that wouldn't be working.
>
> - Josh
>
>
> wnorth at verizon.net wrote:
>
>> I have set up a few host and HTTP service checks and alerts. When a host
>> goes down I receive an email, but when the check_http service fails (e.g.
>> the TCP port is shut down on the web server) I see the service alert in the
>> nagios.log as follows:
>>
>> [1168038639] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;ebro;Website App Server MS2;1168038636
>> [1168038644] SERVICE ALERT: ebro;Website App Server MS2;CRITICAL;SOFT;1;Connection refused
>> [1168038824] SERVICE ALERT: ebro;Website App Server MS2;CRITICAL;SOFT;2;Connection refused
>> [1168039004] SERVICE ALERT: ebro;Website App Server MS2;CRITICAL;SOFT;3;Connection refused
>>
>> But I do not receive an email. The following service is defined:
>>
>> define service{
>>         host_name                       ebro
>>         service_description             Website App Server MS2
>>         check_command                   check_http_fitness_app
>>         max_check_attempts              5
>>         normal_check_interval           5
>>         retry_check_interval            3
>>         check_period                    24x7
>>         contact_groups                  jboss-admins
>>         notification_interval           30
>>         notification_period             24x7
>>         notification_options            w,u,c,r,f
>>         }
>>
>> The following contact is setup for the jboss-admins groups:
>>
>> define contactgroup{
>>         contactgroup_name               jboss-admins
>>         alias                           JBoss Administrators
>>         members                         wnorth
>>         }
>>
>> The following contact is setup for wnorth:
>> define contact{
>>         contact_name                    wnorth
>>         alias                           Wes North
>>         service_notification_period     24x7
>>         host_notification_period        24x7
>>         service_notification_options    w,u,c,r,f
>>         host_notification_options       d,u,r,f
>>         service_notification_commands   notify-by-email
>>         host_notification_commands      host-notify-by-email
>>         email                           wnorth at verizon.net
>>         }
>>
>> If I bring a host offline I see the following alert in the nagios.log:
>>
>> [1168037707] HOST NOTIFICATION: wnorth;ebro;DOWN;host-notify-by-email;CRITICAL - Host Unreachable (10.0.33.8)
>> [1168037767] HOST ALERT: ebro;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.40 ms
>> [1168037767] HOST NOTIFICATION: wnorth;ebro;UP;host-notify-by-email;PING OK - Packet loss = 0%, RTA = 0.40 ms
>>
>> But if I bring a web service offline it fails to email me. I don't know
>> why, I have specified everything correctly. Any insight would be much
>> appreciated.
>>
>> -Wes
>>
>> -------------------------------------------------------------------------
>> Take Surveys. Earn Cash. Influence the Future of IT
>> Join SourceForge.net's Techsay panel and you'll get the chance to share your
>> opinions on IT & business topics through brief surveys - and earn cash
>> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
>> _______________________________________________
>> Nagios-users mailing list
>> Nagios-users at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
>> ::: Messages without supporting info will risk being sent to /dev/null
--
Andy Shellam
NetServe Support Team
the Mail Network
"an alternative in a standardised world"
p: +44 (0) 121 288 0832/0839
m: +44 (0) 7818 000834