Service Alerts and Notifications
wnorth
wnorth at verizon.net
Sun Jan 7 01:34:26 CET 2007
For some of our checks check_http will be fine, since it hits a page that
accesses the DB layer, so it's a full check. Once we start talking about
synthetic transaction monitoring (e.g. trying to monitor what an end user
sees in terms of query response times, page latency during a work-flow
process, etc.), it falls short. But for normal systems monitoring and
basic health checks, it is fantastic. You have to ask yourself what to
expect from a tool that is designed to be portable and modular: a bit of
up-front legwork to get the desired outcome should be expected.

Tools like Mercury Topaz/SiteScope, Gomez or Micromuse Netcool allow you to
script out an end-to-end work-flow process and then measure performance
against, say, an SLA. We are looking at two kinds of tool: one for simple
systems and network management, at which Nagios does an excellent job, and
one for synthetic transaction monitoring, for which one can still use
Nagios but has to build a custom work-flow tool (e.g. script it out in Perl
or build a simple program that Nagios can execute).
-Wes
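
The "simple program that Nagios can execute" idea above could look roughly
like the following. This is only a minimal sketch in Python rather than
Perl: the step names, the example.com URLs, the thresholds and the function
names are all placeholders invented for illustration, and a real work flow
would almost certainly need cookie/session handling that is left out here.
The only Nagios conventions it relies on are the plugin exit codes
(0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and a single status line with
optional performance data after a "|".

```python
import time
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical work flow; these URLs are placeholders, not from the thread.
STEPS = [
    ("home", "http://example.com/"),
    ("query", "http://example.com/search?q=test"),
]


def http_fetch(url, timeout=10):
    """Fetch a URL and return its HTTP status code (4xx/5xx raise HTTPError)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def run_workflow(steps, fetch=http_fetch, warn=2.0, crit=5.0,
                 clock=time.monotonic):
    """Run the steps in order and return (exit_code, status_line).

    warn and crit are thresholds in seconds for the total elapsed time;
    the defaults are arbitrary assumptions, not values from an SLA.
    """
    timings = []
    for name, url in steps:
        start = clock()
        try:
            status = fetch(url)
        except Exception as exc:
            return CRITICAL, f"WORKFLOW CRITICAL - step '{name}' failed: {exc}"
        elapsed = clock() - start
        if status != 200:
            return CRITICAL, (f"WORKFLOW CRITICAL - step '{name}' "
                              f"returned HTTP {status}")
        timings.append((name, elapsed))

    total = sum(t for _, t in timings)
    # Performance data in the usual label=value[unit] form, one per step.
    perfdata = " ".join(f"{name}={t:.3f}s" for name, t in timings)
    if total >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif total >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    return code, (f"WORKFLOW {label} - {len(timings)} steps "
                  f"in {total:.3f}s | {perfdata}")


def main():
    """Entry point for use as a plugin: print one line, return the exit code."""
    code, line = run_workflow(STEPS)
    print(line)
    return code
```

To use something like this as a plugin, the script would end with
"import sys" and "sys.exit(main())" so that Nagios sees the exit code.
Passing the fetch function in as a parameter is just a convenience so the
timing and threshold logic can be tried without touching the network.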
-----Original Message-----
From: Andy Shellam (Mailing Lists)
[mailto:andy.shellam-lists at mailnetwork.co.uk]
Sent: Saturday, January 06, 2007 3:48 PM
To: wnorth
Cc: nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] Service Alerts and Notifications
Thanks for the info Wes,
As it appears to work OK, I'll try it out on my new box before it goes
live. I was a bit cautious with all the warnings about untested
consequences etc.
Can you not do your web request with the check_http plugin, or does it
require something a bit more complicated?
Andy.
wnorth wrote:
> Thanks Patrick that fixed it.
>
> I had to change the interval_length from 60 to 1, so one time unit is now
> a second instead of a minute, and I have to specify all my values in
> seconds, as follows:
>
> max_check_attempts 3
> retry_check_interval 30
> normal_check_interval 300
>
>
> This causes it to check every 5 minutes. If it receives a non-OK result
> back, it will try 3 times before marking it HARD down; each retry interval
> is set to 30 seconds, so 1.5 minutes before an alert is sent out.
>
> Also, I had to change the notification_interval, which was set to 30
> minutes by default, to 1800, which equates to 30 minutes. I think that
> value should probably be equal to, if not twice, the normal check
> interval. That way it notifies each time it checks and finds the
> condition is still bad; then again, you don't want to get flooded, so
> perhaps send an email every other time it checks? It's really up to me to
> decide that.
>
> Oh well, on to the next task: trying to see if I can build some sort of
> transaction-based monitor, which will hit a home page, navigate to a
> specific screen and execute a web query. This is where something like
> Gomez, Mercury or Netcool would be great; with Nagios I have to think a
> bit more out of the box. Besides, it's free, right? And from the last 1-2
> days I've spent on it, very powerful...starting to like it more and
> more. ;-)
>
> -Wes
>
> -----Original Message-----
> From: nagios-users-bounces at lists.sourceforge.net
> [mailto:nagios-users-bounces at lists.sourceforge.net] On Behalf Of Morris,
> Patrick
> Sent: Saturday, January 06, 2007 10:26 AM
> To: Andy Shellam (Mailing Lists)
> Cc: nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] Service Alerts and Notifications
>
>
>> I tried setting the retry value to 30s, and it interpreted it
>> as 30 minutes:
>>
>> max_check_attempts 3
>> retry_check_interval 30s
>> normal_check_interval 5
>>
>> I would have thought the above would set a HARD alert after
>> 1.5 minutes, but it checked, then scheduled the next check 30
>> minutes later. Is there a global setting somewhere that I
>> missed that needs to be changed from minutes to seconds?
>>
>
> Check your interval_length in nagios.cfg. That determines how long a
> single unit is.
>
> If "30" gives you 30 minutes, then it's probably set to 60 (60 seconds/1
> minute). Sticking an "s" on your retry interval isn't going to change
> anything; it'll just be ignored.
>
> You'll need to adjust your interval if you want things to happen in
> under 1 time unit (in the default case, 60 seconds).
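
The unit arithmetic described above can be checked with a few lines of
Python. This is only an illustration of the conversion (value times
interval_length gives seconds); the function name is made up for the
example and is not part of Nagios, and the numbers are the ones quoted in
this thread.

```python
def to_seconds(units, interval_length=60):
    """Convert an interval given in Nagios time units to seconds.

    interval_length is the number of seconds per time unit, as set in
    nagios.cfg; 60 is the value the thread describes as the default case.
    """
    return units * interval_length


# With interval_length=60, a retry interval of 30 means 30 minutes.
print("retry 30 at interval_length=60:", to_seconds(30), "seconds")

# With interval_length=1, every value is read directly as seconds.
for name, value in (("retry_check_interval", 30),
                    ("normal_check_interval", 300),
                    ("notification_interval", 1800)):
    print(name, "=", value, "->", to_seconds(value, interval_length=1),
          "seconds")
```

So the same "30" is 1800 seconds under the 60-second unit and 30 seconds
once interval_length is 1, which matches the behaviour reported above.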
>
> -------------------------------------------------------------------------
> Take Surveys. Earn Cash. Influence the Future of IT
> Join SourceForge.net's Techsay panel and you'll get the chance to share
> your opinions on IT & business topics through brief surveys - and earn cash
> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
--
Andy Shellam
NetServe Support Team
the Mail Network
"an alternative in a standardised world"
p: +44 (0) 121 288 0832/0839
m: +44 (0) 7818 000834