decreasing sensitivity of host (down) checks?
Frost, Mark {PBG}
mark.frost1 at pepsi.com
Tue May 15 19:10:08 CEST 2007
> I feel like this is a dumb question, but I've got to ask it anyway :-)
>
> Our host checks are run on demand by Nagios, which I gather is the
> common setup and doesn't hit us hard performance-wise. I use
> check_fping.
>
> Recently, some of the teams who get the alerts have asked whether
> host UP/DOWN alerts could be suppressed when a box is down for less
> than 10 minutes. (These are Windows boxes being rebooted.) They've
> indicated that they don't care about a box being rebooted, but they
> would care if the box went down and stayed down for longer than 10
> minutes.
>
> We already do this kind of thing (setting a minimum threshold at which
> we want to be bothered) for service checks which is most of what we
> do, but this seems more problematic with host checks.
>
> For my host checks I have the following defined:
>
> notifications_enabled 1
> event_handler_enabled 1
> flap_detection_enabled 1
> process_perf_data 1
> retain_status_information 1
> retain_nonstatus_information 1
> check_command check-host-alive
> check_interval 0
> check_freshness 0
> max_check_attempts 10
> notification_interval 0
> notification_period 24x7
> notification_options d,u,r
>
> So with the max_check_attempts set at 10 I can see that Nagios will
> try 10 successive pings of this host before it wants to send an alert.
> Looking at the history for downed hosts, it looks like it reruns this
> check once per second 10 times. The check_interval being set at 0
> causes the checks to be performed only on demand.
>
> I could bump up the max_check_attempts to something like 600 (10
> minutes of successive 1-second pings), but I imagine that's not too
> good from a performance perspective either.
>
> I'm not really sure what I could do here to leave the checks as "on
> demand", yet not send out an alert unless the host has been
> down for more than 10 minutes. Am I right about killing my
> performance if I crank up the max_check_attempts value here to 600?
>
> Thanks
>
> Mark
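
The arithmetic behind the 10 and 600 figures in the message above can be sketched in Python. This is only an illustration, not Nagios code: the function names are hypothetical, and the one-second spacing between retries is the poster's observation from the host history, not a documented guarantee.

```python
import math


def seconds_until_alert(max_check_attempts: int, seconds_per_attempt: float) -> float:
    """Rough time a host must stay down before all check attempts are used up."""
    return max_check_attempts * seconds_per_attempt


def attempts_for_window(window_seconds: float, seconds_per_attempt: float) -> int:
    """Smallest max_check_attempts whose retries span at least window_seconds."""
    return math.ceil(window_seconds / seconds_per_attempt)


# Assumption taken from the message: retries run about once per second.
SECONDS_PER_ATTEMPT = 1.0

# Current setting: max_check_attempts 10
print(seconds_until_alert(10, SECONDS_PER_ATTEMPT))

# Attempts needed to cover the requested 10-minute window
print(attempts_for_window(10 * 60, SECONDS_PER_ATTEMPT))
```

Under that one-second assumption, the current setting alerts after roughly 10 seconds of downtime, and covering 10 minutes takes 600 attempts, which matches the figure proposed in the message. If each attempt took longer than one second, fewer attempts would span the same window.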
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users