Request new functionality: "Off Hours" state.
Andreas Ericsson
ae at op5.se
Thu Feb 15 17:05:53 CET 2007
Larry P. Schrof wrote:
> Hello,
>
> I work for a trading firm, and upper management has begun to have a
> solid appreciation for Nagios and what it can do. However, we have a few
> requirements in a monitoring solution that I would love to see
> added to Nagios, as I think they would be useful to the community at large.
> I'll put forth one idea per email.
>
> Request for new functionality:
> ------------------------------
>
> Right now, for the sake of discussion, let's assume a service is
> checked every five minutes from 8:00am to 4:00pm. Assume also that at
> 3:57pm the service is in a critical state.
>
> The problem our company has is this: the Nagios CGIs will continue to
> report the service in a critical state from 3:57pm THROUGH the "off
> hours" until the next morning. This "pollutes" the displays with a red
> critical entry that we don't want to see. Manually submitting (or
> scripting) a passive check just after 4pm to set the service back to
> 'OK' is unacceptable, as our folks want to know, at a quick glance,
> what "should be currently monitored" and "what shouldn't be."
>
> What we need / would like is a per-service and per-host configuration
> option that allows a host or service to enter an "Off hours" state in
> the CGI displays. (Or perhaps there should also be a global option for this?)
>
> It would be nice if performance data were not gathered during this
> state. (Perhaps that's the way it works now - I haven't checked.)
>
> I am even envisioning a new color for the service / host entries in
> the CGIs - perhaps blue. This color would readily allow folks to identify
> entries that are in "off hours", as opposed to processes that are
> being monitored and in an 'OK' state.
>
> I do realize that many folks do want to know / see the last state of
> their services before the time_period expired, but in our case, it is
> important that we explicitly have the last known state wiped from the
> CGIs once the time_period has expired. Our service response team
> doesn't want to have 40+ red, yellow, or orange entries showing up for
> hosts that aren't even currently in their active time_period.
>
Why not just disable notifications for the host/service during that period
and use the period in which notifications are enabled when you grab your
SLA reports?
Or script the disabling/enabling of the checks in question and use the filter
functions to only see things which are being monitored?
Or create a host group or service group with just the checks the emergency
team needs to watch.
Seeing as it's all about filtering, you'd be better off letting the computer
do it for you than making it easy for your staff to do it manually.
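A minimal sketch of the "script disabling/enabling of the checks" idea, in Python. It assumes the standard Nagios external command interface, where a line such as "[timestamp] DISABLE_SVC_CHECK;host_name;service_description" written to the command file turns off active checks for that service (ENABLE_SVC_CHECK turns them back on). The command file path, the 8:00-16:00 window and the host/service names are hypothetical; take the real path from command_file in your nagios.cfg.

```python
import time
from datetime import datetime

# Assumption: a common default location; the real path is whatever
# command_file is set to in your nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def in_business_hours(now: datetime, start_hour: int = 8, end_hour: int = 16) -> bool:
    """True from start_hour:00 (inclusive) up to end_hour:00 (exclusive)."""
    return start_hour <= now.hour < end_hour


def check_commands(services, enable: bool, timestamp: int) -> list:
    """Build one external command line per (host_name, service_description) pair."""
    verb = "ENABLE_SVC_CHECK" if enable else "DISABLE_SVC_CHECK"
    return [f"[{timestamp}] {verb};{host};{svc}\n" for host, svc in services]


def submit(lines, command_file: str = COMMAND_FILE) -> None:
    """Append the commands to the Nagios command file (a named pipe).

    Opening a FIFO for writing blocks until Nagios has it open for reading.
    """
    with open(command_file, "a") as pipe:
        pipe.writelines(lines)


if __name__ == "__main__":
    # Hypothetical host/service names, for illustration only.
    watched = [("trade-gw1", "FIX session"), ("trade-gw2", "FIX session")]
    active = in_business_hours(datetime.now())
    # Dry run: print the commands; call submit(...) instead to send them.
    for line in check_commands(watched, active, int(time.time())):
        print(line, end="")
```

Run from cron shortly after each window boundary. Disabling active checks leaves the last known state on display, so this still relies on filtering the CGIs (or a dedicated service group) to hide the checks that are switched off.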
> Maybe it would just take a host / service config entry such as
> 'display_off_hours_state' ?
>
> Can folks who are intimately familiar with the source code let me know
> how feasible this potentially is?
>
Until someone with competence and incentive to code it up steps forward,
the feasibility is zero. Someone might hack it up as a cute feature, but
since this is the first time a wish like this has been aired on the list,
I wouldn't get my hopes up. It *does* sound like a nice feature though,
so I'm sure any well-written patch would be at least considered for
inclusion.
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null