Reports only show data from a specific time period?
BOLLENGIER Eric
ebollengier at sigma.fr
Fri Feb 6 14:23:19 CET 2004
Hi,
At the moment (with Nagios 2a), trends.cgi doesn't work very well...
Let me explain more precisely what I mean:
My check_period is 8-19_5x7, but the trend report computes statistics on
24-24_7x7. The question is, how can Nagios give statistics for periods it
is not even checking? The result, for now, is that if my service goes
CRITICAL right before the end of the check_period, Nagios assumes it stays
CRITICAL until the next check_period begins. That is obviously not what I
want: my statistics are wrong. Or am I missing something somewhere?
Wouldn't it be better to have the state set to "Indeterminate" outside of
the check_period?
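To make the effect concrete, here is a small Python sketch. It is not
Nagios code: the timestamps and the 08:00-19:00 window are made-up
assumptions, chosen only to show how much CRITICAL time gets counted when
the last known state is carried across the unchecked night.

```python
from datetime import datetime

# Hypothetical timeline (all values are assumptions for illustration).
went_critical = datetime(2004, 2, 5, 18, 55)  # last check of the day fails
period_end = datetime(2004, 2, 5, 19, 0)      # check_period closes
period_start = datetime(2004, 2, 6, 8, 0)     # check_period reopens
recovered = datetime(2004, 2, 6, 8, 5)        # first check next day is OK

# Behaviour described above: the last state is carried across the gap.
carried_over = recovered - went_critical

# What was actually observed: only the time inside the check_period.
observed = (period_end - went_critical) + (recovered - period_start)

# Time nobody checked, which could be reported as indeterminate instead.
unchecked = period_start - period_end

print("CRITICAL if state is carried over:", carried_over)  # 13:10:00
print("CRITICAL actually observed:       ", observed)      # 0:10:00
print("Unchecked (indeterminate) time:   ", unchecked)     # 13:00:00
```

In this made-up case, ten minutes of observed trouble is reported as more
than thirteen hours of downtime.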
As a workaround, we could restart the Nagios process every hour, so that
an indeterminate state appears...
Or else, if we could apply a "mask" timeperiod when computing the
statistics, it would probably solve all our problems: we could report
only what we want to whomever we want:
- working hours for our managers (or clients, for that matter)
- all day for us, as we do need to know if there is a potential problem
on our systems.
- etc.
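The masking idea above could look roughly like the following Python
sketch. It is only an illustration and not how the Nagios CGIs work: the
function names (masked_seconds, availability), the state history, and the
simple daily hour window (no weekday handling) are all assumptions.

```python
from datetime import datetime, time, timedelta

def masked_seconds(start, end, first_hour, last_hour):
    """Seconds of [start, end) inside the daily window first_hour-last_hour."""
    total = 0.0
    day = start.date()
    while day <= end.date():
        midnight = datetime.combine(day, time(0))
        window_start = midnight + timedelta(hours=first_hour)
        window_end = midnight + timedelta(hours=last_hour)
        lo = max(start, window_start)
        hi = min(end, window_end)
        if hi > lo:
            total += (hi - lo).total_seconds()
        day += timedelta(days=1)
    return total

def availability(history, first_hour, last_hour):
    """Percent of masked time spent in OK; history holds (start, end, state)."""
    ok = sum(masked_seconds(s, e, first_hour, last_hour)
             for s, e, state in history if state == "OK")
    total = sum(masked_seconds(s, e, first_hour, last_hour)
                for s, e, state in history)
    return 100.0 * ok / total if total else None

# Hypothetical history: one outage from 20:00 to 22:00, outside working hours.
history = [
    (datetime(2004, 2, 5, 0, 0), datetime(2004, 2, 5, 20, 0), "OK"),
    (datetime(2004, 2, 5, 20, 0), datetime(2004, 2, 5, 22, 0), "CRITICAL"),
    (datetime(2004, 2, 5, 22, 0), datetime(2004, 2, 6, 0, 0), "OK"),
]

print(f"working hours (8-19): {availability(history, 8, 19):.2f}%")  # 100.00%
print(f"all day (0-24):       {availability(history, 0, 24):.2f}%")  # 91.67%
```

The same history gives both views: the managers' report shows no downtime
during working hours, while the all-day report still shows the outage.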
Thanks in advance,
Regards,
Eric
On Fri, 2004-02-06 at 01:54, Paul L. Allen wrote:
> Hi Andre
>
> Andre Bergei writes:
>
> > Yes, that is the idea. The reason the managers want this is to prove
> > to the customer that they had uptime during the service hours. The whole
> > point is that they don't care what happens at night; that is our problem,
> > the sysadmins'. If there is downtime during service hours, there will be
> > economic penalties if total downtime doesn't meet the demands of the
> > SLA.
>
> I understand the reasoning, I just dispute the logic behind it.
>
> I know that ADSL lines in the UK are unreliable - I know it for a fact
> because Nagios proves it to me. Most of the problems occur out of
> working hours (because we consider 0800-1830 to be working hours). Our
> clients want to know about them because they can claim compensation from
> their ADSL suppliers even though the outages didn't affect actual
> operation.
>
> I know that if I have a host or services which are unreliable out of
> hours, when the workload is minimal, and which are not caused by power
> failures or ADSL outages that there are serious problems. One of our
> clients had a very bad power feed that caused eventual physical disk
> corruption, and without statistics showing that many of the problems
> occurred out of working hours would have had a harder time claiming from
> the power company.
>
> But, in the end, our clients want to know that we are being honest with
> them. That we are not hiding problems that occur out of working hours
> and pretending everything is perfect because they have not *yet* had
> problems during working hours. They want to know what is happening out
> of working hours to prove that we're not hiding anything from them
> (they would want such proof from whoever provided their infrastructure).
>
> It seems to me that what you are arguing for is that the statistics
> CGIs should take into account working hours for the contacts who view
> them. So that if you view the stats you see the whole picture and if
> some client whose working hours are 7am-7pm views them he or she sees
> only the problems that occurred in that interval. That way you could
> define user x-working-hours who only saw the limited information and
> x-overall who saw everything (which would allow customer X to see both
> views and know that you met your SLA but that there are problems outside
> of working hours which might eventually impact regular operations).
>
> > Why would we want to suppress information?
> > This is solved by having techie reports for the techies, and
> > boring availability reports for the managers. The right information to
> > the right people, a good thing!
>
> Throwing away information is ALWAYS a bad idea. Providing two different
> views onto the information ("this is the availability when you actually
> needed it" and "this is the overall availability whether you needed it
> or not") is a good thing.
>
> > Like it or not, in the "wonderful" world of outsourcing, things like
> > service level agreements are becoming more and more common; in fact,
> > customers _demand_ them.
>
> As I just explained, our customers not only want SLAs during periods when
> the service is critical to them, they also want to know about problems
> outside of those hours for various reasons. If nothing else, giving
> them the 24x7 info shows them that we're being honest with them. If
> every host they have goes down within roughly the same time interval
> we can point out that it must have been an ADSL failure or a power
> failure; if one host alone has problems they know it is probably our
> fault. Whether it happens out of working hours or not, they use that
> information to evaluate our service level against that of their Internet
> feed and power feed. If I can't be honest with our customers, I don't
> want to work here...
--
Eric BOLLENGIER, Administrateur Système - Poste 1325
SIGMA Informatique http://www.sigma.fr
3 rue Newton, BP 4127, 44241 La Chapelle sur Erdre Cedex
tel : 02.40.37.14.00