Services on down hosts
Andreas Ericsson
ae at op5.se
Thu Jun 12 00:24:39 CEST 2008
Bringing this back on-list. I'd appreciate if you could use
"reply-to-all" instead of just "reply", as some of this discussion
is probably of interest to the rest of the community as well.
Thanks.
Jay R. Ashworth wrote:
> On Wed, Jun 11, 2008 at 05:54:48PM +0200, Andreas Ericsson wrote:
>>> Well, since (to take an example), CRITICAL load means "a loadaverage
>>> over 8" (on my 8-core Opteron), and we don't *know* the load average if
>>> the machine isn't reachable to return a value... then the nrpe checker
>>> on the console in fact *is* getting an IO error when trying to, ok,
>>> read from a network socket.
>> I was more thinking along the lines of errno being set to EIO when
>> attempting to read(2) from an already connected network socket, although
>> there are two schools about that too (some want all failures to always
>> alert, while others want a lot of things to be in UNKNOWN state).
>>
>> Not being able to connect clearly signals there is something wrong
>> with the service though, while an EIO signals that there's something
>> wrong with the Nagios host's kernel or hardware.
>
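To make the connect-versus-EIO distinction quoted above concrete, here is a
minimal sketch of how a check might map failures to plugin states. The exit
codes are the standard Nagios plugin return values; the function name
classify_failure and the policy itself are assumptions for illustration, not
how check_nrpe actually behaves.

```python
import errno
import socket

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_failure(exc):
    """Map an exception raised by a remote check to a Nagios state.

    Hypothetical policy, following the argument in the thread.
    """
    # EIO on read(2) points at the monitoring host's own kernel or
    # hardware, so it says nothing about the remote service: UNKNOWN.
    if isinstance(exc, OSError) and exc.errno == errno.EIO:
        return UNKNOWN
    # Refused, reset or timed-out connections mean the service cannot
    # be used right now: CRITICAL.
    if isinstance(exc, (ConnectionError, TimeoutError, socket.timeout)):
        return CRITICAL
    # Unreachable host or network: also a failure to connect.
    if isinstance(exc, OSError) and exc.errno in (
        errno.EHOSTUNREACH,
        errno.ENETUNREACH,
    ):
        return CRITICAL
    # Anything else is unclassified, so do not guess.
    return UNKNOWN
```

Sites that want every failure to alert would simply return CRITICAL in the
EIO branch as well; that is the "two schools" split mentioned above.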
> My problem with that is that not all of what Nagios monitors is
> "services", in the meaning we usually give to that term. Much of it is
> "attributes" -- load average and diskspace on a machine being great
> examples.
>
True that, but the service of storing a file on disk (or, for some
poorly designed filesystems, reading one from a disk) requires there to be a
minimum of free space available. It's what makes up the platform on
which the *real* services rest. Hence servicegroups (which together
make up what a service-provider would call a service).
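As a rough illustration of treating free space as a service in its own
right, a check could look something like the sketch below. The thresholds
(20% and 10%) and the function names are made up for the example; the real
check_disk plugin has its own options and semantics.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def free_space_state(free_bytes, total_bytes, warn_pct=20.0, crit_pct=10.0):
    """Turn a free-space measurement into a Nagios state.

    warn_pct and crit_pct are hypothetical thresholds (percent free).
    """
    if total_bytes <= 0:
        # No usable measurement, so the state is not known.
        return UNKNOWN
    free_pct = 100.0 * free_bytes / total_bytes
    if free_pct < crit_pct:
        return CRITICAL
    if free_pct < warn_pct:
        return WARNING
    return OK


def check_path(path="/"):
    """Measure the filesystem holding `path` and classify it."""
    usage = shutil.disk_usage(path)
    return free_space_state(usage.free, usage.total)
```

A service that writes files to that filesystem would then sit in the same
servicegroup as this check, since it cannot work once the check goes
CRITICAL.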
> IMHO, anything you're trying to monitor that's actually a "service" --
> IE: a public facing website -- shouldn't be directly attached to a host,
> anyway...
>
> What if you're Google? Which host do you attach "http://www.google.com" to?
>
All the query distributors. Google works by having several front-end servers
distribute the incoming queries to quite a large army of query responders,
which have access to GFS (the Google File System) for doing the actual
lookups. Since a monitoring tool is only worth something if it tells you
*where* things break rather than only that things are broken, attaching the
check to each front-end makes perfect sense for a monitoring system, even if
that's not how the service provider or its sales people see the service.
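A small sketch of what "telling you where things break" could mean in
practice: run the same check against every front-end and report which ones
failed, rather than a single up/down answer for the site. The host names and
the aggregation rule (WARNING if some fail, CRITICAL if all fail) are
assumptions for the example only.

```python
# Standard Nagios plugin exit codes used here.
OK, WARNING, CRITICAL = 0, 1, 2


def summarize_frontends(results):
    """Summarize per-front-end check results.

    `results` maps a front-end host name to True if its check passed.
    Returns (state, failed_hosts) so the alert names the broken hosts.
    """
    failed = sorted(host for host, passed in results.items() if not passed)
    if not failed:
        state = OK
    elif len(failed) < len(results):
        # The site still answers, but part of it is broken.
        state = WARNING
    else:
        # Every front-end failed, so the site is down.
        state = CRITICAL
    return state, failed


if __name__ == "__main__":
    # Hypothetical front-end names, for illustration only.
    state, failed = summarize_frontends({"fe1": True, "fe2": False, "fe3": True})
    print(state, failed)
```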
>
>>> I think if I'm going to invest a lot of work into code, I'll spend it
>>> reskinning the clunky looking cgi's instead. :-)
>> That could well be a wasted effort. Several UIs already exist, and more
>> are brewing. I'd suggest having a look at op5.org within a week or
>> so instead, and checking nagios.org and nagios-community.org for news about
>> GUIs (op5.org will only have a reports GUI though, while nagios.org
>> will primarily take care of the equivalent of status.cgi et al).
>
> By UI, I presume you do *not* mean what someone else (IMHO) incorrectly
> used that term to mean earlier today -- a configuration front-end tool.
>
No, I do not. I mean an interface displaying current and historical
host and service status.
> I see that op5 is "Coming Soon".
>
Indeed it is. Content is scheduled to be added this Friday, but what
shape said content will be in is anyone's guess (I've got a shrewd idea
it won't be 100% complete and super-easy to use from day one, as there's
a lot of work to be done).
> Are you suggesting that *Ethan* is reworking the status.cgi? Cause I
> see no leaders about that on nagios.org.
Yes, Ethan has been working on a new web-based user interface for Nagios
for the past eight or so weeks. According to his speech at the Nordic
Nagios Meet it's possible it will be a commercial venture. That is,
companies capitalizing on Nagios in one way or another may have to
buy it, while non-profit organizations and home-users will probably
get to download it for free. He was a bit hazy on the details and he
refused to give a release date, so "wait and see" is the best I can
say, I'm afraid.
> And nagios-community.org doesn't seem to exist...
>
nagios-community.org doesn't exist, but nagioscommunity.org does.
Sorry for the confusion.
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231