Automatically acknowledge services of an acknowledged host
Jochen Bern
Jochen.Bern at LINworks.de
Thu Dec 9 10:52:59 CET 2010
On 12/09/2010 03:01 AM, Mathieu Gagné wrote:
> On 12/8/10 5:08 PM, Julien Mathis wrote:
>> Moreover I think you should reconsider your plugins. Is it normal for a
>> plugin to returns the CRITICAL status when it can not connect? Wouldn't
>> it be more appropriate with the UNKNOWN status?
> Which plugins are we talking about?
> For example, if I use "check_http" and the port isn't opened for
> whatever reason (service is crashed, firewall, etc.), it is CRITICAL to
> me, not UNKNOWN. This is my business need. (but hey, to each his own)
[...]
> When the host is DOWN, service problems are silenced and NO
> notifications are sent, they are "muted". Why would you want to
> acknowledge a service problem if there isn't any notifications sent to
> contacts?
While I agree that automatic acknowledgments promise to create more
problems than they'll solve, I'd like to comment on *this* tangent.
Of course, any kind of service where reachability via the net is part
and parcel of "the service" *should* use the OK/WARNING/CRITICAL range
of states to report connectivity problems. However, there also is a
plethora of checks where the remote access only satisfies the need of
centralizing the monitoring - CPU/RAM/disk usage, load, # of users, log
scans, hardware failures, you name it. In those cases, I *would* welcome
the possibility to map connectivity issues to UNKNOWN (or some
service-kin of hosts' UNREACHABLE) instead.
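
A minimal sketch of such a mapping, written as a post-processing step around an arbitrary plugin result. The pattern list and the function name remap_state are assumptions made for illustration; neither is part of Nagios or of any plugin, and real plugins word their connectivity errors differently. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard plugin convention.

```python
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical patterns suggesting that the transport failed rather than
# the thing being measured. Adjust to whatever your plugins really print.
CONNECTIVITY_PATTERNS = [
    re.compile(r"connection (refused|timed out)", re.IGNORECASE),
    re.compile(r"no route to host", re.IGNORECASE),
    re.compile(r"invalid hostname/address", re.IGNORECASE),
]


def remap_state(exit_code, output):
    """Return UNKNOWN for a CRITICAL result that looks like a connectivity
    failure; pass every other result through unchanged."""
    looks_like_transport = any(p.search(output) for p in CONNECTIVITY_PATTERNS)
    if exit_code == CRITICAL and looks_like_transport:
        return UNKNOWN
    return exit_code
```

This would only make sense for checks where remote access is merely the transport (disk, load, log scans); for something like check_http, the unmodified CRITICAL is the right answer.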
My favorite remote connector is check_by_ssh / check_by_ssc (the latter
basically being a multihop "Matryoshka-of-tunnels" SSH). Some of the
hosts actually have check_ping as their host check; for some of them it
would be outright *wrong* to change that to check_ssh (e.g., because I'm
also using check_http against that host). Of course I have a plain "does
SSH work" service defined on them and declare all services using
check_by_ssh as dependent on it. I even reduced the *_intervals and
max_check_attempts of the SSH check to prioritize it. No dice, I *still*
get notifications for some of the dependent services before SSH is
declared CRITICAL.
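
One possible explanation, sketched as arithmetic: a dependent service can reach a HARD state before the SSH check does if its own check happens to run soon after the outage starts. The sketch assumes that dependencies are evaluated against the master service's HARD state (the behavior unless soft_state_dependencies is enabled) and that interval_length is 60 seconds. The function name and the example timings are invented for illustration and are not measurements from this setup.

```python
def seconds_until_hard(first_failure_at, max_check_attempts,
                       retry_interval, interval_length=60):
    """Seconds after the outage begins at which a service reaches a HARD
    problem state, given when its first failing check ran.

    The first failing check counts as attempt 1; each further attempt
    follows after retry_interval time units of interval_length seconds.
    """
    retries = max(max_check_attempts - 1, 0)
    return first_failure_at + retries * retry_interval * interval_length


# Illustrative numbers only: the SSH check first fails 50 s into the outage
# and needs two attempts; a dependent check fails 5 s in and needs one.
ssh_hard = seconds_until_hard(first_failure_at=50, max_check_attempts=2,
                              retry_interval=1)
dependent_hard = seconds_until_hard(first_failure_at=5, max_check_attempts=1,
                                    retry_interval=1)
```

With these numbers the dependent service goes HARD at 5 s and the SSH service at 110 s, so the dependent notification is not suppressed. Shorter intervals on the SSH check narrow that window but cannot close it as long as scheduling decides which check runs first.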
Also, it's not *all* in the plugins. (It is in *most* cases, though -
and check_by_ssh falling back to the normal SSH_COMMAND, which isn't
aware of the needs of Nagios in the slightest, certainly doesn't help
*this* cause :-} ). In some cases, it's the Nagios core that times out
the check and provides a CRITICAL - e.g., the check_by_ssh timeout
doesn't apply to name resolution:
> # time ./check_by_ssh -H www.foobar.co.bj -C id -t 1
> check_by_ssh: Invalid hostname/address - www.foobar.co.bj
> real 0m5.132s
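
A rough sketch of one workaround: run the plugin under a wrapper that enforces a wall-clock deadline covering everything the plugin does, name resolution included, and reports UNKNOWN itself when the deadline passes. The function name run_with_deadline is hypothetical, and this is not how check_by_ssh or the Nagios core behave. The deadline would have to be shorter than the core's own service_check_timeout for the wrapper to get its word in first.

```python
import subprocess

UNKNOWN = 3
VALID_STATES = (0, 1, 2, 3)  # OK, WARNING, CRITICAL, UNKNOWN


def run_with_deadline(argv, deadline_seconds):
    """Run a plugin command line and return (state, output).

    If the command does not finish within deadline_seconds of wall-clock
    time, it is killed and the result is UNKNOWN rather than CRITICAL.
    """
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=deadline_seconds)
    except subprocess.TimeoutExpired:
        return UNKNOWN, "UNKNOWN - check exceeded %ss deadline" % deadline_seconds
    except OSError as exc:
        return UNKNOWN, "UNKNOWN - could not start check: %s" % exc

    output = (proc.stdout or proc.stderr).strip()
    # Anything outside the plugin convention (e.g. death by signal) is UNKNOWN.
    state = proc.returncode if proc.returncode in VALID_STATES else UNKNOWN
    return state, output
```

A call such as run_with_deadline(["./check_by_ssh", "-H", "www.foobar.co.bj", "-C", "id", "-t", "1"], 3) would then be bounded at about three seconds regardless of how long the resolver takes.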
Kind regards,
J. Bern
--
Jochen Bern, Systemingenieur --- LINworks GmbH <http://www.LINworks.de/>
Postfach 100121, 64201 Darmstadt | Robert-Koch-Str. 9, 64331 Weiterstadt
PGP (1024D/4096g) FP = D18B 41B1 16C0 11BA 7F8C DCF7 E1D5 FAF4 444E 1C27
Tel. +49 6151 9067-231, Zentr. -0, Fax -299 - Amtsg. Darmstadt HRB 85202
Unternehmenssitz Weiterstadt, Geschäftsführer Metin Dogan, Oliver Michel