Unexpected trends reports
Subhendu Ghosh
sghosh at sghosh.org
Thu Apr 21 02:28:26 CEST 2005
On Tue, 19 Apr 2005, Edgar Shine wrote:
> Hi,
>
> After rereading my post I decided to resubmit it, because the problem was
> poorly described in my first (and 2nd) email.
> Let's try again: :P
>
> I'm using Nagios (2.0b2) to monitor remote radios (about 300 devices) using
> the ping plugin. I have some problems with trend reports.
>
> Problem description:
> 1) The trend report states an outage: "Critical - Time range: Thu Apr 7 13:13:57
> 2005 to Thu Apr 7 15:34:47 2005 - Duration: 0d 2h 20m 50s - State Info:
> Critical - Plugin timed out after 10 seconds".
> 2) I've realized that this is not true; the real outage lasted less
> than 5 minutes. Looking at the service alert history, I found these lines:
> ---begin---
> [04-07-2005 13:13:57] SERVICE ALERT:
> tajuras_comercial;PING;CRITICAL;HARD;1:CRITICAL - Plugin timed out after 10
> seconds
> [04-07-2005 13:17:58] SERVICE ALERT:
> tajuras_comercial;PING;WARNING;SOFT;1;PING WARNING - Packet loss = 40%, RTA =
> 25.30 ms
> [04-07-2005 13:18:57] SERVICE ALERT: tajuras_comercial;PING;OK;SOFT;2;PING OK
> - Packet loss = 40%, RTA = 29.40 ms
> [04-07-2005 15:34:47] Caught SIGTERM, shutting down...
> [04-07-2005 15:34:47] Nagios 2.0b2 starting...(PID=31270)
> ---end---
> 3) The nagios.log file has these lines:
> ---begin---
> [1112890437] SERVICE ALERT: tajuras_comercial;PING;CRITICAL;HARD;1;CRITICAL -
> Plugin timed out after 10 seconds
> [1112890678] SERVICE ALERT: tajuras_comercial;PING;WARNING;SOFT;1;PING
> WARNING - Packet loss = 40%, RTA = 25.30 ms
> [1112890737] SERVICE ALERT:
> tajuras_comercial;PING;OK;SOFT;2;PING OK - Packet loss = 0%, RTA = 29.40 ms
> [1112898887] INITIAL SERVICE STATE: tajuras_comercial;PING;OK;HARD;1;PING OK
> - Packet loss = 0%, RTA = 35.50 ms
> --eof---
>
> I presume that after a critical hard state, trends.cgi expects a hard
> recovery to graph a recovery state, but there is just a soft recovery after a
> soft state warning alert.
>
> As a workaround, I configured the warning thresholds (199.99 ms, 79%) to be
> close to the critical thresholds (200 ms, 80%), but being able to use warning
> states would help set priorities for my team to fix these polled devices.
>
> System info:
> - Linux (Debian 3.0 - stable):
> - libgd1: 1.8.4-17
> - libgd2: 2.0.1-10
> - zlib1g-dev: 1.1.4-1.0
> - libpng2-dev: 1.0.12-3
> - libjpeg62-dev: 6b-5
>
> I'd appreciate any tips about this issue.
> TIA for your time.
>
> rgds,
> Edgar Shine
>
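Nagios does not use SOFT states for any kind of reporting. All state
changes must be confirmed as a HARD state. The only effect of SOFT states is
to trigger event handlers.
In your log the service never shows an OK;HARD alert after the CRITICAL;HARD
one, so the report keeps it CRITICAL until the restart at 15:34:47.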
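
As a rough illustration of the rule above, here is a minimal Python sketch
that replays the four nagios.log entries quoted earlier and adds up the time
spent in CRITICAL, once counting only HARD entries and once counting SOFT
ones too. It is a simplified model of the behaviour described here, not the
actual trends.cgi logic; the helper names (parse, seconds_in_state, fmt) are
made up for this example.

```python
import re

# Entries copied from the nagios.log excerpt quoted above (wrapped lines rejoined).
LOG = """\
[1112890437] SERVICE ALERT: tajuras_comercial;PING;CRITICAL;HARD;1;CRITICAL - Plugin timed out after 10 seconds
[1112890678] SERVICE ALERT: tajuras_comercial;PING;WARNING;SOFT;1;PING WARNING - Packet loss = 40%, RTA = 25.30 ms
[1112890737] SERVICE ALERT: tajuras_comercial;PING;OK;SOFT;2;PING OK - Packet loss = 0%, RTA = 29.40 ms
[1112898887] INITIAL SERVICE STATE: tajuras_comercial;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 35.50 ms
"""

# [timestamp] TYPE: host;service;state;state_type;...
ENTRY = re.compile(
    r"^\[(\d+)\] (?:SERVICE ALERT|INITIAL SERVICE STATE): "
    r"([^;]+);([^;]+);([^;]+);(HARD|SOFT);"
)


def parse(log):
    """Yield (timestamp, state, state_type) for each service entry."""
    for line in log.splitlines():
        m = ENTRY.match(line)
        if m:
            yield int(m.group(1)), m.group(4), m.group(5)


def seconds_in_state(entries, state, hard_only):
    """Sum the time spent in `state` between consecutive kept entries."""
    # Assumption of this sketch: with hard_only=True, SOFT entries are
    # simply ignored, as if they had never been logged.
    kept = [e for e in entries if e[2] == "HARD" or not hard_only]
    total = 0
    for (start, current, _), (end, _, _) in zip(kept, kept[1:]):
        if current == state:
            total += end - start
    return total


def fmt(seconds):
    """Format a duration the way the trend report prints it."""
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"


entries = list(parse(LOG))
print(fmt(seconds_in_state(entries, "CRITICAL", hard_only=True)))   # 0d 2h 20m 50s
print(fmt(seconds_in_state(entries, "CRITICAL", hard_only=False)))  # 0d 0h 4m 1s
```

Counting only HARD entries gives the 0d 2h 20m 50s that the trend report
shows; counting the SOFT entries as well gives about four minutes, which
matches the outage you actually saw.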
--
-sg