Unexpected trends reports
Edgar Shine
eshine at mcosta.eng.br
Tue Apr 19 23:38:48 CEST 2005
Hi,
After reading my post I decided to resubmit it, because the problem was
poorly described in my first (and second) email.
Let's try again: :P
I'm using Nagios (2.0b2) to monitor remote radios (about 300 devices)
with the ping plugin. I have a problem with the trends reports.
Problem description:
1) The trends report shows an outage: "Critical - Time range: Thu Apr 7
13:13:57 2005 to Thu Apr 7 15:34:47 2005 - Duration: 0d 2h 20m 50s -
State Info: Critical - Plugin timed out after 10 seconds".
2) I realized that this is not true: the real outage lasted less than
5 minutes. Looking at the service alert history, I found these lines:
---begin---
[04-07-2005 13:13:57] SERVICE ALERT: tajuras_comercial;PING;CRITICAL;HARD;1;CRITICAL - Plugin timed out after 10 seconds
[04-07-2005 13:17:58] SERVICE ALERT: tajuras_comercial;PING;WARNING;SOFT;1;PING WARNING - Packet loss = 40%, RTA = 25.30 ms
[04-07-2005 13:18:57] SERVICE ALERT: tajuras_comercial;PING;OK;SOFT;2;PING OK - Packet loss = 40%, RTA = 29.40 ms
[04-07-2005 15:34:47] Caught SIGTERM, shutting down...
[04-07-2005 15:34:47] Nagios 2.0b2 starting...(PID=31270)
---end---
3) The nagios.log file has these lines:
---begin---
[1112890437] SERVICE ALERT: tajuras_comercial;PING;CRITICAL;HARD;1;CRITICAL - Plugin timed out after 10 seconds
[1112890678] SERVICE ALERT: tajuras_comercial;PING;WARNING;SOFT;1;PING WARNING - Packet loss = 40%, RTA = 25.30 ms
[1112890737] SERVICE ALERT: tajuras_comercial;PING;OK;SOFT;2;PING OK - Packet loss = 0%, RTA = 29.40 ms
[1112898887] INITIAL SERVICE STATE: tajuras_comercial;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 35.50 ms
--eof---
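To make it easier to line up the epoch timestamps in nagios.log with the
alert history, here is a minimal Python sketch that parses the state
entries and prints them in local time. The UTC-3 offset, the regular
expression and the parse_line helper are my own assumptions for
illustration; they are not part of Nagios.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: the server clock is UTC-3; the post does not state the zone.
LOCAL_TZ = timezone(timedelta(hours=-3))

# Expected shape: [epoch] KIND: host;service;state;state_type;attempt;output
LINE_RE = re.compile(
    r"^\[(\d+)\] ([A-Z ]+): ([^;]+);([^;]+);([^;]+);([^;]+);(\d+);(.*)$"
)


def parse_line(line):
    """Return a dict for one state entry, or None if the line has another shape."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    epoch, kind, host, service, state, state_type, attempt, output = m.groups()
    local = datetime.fromtimestamp(int(epoch), LOCAL_TZ)
    return {
        "epoch": int(epoch),
        "local": local.strftime("%m-%d-%Y %H:%M:%S"),
        "kind": kind,
        "host": host,
        "service": service,
        "state": state,
        "state_type": state_type,
        "attempt": int(attempt),
        "output": output,
    }


# The four state entries from the nagios.log excerpt, one per line.
SAMPLE = [
    "[1112890437] SERVICE ALERT: tajuras_comercial;PING;CRITICAL;HARD;1;"
    "CRITICAL - Plugin timed out after 10 seconds",
    "[1112890678] SERVICE ALERT: tajuras_comercial;PING;WARNING;SOFT;1;"
    "PING WARNING - Packet loss = 40%, RTA = 25.30 ms",
    "[1112890737] SERVICE ALERT: tajuras_comercial;PING;OK;SOFT;2;"
    "PING OK - Packet loss = 0%, RTA = 29.40 ms",
    "[1112898887] INITIAL SERVICE STATE: tajuras_comercial;PING;OK;HARD;1;"
    "PING OK - Packet loss = 0%, RTA = 35.50 ms",
]

for line in SAMPLE:
    entry = parse_line(line)
    print(entry["local"], entry["state"], entry["state_type"])
```

With that offset, the first entry comes out as 04-07-2005 13:13:57 and
the last as 04-07-2005 15:34:47, the same two times that bound the
outage in the trends report.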
I presume that after a hard CRITICAL state, trends.cgi waits for a hard
recovery before it graphs the recovery, but the log contains only a
soft recovery following a soft WARNING alert.
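The guess above can be checked against the numbers with a small Python
sketch. It is only a model of the suspected behavior, not the actual
trends.cgi logic: the critical_seconds function and its hard_only flag
are hypothetical names I made up, and the events are copied from the
nagios.log excerpt.

```python
# (epoch, state, state_type) taken from the nagios.log excerpt.
EVENTS = [
    (1112890437, "CRITICAL", "HARD"),
    (1112890678, "WARNING", "SOFT"),
    (1112890737, "OK", "SOFT"),
    (1112898887, "OK", "HARD"),  # INITIAL SERVICE STATE after the restart
]


def critical_seconds(events, hard_only):
    """Sum the seconds spent in CRITICAL.

    An open CRITICAL interval is closed by the next event that is counted.
    With hard_only=True, SOFT entries are skipped entirely (the assumed
    behavior); with hard_only=False, every state change is counted.
    """
    total = 0
    critical_since = None
    for epoch, state, state_type in events:
        if hard_only and state_type != "HARD":
            continue
        if critical_since is not None:
            total += epoch - critical_since
            critical_since = None
        if state == "CRITICAL":
            critical_since = epoch
    return total


def fmt(seconds):
    """Format seconds in the same 'Nd Nh Nm Ns' style as the trends report."""
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"


print("hard states only:", fmt(critical_seconds(EVENTS, hard_only=True)))
print("all state changes:", fmt(critical_seconds(EVENTS, hard_only=False)))
```

Counting only hard states gives 0d 2h 20m 50s, exactly the duration in
the trends report. Counting every state change gives 0d 0h 4m 1s, which
matches the outage I actually observed.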
As a workaround, I set the warning thresholds (199.99 ms, 79%) just
below the critical thresholds (200 ms, 80%). I would still like to use
real warning states, though, because they would help my team prioritize
which polled devices to fix.
System info:
- Linux (Debian 3.0 - stable):
- libgd1: 1.8.4-17
- libgd2: 2.0.1-10
- zlib1g-dev: 1.1.4-1.0
- libpng2-dev: 1.0.12-3
- libjpeg62-dev: 6b-5
I'd appreciate any tips about this issue.
TIA for your time.
rgds,
Edgar Shine
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users