Reporting ideas sought.

Stanley.Hopcroft at Dest.gov.au
Tue Dec 6 06:09:27 CET 2005


Dear Folks,

I am writing to ask for clues about producing an itemised list of
outages and their causes, 'in some way', from Nagios.

The Nagios availability report does indeed provide a useful list of
outages that can be wrapped and processed to one's heart's content

(eg

HOST_NAME                 DOWN                  UP                    OUTAGE

Albany_DEST_router        05-12-2005 04:10:59   05-12-2005 08:42:29   4h 31m 30s
Albany_Optus_router_PE_in 05-12-2005 04:10:59   05-12-2005 08:42:29   4h 31m 30s
Lismore_DEST_router       05-12-2005 16:11:30   05-12-2005 20:01:40   3h 50m 10s
Lismore_Optus_router_PE_i 05-12-2005 16:11:30   05-12-2005 20:01:40   3h 50m 10s
Kempsey_DEST_router       05-12-2005 13:16:39   05-12-2005 13:22:49   6m 10s
Kempsey_Optus_router_PE_i 05-12-2005 13:16:39   05-12-2005 13:22:49   6m 10s
Broken_Hill_Optus_router_ 05-12-2005 01:54:17   05-12-2005 01:57:27   3m 10s
Broken_Hill_DEST_router   05-12-2005 01:56:07   05-12-2005 01:57:27   1m 20s

)

but Nagios has, AFAIK, no means of capturing event-related data and
associating it with an outage event to produce something like

HOST_NAME                 DOWN                  UP                    OUTAGE      CAUSE   COMMENT

Albany_DEST_router        05-12-2005 04:10:59   05-12-2005 08:42:29   4h 31m 30s  1       BDR -> down, provider
Albany_Optus_router_PE_in 05-12-2005 04:10:59   05-12-2005 08:42:29   4h 31m 30s  1       BDR -> down, provider
Lismore_DEST_router       05-12-2005 16:11:30   05-12-2005 20:01:40   3h 50m 10s  2       router restart by power-on
Lismore_Optus_router_PE_i 05-12-2005 16:11:30   05-12-2005 20:01:40   3h 50m 10s  2       power failure
Kempsey_DEST_router       05-12-2005 13:16:39   05-12-2005 13:22:49   6m 10s      1       BDR -> down, provider
Kempsey_Optus_router_PE_i 05-12-2005 13:16:39   05-12-2005 13:22:49   6m 10s      1       BDR -> down, provider
Broken_Hill_Optus_router_ 05-12-2005 01:54:17   05-12-2005 01:57:27   3m 10s      5       dismiss
Broken_Hill_DEST_router   05-12-2005 01:56:07   05-12-2005 01:57:27   1m 20s      5       dismiss

In this case, cause is a coded value that classifies the fault and the
comment is free form text.
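
To make the shape of such a record concrete, here is a minimal Python
sketch. The class name, field names and column widths are my own
assumptions, not anything Nagios provides; the cause stays empty until
someone fills it in.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

TIME_FORMAT = "%d-%m-%Y %H:%M:%S"  # same layout as the timestamps in the report


@dataclass
class Outage:
    """One outage row; a hypothetical structure, not a Nagios object."""
    host_name: str
    down: datetime
    up: datetime
    cause: Optional[int] = None  # coded fault class, None until an admin sets it
    comment: str = ""            # free-form text

    def duration(self) -> str:
        """Return the outage length in the report's 'Nh Nm Ns' style."""
        total = int((self.up - self.down).total_seconds())
        hours, rest = divmod(total, 3600)
        minutes, seconds = divmod(rest, 60)
        parts = []
        if hours:
            parts.append(f"{hours}h")
        if hours or minutes:
            parts.append(f"{minutes}m")
        parts.append(f"{seconds}s")
        return " ".join(parts)

    def row(self) -> str:
        """Format the record as one line of the extended report."""
        cause = "" if self.cause is None else str(self.cause)
        return (f"{self.host_name:<25} {self.down.strftime(TIME_FORMAT)}   "
                f"{self.up.strftime(TIME_FORMAT)}   {self.duration():<11} "
                f"{cause:<7} {self.comment}").rstrip()
```

The cause is kept as a bare integer because the meaning of each code is
a site decision.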

The best I can think of to create something like this is to

1 Append the outages to a file, possibly by having an event handler
run code that extracts the outage from the availability CGI. Better
still, since all the data for an outage is probably provided by the
host or service macros, have the handler append that to a file.

2 Have an admin edit the file and add the values when they become known.
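
Step 1 could be sketched roughly as below, assuming the event handler
passes the standard $HOSTNAME$, $HOSTSTATE$, $HOSTSTATETYPE$ and $TIMET$
macros as arguments. The script name, the file location and the CSV
layout are hypothetical; the blank cause and comment columns are what
the admin would edit in step 2.

```python
#!/usr/bin/env python3
"""Append hard host state changes to a CSV file for later annotation.

Assumed (hypothetical) command_line in the Nagios command definition:
    /path/to/log_outage.py $HOSTNAME$ $HOSTSTATE$ $HOSTSTATETYPE$ $TIMET$
"""
import csv
import os
import sys

OUTAGE_LOG = "/var/tmp/outages.csv"  # hypothetical location
FIELDS = ["host_name", "state", "epoch", "cause", "comment"]


def append_event(path, host_name, state, state_type, epoch):
    """Append one row for a HARD state change; return True if a row was written."""
    if state_type != "HARD":
        return False  # soft states are still being rechecked, so skip them
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as handle:
        writer = csv.writer(handle)
        if new_file:
            writer.writerow(FIELDS)  # header goes in once, on first use
        # cause and comment are left blank for the admin to fill in later
        writer.writerow([host_name, state, int(epoch), "", ""])
    return True


# Only act when all four macro values were passed on the command line.
if __name__ == "__main__" and len(sys.argv) == 5:
    append_event(OUTAGE_LOG, *sys.argv[1:5])
```

This records raw DOWN and UP events rather than finished outages, so the
handler never has to rewrite earlier lines in the file.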

The guts of the problem is that Nagios does the right thing by
automatically changing the state of a monitored entity; there is no
opportunity to 'officially' close the 'fault' by collecting user input
and associating it with an outage. Looked at another way, outages don't
really exist as first-class objects (with their own methods and data).
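
As a rough illustration of treating outages as objects outside Nagios,
the sketch below pairs time-ordered DOWN and UP events into outage
records that carry empty cause and comment slots. The function name and
the (host, state, epoch) tuple layout are assumptions for illustration
only.

```python
def pair_outages(events):
    """Turn time-ordered (host, state, epoch) events into outage records.

    Returns (closed, still_open): closed is a list of dicts, one per
    finished outage; still_open maps host -> epoch of a DOWN that has
    not yet been followed by an UP.
    """
    still_open = {}
    closed = []
    for host, state, epoch in events:
        if state == "DOWN":
            # keep the earliest DOWN if the same host is reported twice
            still_open.setdefault(host, epoch)
        elif state == "UP" and host in still_open:
            down = still_open.pop(host)
            closed.append({
                "host_name": host,
                "down": down,
                "up": epoch,
                "seconds": epoch - down,
                "cause": None,   # to be set when the fault is closed
                "comment": "",   # free-form text, also set later
            })
        # any other state (e.g. UNREACHABLE) or an unmatched UP is ignored
    return closed, still_open
```

An outage only becomes a record here once its UP arrives, so "closing
the fault" with a cause and comment remains a separate, manual step.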

All comments are very welcome,

Yours sincerely.

