Reporting ideas sought.
Stanley.Hopcroft at Dest.gov.au
Tue Dec 6 06:09:27 CET 2005
Dear Folks,
I would welcome any clues about producing an itemised list of outages
and their causes from Nagios, in one way or another.
The Nagios availability report does indeed provide a useful list of
outages that can be wrapped and processed to one's heart's content
(eg
HOST_NAME                  DOWN                 UP                   OUTAGE
Albany_DEST_router         05-12-2005 04:10:59  05-12-2005 08:42:29  4h 31m 30s
Albany_Optus_router_PE_in  05-12-2005 04:10:59  05-12-2005 08:42:29  4h 31m 30s
Lismore_DEST_router        05-12-2005 16:11:30  05-12-2005 20:01:40  3h 50m 10s
Lismore_Optus_router_PE_i  05-12-2005 16:11:30  05-12-2005 20:01:40  3h 50m 10s
Kempsey_DEST_router        05-12-2005 13:16:39  05-12-2005 13:22:49  6m 10s
Kempsey_Optus_router_PE_i  05-12-2005 13:16:39  05-12-2005 13:22:49  6m 10s
Broken_Hill_Optus_router_  05-12-2005 01:54:17  05-12-2005 01:57:27  3m 10s
Broken_Hill_DEST_router    05-12-2005 01:56:07  05-12-2005 01:57:27  1m 20s
)
but Nagios has, AFAIK, no means of capturing event-related data and
associating it with an outage event to produce something like
HOST_NAME                  DOWN                 UP                   OUTAGE      CAUSE  COMMENT
Albany_DEST_router         05-12-2005 04:10:59  05-12-2005 08:42:29  4h 31m 30s  1      BDR -> down, provider
Albany_Optus_router_PE_in  05-12-2005 04:10:59  05-12-2005 08:42:29  4h 31m 30s  1      BDR -> down, provider
Lismore_DEST_router        05-12-2005 16:11:30  05-12-2005 20:01:40  3h 50m 10s  2      router restart by power-on
Lismore_Optus_router_PE_i  05-12-2005 16:11:30  05-12-2005 20:01:40  3h 50m 10s  2      power failure
Kempsey_DEST_router        05-12-2005 13:16:39  05-12-2005 13:22:49  6m 10s      1      BDR -> down, provider
Kempsey_Optus_router_PE_i  05-12-2005 13:16:39  05-12-2005 13:22:49  6m 10s      1      BDR -> down, provider
Broken_Hill_Optus_router_  05-12-2005 01:54:17  05-12-2005 01:57:27  3m 10s      5      dismiss
Broken_Hill_DEST_router    05-12-2005 01:56:07  05-12-2005 01:57:27  1m 20s      5      dismiss
In this case, CAUSE is a coded value that classifies the fault and
COMMENT is free-form text.
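To make the target format concrete, here is a minimal Python sketch that builds one annotated line from the DOWN and UP timestamps plus a cause code and comment. The function names and column widths are my own invention, and I am assuming the timestamps are day-month-year as in the sample rows; nothing here comes from Nagios itself.

```python
from datetime import datetime

# Assumed day-month-year order, matching the sample rows (e.g. 05-12-2005).
TIME_FORMAT = "%d-%m-%Y %H:%M:%S"


def format_duration(seconds):
    """Render seconds in the report's style, e.g. '4h 31m 30s' or '6m 10s'."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if hours or minutes:
        parts.append(f"{minutes}m")
    parts.append(f"{secs}s")
    return " ".join(parts)


def annotated_row(host, down, up, cause, comment):
    """Build one report line from DOWN/UP strings plus the operator's notes."""
    started = datetime.strptime(down, TIME_FORMAT)
    ended = datetime.strptime(up, TIME_FORMAT)
    outage = format_duration(int((ended - started).total_seconds()))
    # Column widths are arbitrary; they only need to fit the sample data.
    return f"{host:<27}{down:<21}{up:<21}{outage:<12}{cause:<7}{comment}"


print(annotated_row("Albany_DEST_router", "05-12-2005 04:10:59",
                    "05-12-2005 08:42:29", 1, "BDR -> down, provider"))
```

The meaning of each cause code would remain a site-specific convention; the sketch only carries the number through.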
The best I can think of to create something like this is to
1 Append the outages to a file, possibly by having an event handler
run the code that extracts the outage from the availability CGI. Better
still, all the data for an outage is probably provided by the host or
service macros, which the handler could append to the file directly.
2 Have an admin edit the file and add the values when they become known.
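The two steps above could be sketched roughly as follows in Python. This is an untested idea, not a working Nagios add-on: the script name, the CSV layout and both function names are hypothetical. I am assuming the event handler command passes the host name, state and epoch time as arguments (for instance via the $HOSTNAME$, $HOSTSTATE$ and $TIMET$ macros), and that soft states are filtered out before the script is called.

```python
import csv
import sys
from pathlib import Path

# Hypothetical file layout: one row per state change, with blank
# cause/comment columns for an admin to fill in later.
FIELDS = ["host", "state", "epoch", "cause", "comment"]


def append_event(log_path, host, state, epoch):
    """Append one state change; cause and comment start empty for later editing."""
    path = Path(log_path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({"host": host, "state": state, "epoch": int(epoch),
                         "cause": "", "comment": ""})


def pair_outages(log_path):
    """Pair each DOWN event with the next UP event for the same host."""
    open_down = {}
    outages = []
    with Path(log_path).open(newline="") as handle:
        for event in csv.DictReader(handle):
            host = event["host"]
            if event["state"] == "DOWN":
                # Keep the first DOWN if several arrive before recovery.
                open_down.setdefault(host, event)
            elif event["state"] == "UP" and host in open_down:
                down = open_down.pop(host)
                outages.append({
                    "host": host,
                    "down": int(down["epoch"]),
                    "up": int(event["epoch"]),
                    "cause": down["cause"],
                    "comment": down["comment"],
                })
    return outages


if __name__ == "__main__" and len(sys.argv) == 5:
    # Hypothetical invocation: log_outage.py <log file> <host> <state> <epoch>
    append_event(*sys.argv[1:5])
```

The admin would edit the cause and comment columns on the DOWN row, and a report script would call pair_outages() to list completed outages with whatever annotations exist so far.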
The guts of the problem is that Nagios does the right thing by
automatically changing the state of a monitored entity; there is no
opportunity to 'officially' close the 'fault' by collecting user input
and associating it with an outage. Looked at another way, outages don't
really exist as first class objects (with their own methods and data).
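For illustration only, this is roughly what I mean by an outage as a first class object, sketched in Python. The class, its fields and its method names are all hypothetical and exist nowhere in Nagios: recovery is recorded automatically, but the outage only counts as closed once someone supplies a cause.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outage:
    host: str
    down_epoch: int
    up_epoch: Optional[int] = None
    cause: Optional[int] = None   # coded fault class, site-defined
    comment: str = ""             # free-form text

    def recover(self, up_epoch: int) -> None:
        """Record the automatic recovery; the outage still awaits an explanation."""
        if up_epoch < self.down_epoch:
            raise ValueError("recovery cannot precede the failure")
        self.up_epoch = up_epoch

    def close(self, cause: int, comment: str = "") -> None:
        """Officially close the fault with an operator-supplied cause and comment."""
        if self.up_epoch is None:
            raise RuntimeError("host has not recovered yet")
        self.cause = cause
        self.comment = comment

    @property
    def status(self) -> str:
        if self.up_epoch is None:
            return "down"
        return "closed" if self.cause is not None else "awaiting cause"


outage = Outage("Lismore_DEST_router", down_epoch=100)
outage.recover(200)               # what the monitoring system can do by itself
outage.close(2, "power failure")  # what needs a human
```

The point of the separate "awaiting cause" status is that a report could list every outage nobody has explained yet.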
All comments are very welcome,
Yours sincerely.
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users