Recovery notification not sent during downtime?
srunschke at abit.de
Fri Jul 28 15:46:21 CEST 2006
Hi folks,
I'm currently using Nagios 2.0b3 (never change a running system ;)) and ran into the following problem:
1. The service went CRITICAL.
2. SMS and email notifications were sent.
3. We found the problem and decided to reboot the machine to fix it.
4. We scheduled downtime for the host.
5. We rebooted the host.
6. Everything came back OK.
7. However, no SMS/email notification was sent to report that the service had recovered!
I'm not sure whether this problem has already been fixed; I didn't find any real evidence on Google or in the changelogs. Fixes to the recovery logic and the notification system itself are documented, but not in much detail.
Question: is this a bug or a feature? If it is a bug, has it been fixed in a newer release that I can upgrade to?
This is a problem for us: admins who are currently offsite never get a message saying the problem has been resolved, so we get quite a few unnecessary phone calls asking us to check a problem that is already solved.
Here's an excerpt of what it looked like in the Nagios log:
[1153954542] SERVICE ALERT: NSEXT01;NOTES;CRITICAL;SOFT;1;Connection refused
[1153954600] SERVICE ALERT: NSEXT01;NOTES;CRITICAL;SOFT;2;Connection refused
[1153954660] SERVICE ALERT: NSEXT01;NOTES;CRITICAL;HARD;3;Connection refused
[1153954660] SERVICE NOTIFICATION: RGingter;NSEXT01;NOTES;CRITICAL;notify-by-email;Connection refused
[1153954660] SERVICE NOTIFICATION: MArslan;NSEXT01;NOTES;CRITICAL;notify-by-email;Connection refused
[1153954660] SERVICE NOTIFICATION: IT_Service;NSEXT01;NOTES;CRITICAL;notify-by-email;Connection refused
[1153955260] SERVICE NOTIFICATION: RGingter_SMS;NSEXT01;NOTES;CRITICAL;notify-by-sms;Connection refused
[1153955260] SERVICE NOTIFICATION: MArslan_SMS;NSEXT01;NOTES;CRITICAL;notify-by-sms;Connection refused
...rest of alerts snipped out...
[1153980519] EXTERNAL COMMAND: SCHEDULE_HOST_DOWNTIME;NSEXT01;1153980509;1153981829;1;0;7200;technik;Neustart MAr
[1153980519] HOST DOWNTIME ALERT: NSEXT01;STARTED; Host has entered a period of scheduled downtime
[1153980595] HOST ALERT: NSEXT01;DOWN;SOFT;1;CRITICAL - 10.150.1.2: rta nan, lost 100%
[1153980605] HOST ALERT: NSEXT01;DOWN;SOFT;2;CRITICAL - 10.150.1.2: rta nan, lost 100%
[1153980615] HOST ALERT: NSEXT01;DOWN;HARD;3;CRITICAL - 10.150.1.2: rta nan, lost 100%
[1153980615] SERVICE ALERT: NSEXT01;PING;CRITICAL;HARD;1;CRITICAL - 10.150.1.2: rta nan, lost 100%
[1153980687] SERVICE ALERT: NSEXT01;CPU;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
[1153980687] SERVICE ALERT: NSEXT01;UPTIME;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
[1153980687] SERVICE ALERT: NSEXT01;DISK_C;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
[1153980707] HOST ALERT: NSEXT01;UP;HARD;1;OK - 10.150.1.2: rta 1.382ms, lost 0%
[1153980707] SERVICE ALERT: NSEXT01;PING;OK;HARD;1;OK - 10.150.1.2: rta 3.307ms, lost 0%
[1153980767] SERVICE ALERT: NSEXT01;NOTES;CRITICAL;SOFT;1;Connection refused
[1153980805] SERVICE ALERT: NSEXT01;MEMUSE;CRITICAL;SOFT;1;Connection refused
[1153980805] SERVICE ALERT: NSEXT01;DISK_D;CRITICAL;SOFT;1;Connection refused
[1153980805] SERVICE ALERT: NSEXT01;DISK_E;CRITICAL;SOFT;1;Connection refused
[1153980828] SERVICE ALERT: NSEXT01;NOTES;OK;SOFT;2;TCP OK - 0.070 second response time on port 1352
[1153980976] SERVICE ALERT: NSEXT01;CPU;OK;HARD;1;CPU Load 37% (10 min average)
[1153980976] SERVICE ALERT: NSEXT01;UPTIME;OK;HARD;1;System Uptime - 0 day(s) 0 hour(s) 5 minute(s)
[1153980976] SERVICE ALERT: NSEXT01;DISK_C;OK;HARD;1;C:\ - total: 3.00 Gb - used: 2.05 Gb (68%) - free 0.95 Gb (32%)
[1153981105] SERVICE ALERT: NSEXT01;MEMUSE;OK;SOFT;2;Memory usage: total:1951.26 Mb - used: 434.44 Mb (22%) - free: 1516.82 Mb (78%)
[1153981105] SERVICE ALERT: NSEXT01;DISK_D;OK;SOFT;2;D:\ - total: 5.43 Gb - used: 2.46 Gb (45%) - free 2.97 Gb (55%)
[1153981105] SERVICE ALERT: NSEXT01;DISK_E;OK;SOFT;2;E:\ - total: 67.83 Gb - used: 14.92 Gb (22%) - free 52.91 Gb (78%)
[1153981832] HOST DOWNTIME ALERT: NSEXT01;STOPPED; Host has exited from a period of scheduled downtime
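To make an excerpt like the one above easier to read, here is a minimal Python sketch. It is not part of Nagios, and the names Entry, parse, utc and recoveries_during_downtime are hypothetical helpers. The sketch parses log lines of the form "[epoch] KIND: field;field;...", converts the epoch timestamps to UTC, and lists the OK service alerts logged between the host's "HOST DOWNTIME ALERT ... STARTED" and "STOPPED" entries, together with their SOFT/HARD state type. It assumes each log entry sits on a single line and that the plugin output contains no semicolons. The SERVICE ALERT field order it uses (host, service, state, state type, attempt, output) is read off the excerpt itself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# A subset of the log excerpt above, each entry on a single line.
LOG = """\
[1153980519] HOST DOWNTIME ALERT: NSEXT01;STARTED; Host has entered a period of scheduled downtime
[1153980707] HOST ALERT: NSEXT01;UP;HARD;1;OK - 10.150.1.2: rta 1.382ms, lost 0%
[1153980707] SERVICE ALERT: NSEXT01;PING;OK;HARD;1;OK - 10.150.1.2: rta 3.307ms, lost 0%
[1153980767] SERVICE ALERT: NSEXT01;NOTES;CRITICAL;SOFT;1;Connection refused
[1153980828] SERVICE ALERT: NSEXT01;NOTES;OK;SOFT;2;TCP OK - 0.070 second response time on port 1352
[1153980976] SERVICE ALERT: NSEXT01;CPU;OK;HARD;1;CPU Load 37% (10 min average)
[1153981105] SERVICE ALERT: NSEXT01;MEMUSE;OK;SOFT;2;Memory usage: total:1951.26 Mb - used: 434.44 Mb (22%) - free: 1516.82 Mb (78%)
[1153981832] HOST DOWNTIME ALERT: NSEXT01;STOPPED; Host has exited from a period of scheduled downtime
"""


@dataclass
class Entry:
    ts: int            # epoch seconds from the "[...]" prefix
    kind: str          # e.g. "SERVICE ALERT", "HOST DOWNTIME ALERT"
    fields: list[str]  # semicolon-separated payload after "KIND: "


def parse(line: str) -> Entry:
    """Split '[epoch] KIND: f1;f2;...' into timestamp, kind and fields."""
    stamp, rest = line[1:].split("] ", 1)
    kind, _, payload = rest.partition(": ")  # first ": " ends the kind
    return Entry(int(stamp), kind, payload.split(";"))


def utc(ts: int) -> str:
    """Render an epoch timestamp as a readable UTC string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")


def recoveries_during_downtime(entries: list[Entry], host: str) -> list[tuple[str, str, str]]:
    """Return (service, state_type, time) for OK service alerts logged
    between the host's downtime STARTED and STOPPED entries."""
    start = stop = None
    for e in entries:
        if e.kind == "HOST DOWNTIME ALERT" and e.fields[0] == host:
            if e.fields[1] == "STARTED":
                start = e.ts
            elif e.fields[1] == "STOPPED":
                stop = e.ts
    if start is None or stop is None:
        raise ValueError(f"no complete downtime window for {host}")
    # SERVICE ALERT fields: host;service;state;state_type;attempt;output
    return [
        (e.fields[1], e.fields[3], utc(e.ts))
        for e in entries
        if e.kind == "SERVICE ALERT"
        and e.fields[0] == host
        and e.fields[2] == "OK"
        and start <= e.ts <= stop
    ]


if __name__ == "__main__":
    entries = [parse(line) for line in LOG.splitlines() if line.strip()]
    for service, state_type, when in recoveries_during_downtime(entries, "NSEXT01"):
        print(f"{when}  {service:<7} OK ({state_type})")
```

Run against the entries it embeds, it reports PING, NOTES, CPU and MEMUSE as returning to OK inside the downtime window, with NOTES and MEMUSE logged as SOFT state changes.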
Any insight into this would be appreciated.
Sincerely,
Sascha
--
Sascha Runschke
Netzwerk Management
IT-Services
ABIT AG
Robert-Bosch-Str. 1
40668 Meerbusch
Tel.:+49 (0) 2150.9153.226
Mobil:+49 (0) 173.5419665
mailto:SRunschke at abit.de
http://www.abit.net
http://www.abit-epos.net
---------------------------------
Sicherheitshinweis zur E-Mail Kommunikation /
Security note regarding email communication:
http://www.abit.net/sicherheitshinweis.html