Bug report: downtimes beyond 2038 cause event queue errors
Ton Voon
ton.voon at opsview.com
Thu Apr 4 18:32:36 CEST 2013
Hi!
We've come across a problem when upgrading from Nagios 3 to Nagios 4, and we can't work out where the fix should go. It occurs when an event is scheduled for a time beyond 2038.
Recreation steps:
* Set a downtime on a service to end the next day
* Stop Nagios
* Edit retention.dat so that the downtime has end_date=4514791088 (some other values seem to work)
* Start Nagios
When Nagios starts, it will not run any of the scheduled events in the event queue.
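For reference, the end_date used in the steps above lies outside both the signed and the unsigned 32-bit range, which a quick check makes visible. This is only a sanity check on the number itself: the helper names are made up for illustration and nothing here is taken from the Nagios source.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of Nagios: report whether a timestamp
 * (held in 64 bits so the value itself is never truncated) would still
 * fit into a 32-bit representation. */
int fits_in_int32(int64_t t)
{
    return t >= INT32_MIN && t <= INT32_MAX;
}

int fits_in_uint32(int64_t t)
{
    return t >= 0 && t <= (int64_t)UINT32_MAX;
}
```

Calling both helpers with 4514791088LL returns 0 in each case, so any code path that stores this end_date in a 32-bit quantity, signed or unsigned, cannot hold it intact. Whether such a path exists in the event queue is exactly the open question here.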
This fails on CentOS 5 64-bit, though it appears to work on Debian Squeeze 32-bit, so it may be a 64-bit only issue.
We think the problem occurs when the event is scheduled via squeue_add(). We've managed to get test-squeue to fail by changing the time value to fall after 2038 with the following:
Index: test-squeue.c
===================================================================
--- test-squeue.c (revision 2716)
+++ test-squeue.c (working copy)
@@ -116,7 +116,7 @@
sq_test_random(sq);
t(squeue_size(sq) == 0, "Size should be 0 after first sq_test_random");
- t((a.evt = squeue_add(sq, time(NULL) + 9, &a)) != NULL);
+ t((a.evt = squeue_add(sq, time(NULL)*2, &a)) != NULL);
t(squeue_size(sq) == 1);
t((b.evt = squeue_add(sq, time(NULL) + 3, &b)) != NULL);
t(squeue_size(sq) == 2);
This gives the test result of:
### squeue tests
FAIL max <= *d @test-squeue.c:86
FAIL x == &b @test-squeue.c:133
FAIL x->id == b.id @test-squeue.c:134
FAIL x == &c @test-squeue.c:141
about to fail pretty fucking hard...
ea: 0xbfe065e0; &b: 0xbfe065d8; &c: 0xbfe065d0; ed: 0xbfe065c8; x: 0xbfde9b80
FAIL x == &b @test-squeue.c:152
FAIL x->id == b.id @test-squeue.c:153
FAIL x == &b @test-squeue.c:160
FAIL x->id == b.id @test-squeue.c:161
FAIL x == &c @test-squeue.c:166
FAIL x->id == c.id @test-squeue.c:167
Test results: 390637 passed, 10 failed
With a factor of 1.1 instead of 2, the tests pass:
### squeue tests
Test results: 390647 passed, 0 failed
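The difference between the two factors is consistent with a 2038 boundary: time(NULL) is currently around 1.365 billion, so doubling it goes past the signed 32-bit maximum of 2147483647, while multiplying by 1.1 does not. A small sketch of that arithmetic, where crosses_2038 is a hypothetical helper and not something from test-squeue.c:

```c
#include <stdint.h>

/* Hypothetical helper: does `now * factor` land past the last second
 * representable in a signed 32-bit time_t (19 January 2038)? */
int crosses_2038(int64_t now, double factor)
{
    return (double)now * factor > (double)INT32_MAX;
}
```

With now set to 1365093156 (an approximation of the time of this mail, not a value taken from the test run), crosses_2038(now, 2.0) returns 1 and crosses_2038(now, 1.1) returns 0, matching the failing and passing runs above.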
This worked in Nagios 3, so we're guessing that the change to use the squeue library for events is probably where this limitation has come in.
Any thoughts?
Ton