Nagios freshness checks randomly think check results are from 1973.
fevin Kagen
fevinkagen at gmail.com
Mon Aug 11 00:28:52 CEST 2008
Thanks for the quick reply. The date is correct on both servers, but I
am using the same two scripts on both, and that could certainly be
causing the issue. They are modified versions of Francois Meehan's
snmptraphandling.py. I don't know the first thing about Python, so I
just hacked around until I got them working the way I wanted. I don't
think I changed anything that deals with time and date handling, but who
knows. Here is one of the scripts:
---------------------------------------------------------------
#!/usr/bin/python -u
"""
Written by Francois Meehan (Cedval Info)
First release 2004/09/15
This script receives input from sec.pl concerning translated snmptraps
*** Important note: sec must send DATA within quotes
Ex: ./services.py $1 $2 $3 $4
"""
import commands, string, os, sys, time
global return_code

def check_arg():
    try:
        host = sys.argv[1]
    except:
        print "usage: services.py <HOST> <SEVERITY> <JOB> <DATA>"
        sys.exit()
    try:
        severity = sys.argv[2]
    except:
        print "usage: services.py <HOST> <SEVERITY> <JOB> <DATA>"
        sys.exit()
    try:
        job = sys.argv[3]
    except:
        print "usage: services.py <HOST> <SEVERITY> <JOB> <DATA>"
        sys.exit()
    try:
        mondata_res = sys.argv[4]
    except:
        print "usage: services.py <HOST> <SEVERITY> <JOB> <DATA>"
        sys.exit()
    return (host, severity, job, mondata_res)

def post_results(host, job, mondata_res, return_code):
    mytime = time.time()
    mytime = str(mytime)
    mytime = mytime[:-3]
    #print mondata_res
    output = open('/usr/local/nagios/var/rw/nagios.cmd', 'w')
    results = "[" + mytime + "] " + "PROCESS_SERVICE_CHECK_RESULT;" \
        + host + ";" + job + ";" \
        + return_code + ";" + mondata_res + "\n"
    output.write(results)

def get_return_code():
    if severity == "INFORMATIONAL":
        return_code = "0"
    elif severity == "Normal":
        return_code = "0"
    elif severity == "SEVERE":
        return_code = "2"
    elif severity == "MAJOR":
        return_code = "2"
    elif severity == "CRITICAL":
        return_code = "2"
    elif severity == "WARNING":
        return_code = "1"
    elif severity == "MINOR":
        return_code = "1"
    return return_code

# Main routine...
if __name__ == '__main__':
    (host, severity, job, mondata_res) = check_arg()  # validating parameters
    return_code = get_return_code()
    post_results(host, job, mondata_res, return_code)
-----------------------------------------------------------------------------------
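One possible cause worth checking, not confirmed anywhere in this thread: post_results() builds the timestamp with str(time.time()) and then drops the last three characters. That only yields whole seconds when the string ends in a dot plus two decimals. If the float happens to print with a single decimal, the slice eats a digit of the seconds and leaves a nine-digit value, which falls in late 1973 for dates in 2008. The sketch below is written for current Python and assumes that Python 2's str() renders floats with 12 significant digits, which it approximates with "%.12g". The helper names truncated_stamp and safe_stamp are made up for illustration.

```python
import time

def truncated_stamp(now):
    # "%.12g" approximates Python 2's str(float) (assumption: 12 significant
    # digits, trailing zeros dropped); [:-3] is the slice used in the script.
    return ("%.12g" % now)[:-3]

def safe_stamp(now):
    # Converting to int first gives whole seconds regardless of how the
    # fractional part would have been printed.
    return str(int(now))

# Two hypothetical readings from the same second in August 2008.
for sample in (1218407332.52, 1218407332.5):
    bad = truncated_stamp(sample)
    as_date = time.strftime("%m-%d-%Y", time.gmtime(int(bad)))
    print(bad, as_date, safe_stamp(sample))
```

If this is what is happening, it would affect only the results whose fractional second prints short, which would fit the issue showing up occasionally rather than on every trap.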
Here are two examples that don't have freshness checks. It just seems
strange that they are always November of 1973. Only the days are
different.

Service: Other_Event_SNMP
(Active checks of the service have been disabled - only passive checks
are being accepted)
Status: OK
Last Check: 11-08-1973 02:10:26
Duration: 12694d 15h 7m 38s
Attempt: 1/1
Status Information: " No action required: Backup Exec: Application Initializing"

Service: Other_Event_SNMP
(This service has 1 comment associated with it. Active checks of the
service have been disabled - only passive checks are being accepted.
This service is flapping between states)
Status: OK
Last Check: 11-10-1973 22:01:21
Duration: 1d 16h 47m 1s
Attempt: 1/1
Status Information: " No action required: Backup Exec: Backup Job Contains No Data"

On 8/10/08, Thomas Guyot-Sionnest <dermoth at aei.ca> wrote:
> On 10/08/08 10:46 AM, fevin Kagen wrote:
>> I'm having this exact same issue on two different Nagios servers, one
>> running Fedora and one running Ubuntu. It is driving me crazy.
>>
>> An SNMP trap is received for a backup job. It is properly translated
>> and ultimately received by Nagios. I have a freshness check in place
>> for the service that will create a critical alert if a check is not
>> received every 26 hours. For whatever reason, the check is run right
>> after the alert is received and it thinks the results are 12K+ days old.
>> It seems to periodically think the results are from 11-08-1973.
>>
>> This is an ongoing issue on both servers, but certainly isn't the norm.
>> Typically the freshness checks work great. However, about once or
>> twice a week, I see this behavior on random checks. Here is an example:
>
> There's something wrong with the script/process returning these passive
> checks. A passive check submitted to the Nagios command pipe has the
> following format:
>
>> [<timestamp>] PROCESS_SERVICE_CHECK_RESULT;<hostname>;<service_name>;<return_code>;<status_text>
>
> The timestamp is a normal UNIX timestamp (seconds since the epoch) and
> determines the time at which the check was performed.
>
> If you return an invalid or too-old timestamp, Nagios will think the
> service result is that old and trigger the freshness check.
>
> I believe a possible way to receive such old timestamps is if you have a
> server with an invalid date (i.e. in the 1970s) using send_nsca, where the
> nsca daemon is set up with max_packet_age=0.
>
> - --
> Thomas
>
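The command format described in the quoted reply can be sketched as a small helper, together with the age calculation that a freshness check effectively relies on. This is a rough illustration for current Python, not code from Nagios or from the script above. It uses the command name PROCESS_SERVICE_CHECK_RESULT, as the script in this message does, and the function names and the host name backup01 are hypothetical.

```python
import time

def format_passive_result(host, service, return_code, output, now=None):
    """Build one passive service check line for the Nagios command pipe."""
    # Whole seconds since the epoch; int() avoids any string slicing.
    stamp = int(time.time() if now is None else now)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        stamp, host, service, return_code, output)

def result_age_seconds(line, now=None):
    """Return how many seconds old the bracketed timestamp in a line is."""
    stamp = int(line[1:line.index("]")])
    current = int(time.time() if now is None else now)
    return current - stamp

# A result stamped at a fixed, hypothetical moment in August 2008.
line = format_passive_result("backup01", "Other_Event_SNMP", 0,
                             "Backup Job Contains No Data", now=1218407332.5)
print(line, end="")
print(result_age_seconds(line, now=1218407392))
```

Logging result_age_seconds() for each line before it is written to the pipe would show whether any timestamp is already decades old when it leaves the script.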