Problems with freshness checking
brian.boysen at colinx.com
Thu Jun 10 23:11:28 CEST 2004
Hi, I've just joined this mailing list to investigate a problem I've seen.
I looked through the archives, and someone named Fabio Lo Votrico posted a
question here about passive service checks being flagged as stale and Nagios
"forcing an immediate check", even though the log showed a
PROCESS_SERVICE_CHECK_RESULT within the allotted amount of time.
Was this answered off the mailing list (or did the answer just not make it
into the archives)? If so, where could I find it?
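As a rough illustration of the condition being described, here is a minimal Python sketch, assuming a passive result counts as stale once its age exceeds the service's `freshness_threshold`. The function name `is_result_stale` is hypothetical, and Nagios's real freshness logic may take more into account than this simplified model does.

```python
import time
from typing import Optional


def is_result_stale(last_result_time: float, freshness_threshold: float,
                    now: Optional[float] = None) -> bool:
    """Simplified model (assumption): a result is stale once its age
    exceeds freshness_threshold seconds. Not Nagios's actual code."""
    if now is None:
        now = time.time()
    age = now - last_result_time
    return age > freshness_threshold
```

Under this model, a result that arrived "within the allotted time" should never be stale, so a staleness message suggests Nagios is working from an older last-result time than the one the log shows.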
When I've seen it, an external service submitted 144
PROCESS_SERVICE_CHECK_RESULT commands, all logged with the same timestamp (I'm
guessing this means all the results were read in the same pass over
the external command file). Then, about 2 minutes later, Nagios logged a
message saying the service(s) had timed out and that it was forcing an active
check. Because the active check is check_dummy!2, which always returns
CRITICAL, the service is doomed at that point.
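For reference, each passive result reaches Nagios as one line in the external command file, of the form `[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output`. The sketch below builds such a batch in Python. The host and service names are hypothetical, `format_service_results` and `submit` are illustrative helpers rather than part of Nagios, and the command file path is assumed to be whatever `command_file` is set to in nagios.cfg.

```python
import time
from typing import Iterable, Optional, Tuple


def format_service_results(results: Iterable[Tuple[str, str, int, str]],
                           timestamp: Optional[int] = None) -> str:
    """Build PROCESS_SERVICE_CHECK_RESULT lines that share one timestamp."""
    if timestamp is None:
        timestamp = int(time.time())
    lines = []
    for host, service, code, output in results:
        # Return codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
        lines.append(f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
                     f"{host};{service};{code};{output}\n")
    return "".join(lines)


def submit(batch: str, command_file: str) -> None:
    """Write the batch to Nagios's command pipe (path is an assumption)."""
    with open(command_file, "w") as pipe:
        pipe.write(batch)
```

Writing a whole batch in one call, as here, is one way many results could end up being read in the same pass over the command file.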
From what I can see, passive results are processed into the "same
queue for active checks" (sqfac, from docs/passivechecks.html) by a forked-off
process. The machine is a SUNW Ultra-250. Could it take 2 minutes for the
results to be placed in that queue (sqfac) and then recognized by Nagios?
Two nights ago I replaced the check_dummy!2 command with a check that looks
around for the status, but now one service always times out, and it
fails 20% of the time.
The command_check_interval is -1.