Problems with passive service monitoring and freshness alerting
O'Brien, Nick
nick.obrien at eds.com
Tue Aug 5 23:19:56 CEST 2008
Hello,
I am using Nagios v2.3 with passive service checking and I am having
problems getting freshness to work in exactly the way I want.
Our external scripts which check the services write to the Nagios
command file every minute. I want Nagios to alert me if there is no
status check for the service after 5 minutes, then every 60 minutes
while the status remains stale.
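
For context, a minimal sketch of how a script can submit one passive result through the command file is shown here. It is an illustration, not the actual check script: the function name submit_passive_result and the command file path in the example call are assumptions (use the command_file value from nagios.cfg), and it assumes a date(1) that supports +%s. It is written in portable sh so it should also run under ksh.

```shell
# Sketch: append one passive service check result to the Nagios command file.
# submit_passive_result is a hypothetical helper name, not part of Nagios.
submit_passive_result() {
    cmd_file=$1   # path to the Nagios external command file
    host=$2       # host_name as defined in Nagios
    service=$3    # service_description as defined in Nagios
    code=$4       # 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
    output=$5     # plugin output text

    now=$(date +%s)   # assumes date(1) supports +%s (epoch seconds)

    # Format: [time] PROCESS_SERVICE_CHECK_RESULT;host;service;code;output
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$now" "$host" "$service" "$code" "$output" >> "$cmd_file"
}

# Example call (the path is an assumption):
# submit_passive_result /usr/local/nagios/var/rw/nagios.cmd \
#     servera webservice1 0 "Active sessions 3"
```

The host and service names must match the Nagios definitions exactly, otherwise the result is not applied to the intended service.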
However, Nagios very frequently alerts me even though a status was
received within the freshness_threshold, e.g.:
[1217966117] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;servera;webservice1;0;Active sessions 3
[1217966165] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;servera;wesbervice1;0;Active sessions 3
[1217966191] Warning: The results of service 'webservice1' on host 'servera' are stale by 50 seconds (threshold=90 seconds). I'm forcing an immediate check of the service.
[1217966201] SERVICE NOTIFICATION: nick;servera;webservice1;CRITICAL;notify-by-email;No servera webservice1 status report inside the freshness interval
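
To make the timing in the log excerpt above easier to read, the gaps between the entries can be worked out from the epoch timestamps. This is only a small arithmetic sketch; the helper name gap is hypothetical.

```shell
# Print the number of seconds between two epoch timestamps from the log.
gap() { echo $(( $2 - $1 )); }

gap 1217966117 1217966165   # first passive result to second: 48
gap 1217966165 1217966191   # second passive result to stale warning: 26
gap 1217966191 1217966201   # stale warning to notification: 10
```

Each gap is well under the configured threshold of 90 seconds.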
I think part of the problem is Nagios is using X in "stale by X seconds"
rather than freshness_threshold or normal_check_interval to run the
freshness check. How do I change X, disable these warnings, and/or at
least get Nagios to run the freshness check at (or near) the actual
threshold?
Also, I've set max_check_attempts to 7, but Nagios doesn't always seem to
reset the SOFT attempt count after a status for the service is received,
possibly a symptom of the same problem.
I've tried various combinations of freshness_threshold,
normal_check_interval, retry_check_interval, and max_check_attempts to
no avail. Anyhow the relevant portions of my Nagios configuration are
below.
Any suggestions about configuring Nagios to do what I want to achieve?
Thanks,
Nick.
define service{
        use                             passive-service
        host_name                       servera
        service_description             webservice1
        name                            webservice1
        notification_interval           60
        freshness_threshold             90
        max_check_attempts              7
        check_command                   freshness_alert
        contact_groups                  nickgroup
        }
define service{
        name                            passive-service
        active_checks_enabled           0
        passive_checks_enabled          1
        parallelize_check               1
        obsess_over_service             1
        check_freshness                 1
        freshness_threshold             900
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        check_period                    24x7
        failure_prediction_enabled      1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        max_check_attempts              1
        normal_check_interval           5
        retry_check_interval            1
        contact_groups                  admins
        notification_options            w,c,r
        notification_interval           960
        notification_period             24x7
        register                        0
        }
The freshness check script is:
#!/bin/ksh
#
# command to run if not heard from passive check
echo "No $1 $2 status report inside the freshness interval"
exit 2
---
Nick O'Brien Phone: +64 9 487 6335 (x4335)
Middleware Hosting (MHOT), NZ Middleware Capability, EDS
Smales Farm Technology Park, Level 3, 74 Taharoto Road
Takapuna, Auckland 0622 Email: nick.obrien at eds.com
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users