Notification Problem
Shaun Martin
smartin at akazaresearch.com
Tue Sep 2 19:01:33 CEST 2008
Hi Marc,
Thanks for looking at my problem; it is driving me nuts why one host
alerts and the other does not. I have provided the requested info below.
> Please post --
> the service template definition
define service{
name generic-service ; The 'name' of this service template
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
is_volatile 0 ; The service is not volatile
check_period 24x7 ; The service can be checked at any time of the day
max_check_attempts 3 ; Re-check the service up to 3 times in order to determine its final (hard) state
normal_check_interval 10 ; Check the service every 10 minutes under normal conditions
retry_check_interval 2 ; Re-check the service every two minutes until a hard state can be determined
contact_groups admins ; Notifications get sent out to everyone in the 'admins' group
notification_options w,u,c,r ; Send notifications about warning, unknown, critical, and recovery events
notification_interval 60 ; Re-notify about service problems every hour
notification_period 24x7 ; Notifications can be sent out at any time
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
# Local service definition template - This is NOT a real service, just a template!
define service{
name local-service ; The name of this service template
use generic-service ; Inherit default values from the generic-service definition
max_check_attempts 4 ; Re-check the service up to 4 times in order to determine its final (hard) state
normal_check_interval 5 ; Check the service every 5 minutes under normal conditions
retry_check_interval 1 ; Re-check the service every minute until a hard state can be determined
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
define service{
name local-service-isovera ; The name of this service template
use generic-service ; Inherit default values from the generic-service definition
max_check_attempts 3 ; Re-check the service up to 3 times in order to determine its final (hard) state
normal_check_interval 5 ; Check the service every 5 minutes under normal conditions
retry_check_interval 1 ; Re-check the service every minute until a hard state can be determined
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
contact_groups isovera
}
> the service definition
> the host template definition
define host{
name generic-host ; The name of this host template
notifications_enabled 1 ; Host notifications are enabled
event_handler_enabled 1 ; Host event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
notification_period 24x7 ; Send host notifications at any time
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
define host{
name linux-server-isovera ; The name of this host template
use generic-host ; This template inherits other values from the generic-host template
check_period 24x7 ; By default, Linux hosts are checked round the clock
check_interval 5 ; Actively check the host every 5 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 10 ; Check each Linux host 10 times (max)
check_command check-host-alive ; Default command to check Linux hosts
notification_period 24x7 ; Linux admins hate to be woken up, so we only notify during the day
; Note that the notification_period variable is being overridden from
; the value that is inherited from the generic-host template!
notification_interval 120 ; Resend notifications every 2 hours
notification_options d,u,r ; Only send notifications for specific host states
contact_groups isovera ; Notifications get sent to the admins by default
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
> the host and service definition
###############################################################################
###############################################################################
#
# HOST DEFINITION
#
###############################################################################
###############################################################################
# Define a host for the local machine
define host{
use linux-server
host_name www.localhost.org
alias www.localhost.org
address 198.xxx.xxx.xx
}
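The host above inherits from `linux-server`, while the host templates posted earlier are named `generic-host` and `linux-server-isovera`. That may simply mean the `linux-server` template was left out of the post. A minimal sketch, in Python, of how one could list `use` targets that have no matching template among a set of pasted definitions; `parse_objects`, `missing_templates` and the `POSTED` text are hypothetical names for illustration and are not part of Nagios:

```python
import re

# Matches "define <type>{ ... }" blocks; assumes bodies contain no braces.
DEFINE_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.S)


def parse_objects(text):
    """Return a list of (object_type, directives) pairs from config text."""
    objects = []
    for obj_type, body in DEFINE_RE.findall(text):
        directives = {}
        for raw in body.splitlines():
            line = raw.split(";", 1)[0].strip()  # drop trailing ';' comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        objects.append((obj_type, directives))
    return objects


def missing_templates(objects):
    """List (object_type, template_name) pairs that are used but not defined."""
    defined = {(t, d["name"]) for t, d in objects if "name" in d}
    missing = []
    for obj_type, directives in objects:
        for parent in directives.get("use", "").split(","):
            parent = parent.strip()
            if parent and (obj_type, parent) not in defined:
                missing.append((obj_type, parent))
    return missing


# Abbreviated copy of the host definitions from this post (assumption:
# only the directives needed for the check are kept).
POSTED = """
define host{
name generic-host
register 0
}
define host{
name linux-server-isovera
use generic-host
register 0
}
define host{
use linux-server
host_name www.localhost.org
}
"""

if __name__ == "__main__":
    for obj_type, name in missing_templates(parse_objects(POSTED)):
        print(f"{obj_type} template not found in pasted text: {name}")
```

Nagios itself refuses to start when a referenced template is genuinely missing, so a hit here most likely points at an incomplete paste rather than a broken configuration.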
###############################################################################
###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################
###############################################################################
# Define a service to "ping" the local machine
# Define a service to check the disk space of the root partition
# on the local machine. Warning if < 20% free, critical if
# < 10% free space on partition.
define service{
use local-service-isovera
host_name www.localhost.org
service_description / Partition
check_command check_nrpe!check_disk1
}
define service{
use local-service-isovera
host_name www.localhost.org
service_description /usr Partition
check_command check_nrpe!check_disk4
}
define service{
use local-service-isovera
host_name www.localhost.org
service_description /var Partition
check_command check_nrpe!check_disk5
}
define service{
use local-service-isovera
host_name www.localhost.org
service_description /var/run Partition
check_command check_nrpe!check_disk9
}
define service{
use local-service-isovera
host_name www.localhost.org
service_description /opt Partition
check_command check_nrpe!check_disk8
}
define service{
use local-service-isovera
host_name www.localhost.org
service_description /tmp Partition
check_command check_nrpe!check_disk3
}
define service{
use local-service-isovera
host_name www.localhost.org
service_description /home Partition
check_command check_nrpe!check_disk6
}
# Define a service to check the number of currently logged in
# users on the local machine. Warning if > 20 users, critical
# if > 50 users.
define service{
use local-service
host_name www.localhost.org
service_description Current Users
check_command check_nrpe!check_users
}
# Define a service to check the number of currently running procs
# on the local machine. Warning if > 250 processes, critical if
# > 400 processes.
define service{
use local-service-isovera
host_name www.localhost.org
service_description Local Processes
check_command check_nrpe!check_local_procs
}
define service{
use local-service-isovera
host_name www.localhost.org
service_description Total Processes
check_command check_nrpe!check_total_procs
}
define service{
use local-service-isovera
host_name www.localhost.org
service_description Zombie Processes
check_command check_nrpe!check_zombie_procs
}
# Define a service to check the load on the local machine.
define service{
use local-service-isovera
host_name www.localhost.org
service_description Current Load
check_command check_nrpe!check_local_load
}
# Define a service to check the swap usage of the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free
define service{
use local-service
host_name www.localhost.org
service_description Memory Usage
check_command check_nrpe!check_mem
}
define service{
use local-service-isovera
host_name www.localhost.org
service_description Swap Usage
check_command check_nrpe!check_local_swap
}
# Define a service to check SSH on the local machine.
# Disable notifications for this service by default, as not all users may
# have SSH enabled.
define service{
use local-service-isovera
host_name www.localhost.org
service_description SSH
check_command check_ssh
}
# Define a service to check HTTP on the local machine.
# Disable notifications for this service by default, as not all users may
# have HTTP enabled.
define service{
use local-service-isovera
host_name www.localhost.org
service_description HTTP
check_command check_http
}
define service{
use local-service-isovera
host_name www.localhost.org
service_description Check Domain
check_command check_domain!biosciednet.org
}
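Since some services above use `local-service` and others use `local-service-isovera`, it can help to work out which values each one actually ends up with after inheritance. A minimal sketch in Python, assuming single-parent `use` chains and the usual rule that a value set locally overrides the inherited one; only a few notification-related directives are copied from the templates in this post, and `resolve`, `TEMPLATES` and `SERVICES` are hypothetical names:

```python
# Directives copied from the templates posted above (abbreviated).
TEMPLATES = {
    "generic-service": {
        "max_check_attempts": "3",
        "contact_groups": "admins",
        "notification_options": "w,u,c,r",
        "notifications_enabled": "1",
    },
    "local-service": {
        "use": "generic-service",
        "max_check_attempts": "4",
    },
    "local-service-isovera": {
        "use": "generic-service",
        "max_check_attempts": "3",
        "contact_groups": "isovera",
    },
}

# Two of the services from this post, keyed by service_description.
SERVICES = {
    "Current Users": {"use": "local-service"},
    "Swap Usage": {"use": "local-service-isovera"},
}


def resolve(definition, templates):
    """Return the effective directives after following the 'use' chain."""
    chain = [definition]
    seen = set()
    parent = definition.get("use")
    while parent is not None:
        if parent in seen:
            raise ValueError(f"circular template reference: {parent}")
        seen.add(parent)
        template = templates[parent]
        chain.append(template)
        parent = template.get("use")
    effective = {}
    # Apply the most distant ancestor first so closer values win.
    for layer in reversed(chain):
        effective.update(layer)
    effective.pop("use", None)
    return effective


if __name__ == "__main__":
    for name, definition in SERVICES.items():
        eff = resolve(definition, TEMPLATES)
        print(name, eff["contact_groups"], eff["max_check_attempts"])
```

With the definitions as posted, "Current Users" resolves to contact group `admins` with `max_check_attempts` 4, while "Swap Usage" resolves to contact group `isovera` with `max_check_attempts` 3.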
> the contactgroup definition
define contactgroup{
contactgroup_name isovera
alias Nagios Administrators
members sdemi,nagiosadmin
}
define contactgroup{
contactgroup_name admins
alias Nagios Administrators
members nagiosadmin
}
> the contact definition
define contact{
name generic-contact ; The name of this contact template
service_notification_period 24x7 ; service notifications can be sent anytime
host_notification_period 24x7 ; host notifications can be sent anytime
service_notification_options w,u,c,r,f,s ; send notifications for all service states, flapping events, and scheduled downtime events
host_notification_options d,u,r,f,s ; send notifications for all host states, flapping events, and scheduled downtime events
service_notification_commands notify-service-by-email ; send service notifications via email
host_notification_commands notify-host-by-email ; send host notifications via email
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
}
define contact{
contact_name sdemi
use generic-contact
alias Isovera Admin2
email sdemi at isovera.com
}
define contact{
contact_name nagiosadmin ; Short name of user
use generic-contact ; Inherit default values from generic-contact template (defined above)
alias Nagios Admin ; Full name of user
email smartin at akazaresearch.com ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
}
> the Service State Information from the web gui when it should have
> sent a notification (click on the service name)
Service State Information
Current Status: WARNING (for 3d 22h 36m 52s)
Status Information: USERS WARNING - 1 users currently logged in
Performance Data: users=1;1;10;0
Current Attempt: 1/4 (HARD state)
Last Check Time: 09-02-2008 12:55:00
Check Type: ACTIVE
Check Latency / Duration: 0.205 / 0.186 seconds
Next Scheduled Check: 09-02-2008 13:00:00
Last State Change: 08-29-2008 14:22:33
Last Notification: N/A (notification 0)
Is This Service Flapping? NO (0.00% state change)
In Scheduled Downtime? NO
Last Update: 09-02-2008 12:59:16 ( 0d 0h 0m 9s ago)
Active Checks: ENABLED
Passive Checks: ENABLED
Obsessing: ENABLED
Notifications: ENABLED
Event Handler: ENABLED
Flap Detection: ENABLED
> any nagios.log entries for the service when it should have sent a
> notification
That is the thing: if I click on Notifications, it does not even report
sending one. So I know it is not an email issue, as Nagios never even tries
to send a notification. The weirdest part is that I have other hosts using the
exact same templates that do send, and log that they send, notifications for
hosts and services. This box only sends and logs host notifications.
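To double-check the log rather than the web GUI, the alert and notification entries for one service can be pulled out of nagios.log directly. A minimal sketch in Python, assuming the usual `[timestamp] SERVICE ALERT: host;service;state;type;attempt;output` and `[timestamp] SERVICE NOTIFICATION: contact;host;service;state;command;output` line layouts; the `SAMPLE` lines (including their timestamps) are made up for illustration and are not taken from the real log, and `service_events` is a hypothetical helper:

```python
import re

# Only service alerts and service notifications are of interest here.
LINE_RE = re.compile(r"^\[(\d+)\] (SERVICE ALERT|SERVICE NOTIFICATION): (.*)$")


def service_events(lines, host, service):
    """Yield (timestamp, kind, fields) for entries about one host/service."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        timestamp, kind, rest = match.groups()
        fields = rest.split(";")
        # SERVICE NOTIFICATION entries put the contact name before the host.
        offset = 1 if kind == "SERVICE NOTIFICATION" else 0
        if fields[offset:offset + 2] == [host, service]:
            yield int(timestamp), kind, fields


# Made-up example lines, NOT real log output from this installation.
SAMPLE = [
    "[1220000000] SERVICE ALERT: www.localhost.org;Current Users;WARNING;HARD;4;USERS WARNING - 1 users currently logged in",
    "[1220000000] SERVICE NOTIFICATION: nagiosadmin;www.localhost.org;Current Users;WARNING;notify-service-by-email;USERS WARNING - 1 users currently logged in",
    "[1220000300] SERVICE ALERT: otherhost;Current Users;CRITICAL;SOFT;1;USERS CRITICAL - 60 users currently logged in",
    "[1220000400] HOST ALERT: www.localhost.org;DOWN;SOFT;1;CRITICAL - Host Unreachable",
]

if __name__ == "__main__":
    for timestamp, kind, fields in service_events(
        SAMPLE, "www.localhost.org", "Current Users"
    ):
        print(timestamp, kind, ";".join(fields))
```

Against a real installation, the list would be replaced by the open log file; its location depends on the `log_file` setting in nagios.cfg. If a service shows SERVICE ALERT entries but no SERVICE NOTIFICATION entries at all, that matches what is described above: the notification is never attempted.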
Thanks for all your help.
Thanks,
Shaun
On 8/29/08 6:00 PM, "Marc Powell" <marc at ena.com> wrote:
>
> On Aug 29, 2008, at 1:48 PM, Shaun Martin wrote:
>
>> Hi All,
>>
>> So I am using all templates and I just added a new host today. Using
>> the same service and host templates as all my other hosts. Those
>> other hosts send me out notifications when a service hits warning or
>> critical. My new host only seems to send out host notifications and
>> not service notifications. Like I said the same template is used for
>> this new host and my old hosts for services. I am using the nrpe
>> agent, the only real difference about this machine is it is Sun OS,
>> but the check_nrpe c returns the same values as it does on a linux
>> box so I do not think that is the issue. Also I noticed even though
>> the service has been checked many times with a warning or critical
>> status, the current attempt never progresses past one. I do a check
>> service now and wait for it to finish and the attempt is still 1/4.
>> So I do not know if that is the underlying issue. I have restarted
>> nagios after every configuration change and I did a ps to make sure
>> there was no instance running before starting back up. Since I am
>> using templates I do not know what my problem is as every other
>> service notifies except services on this host which is using the
>> same host and service templates. I am about to pull my hair out; any
>> help would be appreciated.
>
>
> You'll have to provide more detailed information. The above is too
> vague to figure out what the underlying issue is. You've covered some
> of the bases though.
>
> Please post --
> the service template definition
> the service definition
> the host template definition
> the host definition
> the contactgroup definition
> the contact definition
> the Service State Information from the web gui when it should have
> sent a notification (click on the service name)
> any nagios.log entries for the service when it should have sent a
> notification
>
> --
> Marc
> -------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> Build the coolest Linux based applications with Moblin SDK & win great prizes
> Grand prize is a trip for two to an Open Source event anywhere in the world
> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting
> any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
--
Shaun Martin
Systems Administrator
Akaza Research
smartin at akazaresearch.com
Office: (617) 621-8585 x 13
Cell: (978) 360-3402
www.akazaresearch.com <http://www.akazaresearch.com/>
www.openclinica.org <http://www.openclinica.org/>