How to troubleshoot when not receiving alerts?
John Oliver
joliver at john-oliver.net
Thu Jul 24 23:59:10 CEST 2008
On Thu, Jul 24, 2008 at 01:53:19PM -0500, Marc Powell wrote:
>
> On Jul 24, 2008, at 12:59 PM, John Oliver wrote:
>
> > I have one alert set up that should be emailing every time it runs...
> > it's a disk space check on a server that has 1% left. However, I am not
> > receiving any emails. How to go about figuring out why?
>
> Generally --
>
> See if you have notification alerts in nagios.log for it.
> If not, verify the notification options for the service definition,
> contactgroup and contacts.
No, nothing is getting logged. But then, there are very few logs
compared to the number of hosts / services it's monitoring... it looks
like only emails are being logged. I looked in nagios.cfg for a logging
level type of option, but no dice.
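
To make the "is anything being logged for this service?" check repeatable, here is a minimal sketch that filters log lines for service notifications. It assumes the usual `[epoch] SERVICE NOTIFICATION: contact;host;service;state;command;output` line layout; the function name is made up for illustration, and the log path has to come from the log_file setting in nagios.cfg, which is not shown in this post.

```python
import re
from datetime import datetime, timezone

# Assumed line layout:
# [epoch] SERVICE NOTIFICATION: contact;host;service;state;command;output
NOTIFY_RE = re.compile(
    r"^\[(?P<ts>\d+)\] SERVICE NOTIFICATION: "
    r"(?P<contact>[^;]*);(?P<host>[^;]*);(?P<service>[^;]*);"
    r"(?P<state>[^;]*);(?P<command>[^;]*);(?P<output>.*)$"
)


def find_service_notifications(lines, host, service):
    """Return (UTC time, contact, state) for each notification of host/service."""
    hits = []
    for line in lines:
        match = NOTIFY_RE.match(line.rstrip("\n"))
        if match and match["host"] == host and match["service"] == service:
            when = datetime.fromtimestamp(int(match["ts"]), tz=timezone.utc)
            hits.append((when, match["contact"], match["state"]))
    return hits
```

Passing an open file handle for the log plus "ftp" and "Disk Space" would list every notification Nagios recorded for this service; an empty list means no notification was logged in that file.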
It was working yesterday. I was getting emails from this plugin every
24 minutes (notification_interval was 1440). They were all errors. I
thought I had the errors fixed... the last email I got said RECOVERED
(even though I should be getting CRITICAL alerts, as there is 1% disk
space left). I changed the notification_interval, and never saw another
email.
This AM, I set notification_interval to 60, so I should get an email every
minute. I'm not. And, yes, I'm restarting nagios ;-)
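
Since the real time between notifications depends on interval_length in nagios.cfg as well as notification_interval, here is a small sketch of the arithmetic. The interval_length in effect on this server is not shown in this post, so both the stock default of 60 seconds per unit and a value of 1 are worked through; the function name is made up for illustration.

```python
def notification_gap_minutes(notification_interval, interval_length=60):
    """Minutes between repeat notifications.

    notification_interval is counted in time units, and interval_length
    (nagios.cfg) is the number of seconds in one unit.
    """
    return notification_interval * interval_length / 60


# With the default interval_length of 60 seconds per unit:
print(notification_gap_minutes(1440))      # 1440.0 minutes, i.e. 24 hours
print(notification_gap_minutes(60))        # 60.0 minutes

# Assuming interval_length were set to 1 second per unit:
print(notification_gap_minutes(1440, 1))   # 24.0 minutes
print(notification_gap_minutes(60, 1))     # 1.0 minute
```

Only the second case matches emails arriving every 24 minutes at a setting of 1440, so the interval_length line in nagios.cfg is worth checking before expecting one email per minute at 60.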
Here's the stanza in services.cfg:
define service{
        use                     generic-service         ; Name of service template to use
        host_name               ftp
        service_description     Disk Space
        is_volatile             0
        check_period            normalbusinesshours
        max_check_attempts      3
        normal_check_interval   120
        retry_check_interval    10
        contact_groups          FTP_Alerts
        notification_interval   60
        notification_period     normalbusinesshours
        notification_options    w,u,c,r
        check_command           check_remote_disk1
        register                1
}
hosts.cfg:
define host{
        use                     generic-host            ; host template to use
        host_name               ftp
        alias                   ftp.domain.com
        address                 10.11.12.13
#       check_command           check-host-alive
        max_check_attempts      10
        notification_interval   1800
        notification_period     24x7
        notification_options    d,u,r
        contact_groups          HelpDesk
}
checkcommands.cfg:
define command{
        command_name    check_remote_disk1
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_disk
}
And I can check the remote system from the command line:
[root@cerberus ~]# /usr/lib/nagios/plugins/check_nrpe -H ftp -c check_disk
DISK OK - free space: / 2321 MB (1% inode=99%);| /=133114MB;142786;142796;0;142806
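
The part after the "|" is performance data in the usual label=value[UOM];warn;crit;min;max layout. Here is a rough sketch that splits the output above and compares the used space with the thresholds the plugin reported. It assumes plain numeric thresholds (no range syntax), the helper names are made up for illustration, and it says nothing about how check_disk is actually configured on the remote host.

```python
import re


def split_plugin_output(line):
    """Split one line of plugin output into status text and perfdata items."""
    text, _, perf = line.partition("|")
    return text.strip(), perf.split()


def parse_perfdata_item(item):
    """Parse 'label=value[UOM];warn;crit;min;max' (plain numbers only)."""
    label, _, rest = item.partition("=")
    fields = rest.split(";")
    match = re.match(r"^(-?\d+(?:\.\d+)?)(\D*)$", fields[0])
    if match is None:
        raise ValueError(f"unrecognised perfdata value: {fields[0]!r}")

    def number(index):
        # Missing or empty trailing fields are allowed in perfdata.
        if index < len(fields) and fields[index] != "":
            return float(fields[index])
        return None

    return {
        "label": label,
        "value": float(match.group(1)),
        "uom": match.group(2),
        "warn": number(1),
        "crit": number(2),
        "min": number(3),
        "max": number(4),
    }


# The output line from the command above, joined onto one line.
line = ("DISK OK - free space: / 2321 MB (1% inode=99%);| "
        "/=133114MB;142786;142796;0;142806")
text, perf_items = split_plugin_output(line)
disk = parse_perfdata_item(perf_items[0])

print(text.split(" - ")[0])           # DISK OK
print(disk["warn"] - disk["value"])   # 9672.0 (MB of usage left before warn)
```

Read this way, the reported usage is still below the reported warning threshold, which would be consistent with the plugin saying OK while the filesystem shows 1% free.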
Yes, I just noticed the discrepancy between contact_groups in
services.cfg and hosts.cfg. I doubt that's the issue, as I was getting
emails yesterday.
Any help appreciated!
--
***********************************************************************
* John Oliver http://www.john-oliver.net/ *
* *
***********************************************************************