How to troubleshoot when not receiving alerts?
Marc Powell
marc at ena.com
Fri Jul 25 06:12:55 CEST 2008
On Jul 24, 2008, at 4:59 PM, John Oliver wrote:
> No, nothing is getting logged. But then, there are very few logs
> compared to the number of hosts / services it's monitoring... it looks
> like only emails are being logged. I looked in nagios.cfg for a
> logging
> level type of option, but no dice.
There are several variables that control logging in nagios.cfg. Look
for log_ and debug_ in
http://nagios.sourceforge.net/docs/3_0/configmain.html. I believe the
default configuration is to log initial states, hard state changes,
event handlers and notifications.
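A quick way to see which of those options are actually set is to pull them out of the file. This is only a sketch: the function name and the sample excerpt are made up for illustration, and the real values should come from your own nagios.cfg.

```python
def logging_options(config_text):
    """Return the log_* and debug_* settings found in nagios.cfg text."""
    options = {}
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key.startswith(("log_", "debug_")):
            options[key] = value.strip()
    return options


# Hypothetical excerpt, not taken from the poster's configuration.
SAMPLE = """\
# logging settings
log_file=/var/log/nagios/nagios.log
log_notifications=1
log_service_retries=0
log_initial_states=0
debug_level=0
check_external_commands=1
"""

for name, value in sorted(logging_options(SAMPLE).items()):
    print(name, "=", value)
```

To run it against a real installation, read the file with open(path).read() and pass the text in; the path depends on how Nagios was packaged.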
> It was working yesterday. I was getting emails from this plugin every
> 24 minutes (notification_interval was 1440). They were all errors. I
Unless you've changed interval_length from its default of 60, all
_interval parameters are in minutes, not seconds, so that seems strange.
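The arithmetic, as a small sketch: an _interval value multiplied by interval_length gives seconds. The function name is made up for illustration. The second call assumes interval_length had been changed to 1, which is a guess at what would produce a 24 minute spacing, not something confirmed in this thread.

```python
def interval_in_seconds(interval, interval_length=60):
    """Convert an *_interval value to seconds.

    interval_length is the nagios.cfg setting giving the number of
    seconds per interval unit; 60 is the default.
    """
    return interval * interval_length


# Default interval_length: 1440 units is a full day.
default_seconds = interval_in_seconds(1440)
print(default_seconds / 3600, "hours")

# Assumption: interval_length set to 1, so the units are seconds.
changed_seconds = interval_in_seconds(1440, interval_length=1)
print(changed_seconds / 60, "minutes")
```

With the default, notification_interval 60 would mean one notification per hour, not one per minute.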
> thought I had the errors fixed... the last email I got said RECOVERED
> (even though I should be getting CRITICAL alerts, as there is 1% disk
> space left). I changed the notification_interval, and never saw
> another
> email.
Does the web interface show the status as CRITICAL? If you received a
recovery notification the service was considered to be OK. What did
you fix?
> This AM, I set notification_interval to 60 I should get an email
> every
> minute. I'm not. And, yes, I'm restarting nagios ;-)
>
> Here's the stanza in services.cfg:
>
> define service{
> use generic-service ; Name of service template to use
> host_name ftp
> service_description Disk Space
> is_volatile 0
> check_period normalbusinesshours
> max_check_attempts 3
> normal_check_interval 120
> retry_check_interval 10
> contact_groups FTP_Alerts
> notification_interval 60
> notification_period normalbusinesshours
> notification_options w,u,c,r
> check_command check_remote_disk1
> register 1
> }
Having notification_interval < normal_check_interval might be
problematic. I am under the distinct impression that notification
logic is only called after a check of the host/service. I don't have
convenient access to the source right now to verify though.
Additionally, this service does not have is_volatile set (services
normally are not volatile). Nagios will only send a notification for it
on a hard state _change_ unless some other escalation definition
applies to it. This is normal.
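To illustrate why notification_interval < normal_check_interval could matter, here is a toy model of the impression above, namely that notification logic only runs when a check result comes in. It is not taken from the Nagios source; the function name and the timeline are hypothetical.

```python
def next_renotification(last_notified, notification_interval, check_times):
    """Return the first check time at which a re-notification could go out.

    Assumption: notifications are only considered when a check runs, so
    the earliest opportunity is the first check at or after
    last_notified + notification_interval. Returns None if no check in
    the given timeline is late enough.
    """
    due = last_notified + notification_interval
    for check_time in sorted(check_times):
        if check_time >= due:
            return check_time
    return None


# Checks every 120 minutes (normal_check_interval 120), ten hours of timeline.
checks = range(0, 601, 120)

# notification_interval 60: due at minute 60, but nothing runs until 120.
print(next_renotification(0, 60, checks))
```

Under that assumption the effective spacing between notifications is the check interval, so a notification_interval of 60 would behave like 120 here.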
> And I can check the remote system from the command line:
>
> [root@cerberus ~]# /usr/lib/nagios/plugins/check_nrpe -H ftp -c check_disk
> DISK OK - free space: / 2321 MB (1% inode=99%);| /=133114MB;142786;142796;0;142806
We'd have to see the actual command definition for check_disk from
nrpe.conf on the remote host, but the plugin is reporting that 1% free
disk space is OK. Could it be that you've specified your warning and
critical levels in KB, not %? That's an easy mistake to make. Also, as
a general rule you shouldn't test nagios plugins as root. It's common,
though not likely in this case, to see different results because of
the difference in privilege levels between nagios and root.
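The performance data after the | in that output gives a hint about the thresholds. Below is a sketch that splits it using the usual plugin layout of label=value[UOM];warn;crit;min;max. It assumes check_disk reports used space and expresses warn and crit as used-space values, which is my understanding and not something verified against the nrpe.conf in question. The function name is made up.

```python
import string


def parse_perfdata(item):
    """Split one 'label=value[UOM];warn;crit;min;max' perfdata item."""
    label, _, rest = item.rpartition("=")
    fields = rest.split(";")
    # Separate the numeric part of the value from its unit suffix.
    number = fields[0].rstrip(string.ascii_letters + "%")
    unit = fields[0][len(number):]
    parsed = {"label": label, "value": float(number), "unit": unit}
    for name, raw in zip(("warn", "crit", "min", "max"), fields[1:]):
        parsed[name] = float(raw) if raw else None
    return parsed


# Perfdata string copied from the check_nrpe output quoted above.
data = parse_perfdata("/=133114MB;142786;142796;0;142806")

# Assumption: thresholds are in used space, so max minus threshold is
# the amount of free space at which each state would trigger.
free_at_warning = data["max"] - data["warn"]
free_at_critical = data["max"] - data["crit"]
print(free_at_warning, free_at_critical, data["unit"])
```

If that reading is right, the thresholds sit 20 and 10 units below the maximum, which looks like absolute amounts rather than percentages and would explain why 2321 MB free still counts as OK.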
> Yes, I just noticed the discrepancy between contact_groups in
> services.cfg and hosts.cfg I doubt that's the issue, as I was getting
> emails yesterday.
It seems to me you're not receiving notifications because hard state
changes are not occurring. This is generally desired behavior.
--
Marc