Nagios Hang?
Marc Powell
marc at ena.com
Wed Feb 15 17:20:50 CET 2006
> -----Original Message-----
> From: nagios-users-admin at lists.sourceforge.net
> [mailto:nagios-users-admin at lists.sourceforge.net] On Behalf Of Mike Koponick
> Sent: Wednesday, February 15, 2006 10:10 AM
> To: Nagios Users
> Subject: [Nagios-users] Nagios Hang?
>
>
>
> I'm running Nagios 2.0 (Stable) on Redhat 9.0, in a distributed
> environment. I'm utilizing NSCA for checks and all appears to be
> working properly.
>
>
>
> I'm running into several issues that seemed to have "started all of a
> sudden".
>
>
>
> 1) On my distributed server, I don't see syslog messages any longer,
> with the exception of "INITIAL SERVICE STATE" messages. Syslog is
> working, and in the nagios.cfg file, "nagios.cfg:use_syslog=1". I used
> to see all the check messages, etc. Nothing in the configuration has
> changed to the best of my knowledge.
>
Make sure you haven't run out of disk space. Verify your log_ settings
in nagios.cfg.
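
A minimal sketch of the disk-space check, assuming Python is available on the monitoring host. The function names, the default path and the 5% threshold are illustrative assumptions, not Nagios settings; point it at whichever filesystem holds your Nagios and syslog files.

```python
import shutil


def free_fraction(path="/"):
    """Return the fraction of the filesystem holding `path` that is free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Pseudo-filesystems can report a zero size; treat them as full.
        return 0.0
    return usage.free / usage.total


def is_low_on_space(path="/", min_free=0.05):
    """True when less than `min_free` (assumed threshold) of the disk is free."""
    return free_fraction(path) < min_free


if __name__ == "__main__":
    # "/" is only a placeholder; substitute the log partition you care about.
    state = "LOW" if is_low_on_space("/") else "ok"
    print("free space on /: {:.1%} ({})".format(free_fraction("/"), state))
```

If this reports a full or nearly full partition, that alone could explain log messages silently stopping.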
>
> 2) Nagios appears to "hang" on the remote sensor. Once I receive
> notifications that network devices are down, I never see a recovery of
> the network devices, even though they are recovered. The work around is
> to restart nagios with "service nagios restart". Sometimes, this takes
> multiple tries.
Could be related to multiple nagios processes as below. One daemon sees
the down and another sees the up. What have you verified so far? I'd
check disk space, use strace to see what the daemon is doing, turn up
logging as much as possible for both nagios and nsca and watch the logs.
> 3) When I have a massive network outage, I receive the appropriate
> alerts but I receive multiple "PROBLEM" notifications. I'm only using
> service checks (I'm only using check_ping currently) and the
> notification_interval set to "0", which according to the documentation
> should limit the amount of messages I'm receiving to "1", unless I'm
> using the service escalations, which I am not at this time. I am not
> receiving multiple notifications for "OK" messages, which is what I
> would expect.
Without seeing any example host and service config information this
sounds very much like you might have multiple nagios daemons running at
the same time. Stop nagios, make sure they're _all_ stopped and restart
nagios.
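
One way to check for stray daemons is sketched below, assuming a Linux host with /proc and a process name of "nagios". Nagios forks children to run checks, so several "nagios" processes can be normal; the sketch therefore reports only the processes named "nagios" whose parent is not itself a "nagios" process. More than one of those would suggest duplicate daemons. The function names are hypothetical helpers, not part of Nagios.

```python
import os


def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, name, ppid)."""
    left = text.index("(")
    right = text.rindex(")")  # the name itself may contain ")"
    pid = int(text[:left])
    name = text[left + 1:right]
    # Fields after the name are: state, ppid, ...
    ppid = int(text[right + 1:].split()[1])
    return pid, name, ppid


def list_processes(proc_root="/proc"):
    """Return (pid, name, ppid) for every process readable under /proc."""
    procs = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                procs.append(parse_stat(fh.read()))
        except (OSError, ValueError):
            continue  # process exited while we were reading, or odd format
    return procs


def top_level_daemons(procs, name="nagios"):
    """PIDs named `name` whose parent is not also named `name`."""
    matching = {pid: ppid for pid, pname, ppid in procs if pname == name}
    return sorted(pid for pid, ppid in matching.items() if ppid not in matching)


if __name__ == "__main__":
    daemons = top_level_daemons(list_processes())
    print("top-level nagios daemons:", daemons)
    if len(daemons) > 1:
        print("more than one daemon appears to be running")
```

After stopping nagios, the list should be empty before you start it again.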
--
Marc