Separate mail server problems cause Nagios to plotz (or vice versa?)
up at 3.am
Fri Jun 24 21:45:40 CEST 2011
> On Fri, Jun 24, 2011 at 11:53, <up at 3.am> wrote:
>>> Quoting up at 3.am:
>>>
>>>> We have Nagios monitoring a variety of services on roughly 50
>>>> separate servers. Several of them
>>>> are mail servers, but only the "main" (that contains most of the
>>>> Nagios notification recipients)
>>>> one has this problem.
>>>>
>>>> The mail server will start to become unresponsive to just about any
>>>> input (but pings fine).
>>>
>>> This is a mail server issue. You would need to determine exactly what
>>> process(es) have become unresponsive and why.
>>
>> We're still trying to figure that out...but the question for this list
>> is why Nagios would go nuts.
>
> Do you have any staleness stuff on the tests that go nuts?
>
> Is it possible to place many of the sendmail tests (ie if you're
> checking mqueue) as dependencies of another test (such as "is it
> responding to port 25?") so that when the sendmail gets strange, at
> least many of the tests are then skipped?
The only sendmail-specific test we use for Nagios is the simple SMTP test.
>>>> Simultaneously, Nagios, which is on a separate server, will send out
>>>> notifications that every service on every server is down because
>>>> Nagios cannot reach them.
>>>
>>>
>>> Why can't it reach them? Is your mail server also your router?
>>
>> Good Gosh, no! That's why this is so puzzling.
>
> re: staleness above: can you watch your Nagios log, perhaps filtering
> it through awk to add a timestamp to each entry, just spool that on a
> terminal, and when things get strange and Nagios goes nuts, is Nagios
> at least running the tests and getting responses?
I'll try to grok something out of the Nagios logs, but this is one of those problems that happens
every few days, so it's hard to monitor it constantly (monitor the monitoring software?!).
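
For reference, the timestamp filter suggested above could look roughly like this. It is only a sketch, written in Python rather than awk, and it assumes the usual Nagios log layout in which every entry starts with a Unix epoch in square brackets (for example "[1308944740] SERVICE ALERT: ..."). The function name humanize is made up for illustration, and times are printed in UTC.

```python
import re
import sys
from datetime import datetime, timezone

# Assumption: each log entry begins with a 10-digit Unix epoch in brackets.
_EPOCH = re.compile(r"^\[(\d{10})\] ")


def humanize(line, tz=timezone.utc):
    """Return the line with its leading epoch replaced by a readable time.

    Lines that do not start with a bracketed epoch are returned unchanged.
    """
    m = _EPOCH.match(line)
    if not m:
        return line
    when = datetime.fromtimestamp(int(m.group(1)), tz)
    return "[" + when.strftime("%Y-%m-%d %H:%M:%S") + "] " + line[m.end():]


if __name__ == "__main__":
    # Read log lines from stdin and echo them with readable timestamps.
    for raw in sys.stdin:
        sys.stdout.write(humanize(raw))
        sys.stdout.flush()  # keep output current when fed from tail -f
```

Piping "tail -f" of the Nagios log into this script in a spare terminal leaves a readable record of whether checks were still being run and answered when the flood of notifications started.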
> You mention LDAP; is your sendmail server also your LDAP server, and
> is the Nagios host also using LDAP to resolve basic OS features like
> UID?
Yes, it is the LDAP server, but it is not used for DNS...it is only used for
user authentication.