nsca / xinetd "Failed to contact identity server"

Michael J McCafferty mike at m5computersecurity.com
Wed May 25 00:12:50 CEST 2005


   I built a distributed Nagios setup for a customer that monitors about 200
hosts. The central Nagios server actively monitors several dozen hosts, and the
distributed Nagios servers monitor the rest. The distributed servers exist
because of the network architecture, not for capacity. Almost all checks run
every minute. The distributed Nagios servers send their check results back to
the central Nagios host via NSCA, with xinetd listening for inbound connections
from the remote Nagios hosts.
   Apart from having to increase xinetd's connections-per-second limit and
number of instances some time ago (I raised it to 60 connections), I have had
no problems with the setup... until today.
   Today I updated the kernel and rebooted. When the system came back up, the
central Nagios server reported hosts down on one of the remote networks, which
are checked by the remote Nagios servers. On further investigation I realized
that those hosts had been "down" for some time (6 days) but were only now being
reported down by Nagios. (They were not really down: a new firewall rule makes
them appear down because they can no longer be pinged.) The last successful
receipt of data from the remote Nagios servers coincides exactly with the moment
the following message began appearing in /var/log/messages:


May 16 10:48:46 nagioshost xinetd[21405]: Failed to contact identity server at 172.16.0.1: timeout
May 16 10:48:48 nagioshost xinetd[21406]: Failed to contact identity server at 192.168.1.2: timeout
May 16 10:48:49 nagioshost xinetd[21407]: Failed to contact identity server at 10.0.0.1: timeout
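
To see which remote Nagios servers are affected and when the failures began,
the entries can be pulled out of /var/log/messages with a small script. This is
a minimal sketch, assuming each entry sits on one line in the standard syslog
format quoted above; the names IDENT_FAIL, ident_failures and failures_by_ip
are hypothetical.

```python
import re
from collections import Counter

# Matches one xinetd "Failed to contact identity server" syslog line,
# capturing the timestamp, reporting host, xinetd PID, remote IP and reason.
IDENT_FAIL = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"xinetd\[(?P<pid>\d+)\]: Failed to contact identity server at "
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}): (?P<reason>\w+)"
)


def ident_failures(lines):
    """Yield (timestamp, ip, reason) for every matching xinetd log line."""
    for line in lines:
        m = IDENT_FAIL.match(line)
        if m:
            yield m.group("ts"), m.group("ip"), m.group("reason")


def failures_by_ip(lines):
    """Count identd lookup failures per remote address."""
    return Counter(ip for _, ip, _ in ident_failures(lines))
```

For example, failures_by_ip(open("/var/log/messages")) would give a count per
remote address, and the first timestamp yielded by ident_failures marks when
the problem started.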


These messages appear every few seconds from the time of the last successful
receipt of data from the distributed Nagios servers (6 days ago) until today's
reboot. The IP addresses in the messages are those of the remote Nagios
servers. No firewall rules were changed that could cause this, and I see no
outbound port 113 (identd) traffic from the main Nagios server. I am sure this
is an xinetd issue...
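
To check by hand whether the central server can reach identd (TCP port 113) on
a remote Nagios server at all, a simple reachability probe can be run from the
central host. This is a sketch under stated assumptions: the function name and
the 5-second timeout are hypothetical and are not xinetd's own values, and it
only opens the TCP connection rather than performing a full identd query.

```python
import socket

IDENT_PORT = 113  # identd, as mentioned above


def ident_reachable(host, port=IDENT_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Only tests that the port accepts connections; no ident query is sent.
    The default timeout is an arbitrary assumption.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False
```

For example, ident_reachable("172.16.0.1") returning False would match the
"timeout" reason in the log entries for that address.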

My questions are:

1) What does this error message mean?
2) What broke?
3) How do I keep it from breaking again?

   In the meantime, I set up a check_log check to alert me if the message
appears in the messages file again.
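
The same idea as that check_log check can be sketched in Python. This is not
the check_log plugin itself, only a minimal illustration assuming a
hypothetical state file that remembers how far into the log the previous run
read; it uses the standard Nagios plugin exit codes (0 = OK, 2 = CRITICAL).

```python
import re
import sys

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes

PATTERN = re.compile(r"Failed to contact identity server")


def check_new_lines(lines):
    """Return (exit_code, message) for a batch of newly appended log lines."""
    hits = [line.rstrip("\n") for line in lines if PATTERN.search(line)]
    if hits:
        return CRITICAL, f"CRITICAL - {len(hits)} identd failure(s); last: {hits[-1]}"
    return OK, "OK - no identd failures in new log lines"


def read_since(path, offset):
    """Read lines appended to path after byte offset; return (lines, new_offset).

    If the file is now shorter than offset (e.g. it was rotated), start over.
    """
    with open(path, "rb") as f:
        f.seek(0, 2)  # jump to end of file to learn its size
        if f.tell() < offset:
            offset = 0
        f.seek(offset)
        data = f.read()
    return data.decode("utf-8", errors="replace").splitlines(), offset + len(data)


def main(log_path, state_path):
    """Scan log lines added since the last run and print a Nagios status line."""
    try:
        with open(state_path) as s:
            offset = int(s.read().strip() or 0)
    except FileNotFoundError:
        offset = 0
    lines, offset = read_since(log_path, offset)
    with open(state_path, "w") as s:
        s.write(str(offset))
    code, msg = check_new_lines(lines)
    print(msg)
    return code


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

It could be run as, say, check_identd_log.py /var/log/messages
/var/tmp/identd.offset; both the script name and the state-file path are
hypothetical.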


Thanks,
Mike


-- 
************************************************************ 
Michael J. McCafferty 
Principal, Security Engineer 
M5 Hosting
858-576-7325 Voice 
http://www.m5hosting.com 
************************************************************




_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null

