NSCA strange behaviour

Greg Pangrazio pangrazi at gmail.com
Tue Dec 8 15:08:03 CET 2009


Do all of your clients fail, or just the new one?

Greg Pangrazio
pangrazi at gmail.com




On Tue, Dec 8, 2009 at 4:40 AM, Cedric Jeanneret
<cedric.jeanneret at camptocamp.com> wrote:
> Hello,
>
> I'm having trouble with NSCA.
> What we have :
>
> - about 47 passive hosts
> - about 220 passive services
>
> Versions : all are redhat servers, with:
> - NSCA 2.7.2 (latest one)
> - Nagios 3.1.2
>
> We have a single "Nagios aggregator", which collects all NSCA results from the other hosts.
>
> What's happening:
> A host was reinstalled yesterday (say client22), and since then the NSCA daemon on the aggregator (say server01) doesn't seem to collect its data.
>
> What I've done:
>
> - tcpdump on both client22 and server01, both show me traffic between them, on NSCA default port (5667)
>
> - checked iptables rules, all is ok (as tcpdump shows me traffic, that's a confirmation)
>
> - tried pushing status by hand from client22 to server01; ALL packets are sent successfully ("1 data packet(s) sent to host successfully."). I did this with a loop like this:
> for i in $(seq 1000); do /usr/local/bin/submit_ochp $(hostname -f) UP 'Host is up'; sleep 2; done
>
> - Enabling debug for nsca on server01 doesn't show me anything interesting. I just don't see where nsca picks up the client22 status, and it keeps on saying:
> Warning: The results of host 'client22.domain.lt' are stale by 0d 0h 2m 0s (threshold=0d 0h 6m 0s).  I'm forcing an immediate check of the host.
>
> On the other hand, it shows me:
> [1260267958.216051] [016.1] [pid=23191] Check results for service 'Cron service' on host 'client22.domain.lt' are fresh.
>
>
> I really don't know where to look for a solution, or even where the real problem is. We have another network with about 200 passive hosts and over 350 passive services, and it works fine.
>
> The only differences are :
> - the working network is debian-only
> - the working network's NSCA server does nothing other than act as the central Nagios server. server01 does some other things as well, such as acting as a syslog server and a collectd server. Maybe there's a bottleneck there, but I can't be sure about that.
>
> Does anyone have an idea?
>
> Thank you in advance.
>
> Best regards,
>
> C.
>
>
>
> --
> Cédric Jeanneret                 |  System Administrator
> 021 619 10 32                    |  Camptocamp SA
> cedric.jeanneret at camptocamp.com  |  PSE-A / EPFL
>
> ------------------------------------------------------------------------------
> Return on Information:
> Google Enterprise Search pays you back
> Get the facts.
> http://p.sf.net/sfu/google-dev2dev
>
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>
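
The loop quoted above goes through a local wrapper (submit_ochp) whose contents are not shown, so one way to take it out of the picture is to build the passive host result by hand and pipe it to send_nsca directly. The following is only a sketch under assumptions: the send_nsca binary path and the send_nsca.cfg location are guesses for a typical install, and the function names are made up for illustration. The tab-separated layout (host name, return code, plugin output) is the standard send_nsca input for a host check, with return code 0 meaning UP.

```shell
#!/bin/sh
# Sketch only: the paths below are assumptions, adjust them to the local install.
NSCA_SERVER="server01"                      # aggregator name used in the post
SEND_NSCA="/usr/sbin/send_nsca"             # assumed binary location
SEND_NSCA_CFG="/etc/nagios/send_nsca.cfg"   # assumed config location

# Hypothetical helper: format one passive HOST check result as
# <host_name><TAB><return_code><TAB><plugin_output><NEWLINE>
format_host_result() {
    printf '%s\t%s\t%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: submit a single "UP" result for the given host name.
# Host check return codes: 0 = UP, 1 = DOWN, 2 = UNREACHABLE.
submit_host_up() {
    format_host_result "$1" 0 "Host is up" |
        "$SEND_NSCA" -H "$NSCA_SERVER" -p 5667 -c "$SEND_NSCA_CFG"
}
```

Calling `submit_host_up "$(hostname -f)"` on client22 sends one result. Printing `format_host_result "$(hostname -f)" 0 "Host is up"` on its own also shows exactly which host name is being submitted, which can then be compared with the host_name defined on server01.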
