check_nrpe socket time
Carroll, Jim P [Contractor]
jcarro10 at sprintspectrum.com
Thu Dec 5 22:51:08 CET 2002
If the NRPE problems were consistent on any given host, no matter what other
NRPE checks (to other hosts) I was adding in, then I would certainly
suspect that host. But that's not the case; the timeouts were seemingly
random across all hosts. And the Nagios host itself was starting to bog
down horribly. I could ping it from elsewhere, but couldn't log in to it via
ssh. (I admit that I didn't try to log in at the console.) Previously open
ssh sessions closed as well (timeouts). It seemed capable of kicking out
occasional e-mails (notifications), but that's it. Even Apache had stopped
responding.
Some history:
- initial install of Nagios on old PC of questionable power
- created initial checks
- began implementation of NRPE as standalone daemon on several Solaris hosts
- added NRPE checks
- started to notice NRPE timeouts
- switched from standalone to inetd
- NRPE timeouts seemed to persist
- stopped rollout of NRPE
- upgraded Nagios server
- added hosts, added service checks, added NRPE checks
- at 800+ total service checks (including NRPE), everything is well
- added NRPE to several Linux hosts (via xinetd) one afternoon
- total service checks are now 1100+
- started to experience NRPE timeouts
... and the rest I've already mentioned.
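
For a rough feel of why the jump from 800+ to 1100+ checks might tip things over, here is a back-of-the-envelope sketch using Little's law (checks in flight = start rate x time each check takes). The 5-minute check interval, the 1-second healthy check time and the 10-second timeout are assumptions for illustration only, not figures measured on my setup, and the function name is made up.

```python
import math


def estimated_concurrent_checks(total_checks, check_interval_s, avg_check_time_s):
    """Little's law: average checks in flight = start rate * time per check."""
    start_rate = total_checks / check_interval_s  # checks started per second
    return start_rate * avg_check_time_s


# Hypothetical figures: 1100 checks, each scheduled every 5 minutes.
healthy = estimated_concurrent_checks(1100, 300, 1.0)   # check returns in ~1 s
stalled = estimated_concurrent_checks(1100, 300, 10.0)  # check waits out a 10 s timeout

print(f"healthy: ~{math.ceil(healthy)} checks in flight")
print(f"stalled: ~{math.ceil(stalled)} checks in flight")
```

If those assumptions are anywhere near right, slow or timing-out NRPE checks multiply the number of plugin processes held open at once by roughly the same factor as the slowdown, which would fit the load building up on the Nagios host.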
jc
> -----Original Message-----
> From: Ethan Galstad [mailto:nagios at nagios.org]
> Sent: Wednesday, December 04, 2002 8:26 PM
> To: nagios-users
> Subject: RE: [Nagios-users] check_nrpe socket time
>
>
> Whoops - sent the original reply to the devel list by accident.
>
> ------- Forwarded message follows -------
>
> How is nrpe being run on the remote host: via inetd, xinetd, or as a
> standalone daemon? If under xinetd, it could be that you've hit some
> kind of limit based on your xinetd config (per_source and max_load
> directives come to mind) - check the man pages for xinetd.conf(5) for
> more info.
>
> The Nagios host may be causing excessive load (CPU/MEM/SWAP) because
> several child processes are waiting for the check_nrpe plugin to
> finish before they can exit. Sounds like the nrpe daemon might be
> backlogged on connections, which might point to some tweaking needed
> on the remote host side.
>
>
> On 4 Dec 2002 at 13:24, Carroll, Jim P [Contractor] wrote:
>
> > Yes! I made this observation/complaint on the list a while back, back when
> > I had nagios installed on an underpowered old PC. Nobody had a comment to
> > make.
> >
> > Since then (much more recently), I've had it happen again. This was when I
> > added quite a few NRPE checks across several Linux boxen, bumping my total
> > service checks from 800+ to 1100+.
> >
> > Here are things I've done since that time:
> >
> > - posted a question to the list: scalability of NRPE vs. NSCA
> > - set max_concurrent_checks to 200
> > - split the software disk mirror (I/O was getting hammered)
> > - increased swap (from 50% of RAM to 200% of RAM)
> > - set max_concurrent_checks to 0
> > - noticed that while NRPE checks didn't fail, system would occasionally be
> >   very slow
> > - set max_concurrent_checks to 400
> >
> > I still haven't had any response to my scalability/NRPE/NSCA query on this
> > list. I haven't ruled out NSCA as possibly a better way to go. It just
> > means cobbling some scripts together. If I knew for certain that the NSCA
> > approach is an order of magnitude more scalable than using NRPE, I'd jump on
> > it in a heartbeat.
> >
> > BTW, the docs have suggestions for improving overall performance. The one
> > suggestion which stuck in my mind was to get /usr/local/nagios/var onto a
> > ramdisk. I don't think that would help in my situation, but I do have the
> > option of putting it over on another spindle (the former mirror).
> >
> > Let me know if any of my observations/suggestions help you out.
> >
> > jc
> >
> > > -----Original Message-----
> > > From: Kaplan, Andrew H. [mailto:AHKAPLAN at PARTNERS.ORG]
> > > Sent: Wednesday, December 04, 2002 11:16 AM
> > > To: nagios-users at lists.sourceforge.net
> > > Subject: [Nagios-users] check_nrpe socket timeout
> > >
> > >
> > > I've been periodically encountering check_nrpe socket timeout errors for
> > > some time. Further checks into the status of the systems where this is
> > > occurring do not show any apparent problems. Has anyone had similar
> > > occurrences?
> > >
>
> ------- End of forwarded message -------
>
> Ethan Galstad,
> Nagios Developer
> ---
> Email: nagios at nagios.org
> Website: http://www.nagios.org
>
>
>