fork issues and latency
Thomas Guyot-Sionnest
dermoth at aei.ca
Sat Feb 14 17:08:51 CET 2009
On 12/02/09 12:27 PM, Jeff Frost wrote:
> I've got a Nagios-3.0.4 server monitoring 3,290 services on 387
> hosts. When the nagios service is initially started, service and host
> latency is great. This usually continues for about 2-3 hours and then
> we start seeing fork errors in the log like so:
>
> [1234425582] Warning: The check of service 'ssh' on host 'mail02' could
> not be performed due to a fork() error: 'Cannot allocate memory'. The
> check will be rescheduled.
>
> At about the same time, we start seeing lots of orphaned
> /tmp/checkXXXXXX files and indications that the max concurrent checks
> value has been reached:
>
> [1234458853] Max concurrent service checks (500) has been reached.
> Delaying further checks until previous checks are complete...
>
> It should be noted that during this time period, there is 2GB of free
> memory and 1.2GB of cache available out of the 4GB on the nagios server,
> so I'm thinking it has to be something besides system RAM that's exhausted.
>
> Naturally, when this starts happening, the latencies begin to increase
> and seem to settle somewhere around 98 seconds and, interestingly enough,
> this causes the load to drop to nearly nothing.
>
> We have already set the following in nagios.cfg:
>
> service_reaper_frequency=2
> use_large_installation_tweaks=1
> enable_environment_macros=0
>
> If we enable the embedded perl interpreter, the forking issues happen
> much more quickly after restart (minutes instead of hours).
Which OS/distribution are you running? How much RAM and swap does the
server have, and how much of each is free?
Please send the output of "free -m" with and without Nagios running.
Also send the RSS (resident set size) of the Nagios process right after
start, and again once you get the fork errors.
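If it helps, here is a minimal sketch for grabbing the RSS figure on
Linux. It assumes a /proc filesystem and that you already know the PID
of the main Nagios process; the function names are my own and not part
of Nagios.

```python
import re
import sys
from pathlib import Path


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def rss_kb(pid):
    """Read the resident set size of a running process (Linux /proc only)."""
    return parse_vm_rss_kb(Path(f"/proc/{pid}/status").read_text())


if __name__ == "__main__":
    # Hypothetical usage: python rss.py <nagios-pid>
    if len(sys.argv) == 2:
        pid = int(sys.argv[1])
        print(f"RSS of PID {pid}: {rss_kb(pid)} kB")
```

Run it once right after a restart and once after the first fork error,
so the two numbers can be compared.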
Nagios 3 does leak some memory, especially when using the embedded Perl
interpreter (ePN). However, unless your server is really short on RAM,
it shouldn't be a huge problem.
If you're stuck with low-end hardware, make sure to run the server
without a graphical interface and disable as many unneeded daemons as
possible. A slim Linux distribution like Slackware (if you use Linux)
could also help. Another setting that could help is limiting check
parallelization (max_concurrent_checks in nagios.cfg), though a problem
with it has been reported on Nagios 3 (it hasn't been confirmed, AFAIK).
--
Thomas