Unpredictable service check times fixed?
Stanley Hopcroft
Stanley.Hopcroft at IPAustralia.Gov.AU
Tue Apr 15 05:14:30 CEST 2003
Dear Sir,
It seems to me that the delays in checking services are caused by Nagios
being unable to fork() a child process to perform the service checks.
This fork failure is the real cause of the problem; it is not a fault in
Nagios itself.
When Nagios wants to run a check it calls the system (kernel) fork()
function to generate a new process that can then execve() the service
check program. If fork() cannot return a new process, Nagios cannot check
the service, simple as that.
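A minimal Python sketch of the fork()/execve() pattern described above, assuming a POSIX system. It is not Nagios's actual code (Nagios is written in C), and `run_check` is a hypothetical name. `os.execvp` searches PATH and then calls execve(); if fork() itself fails, no child exists and the check simply does not run.

```python
import os
from typing import Optional, Sequence


def run_check(argv: Sequence[str]) -> Optional[int]:
    """Run one check program via fork() + exec.

    Returns the check's exit status, or None if fork() failed or the
    child was killed by a signal.
    """
    try:
        pid = os.fork()
    except OSError as exc:
        # EAGAIN or ENOMEM: no child process was created, so no check runs.
        print(f"fork() failed: {exc}")
        return None
    if pid == 0:
        # Child: replace this process image with the check program.
        try:
            os.execvp(argv[0], list(argv))
        finally:
            # Only reached if exec failed; 127 is the conventional status.
            os._exit(127)
    # Parent: wait for the child and collect its exit status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return None  # terminated by a signal
```

For example, `run_check(["true"])` returns 0, while a check program that cannot be found returns 127.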
The fork() call can fail for a number of reasons, mostly resource-related.
For example, from the fork(2) manual page on this FreeBSD system (which is
happily checking 350 services):
ERRORS
     Fork() will fail and no child process will be created if:

     [EAGAIN]  The system-imposed limit on the total number of processes
               under execution would be exceeded. The limit is given by
               the sysctl(3) MIB variable KERN_MAXPROC. (The limit is
               actually ten less than this except for the super user).

     [EAGAIN]  The user is not the super user, and the system-imposed
               limit on the total number of processes under execution by
               a single user would be exceeded. The limit is given by the
               sysctl(3) MIB variable KERN_MAXPROCPERUID.

     [EAGAIN]  The user is not the super user, and the soft resource limit
               corresponding to the resource parameter RLIMIT_NPROC would
               be exceeded (see getrlimit(2)).

     [ENOMEM]  There is insufficient swap space for the new process.
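To make a failed fork() easier to diagnose, a scheduler could translate the errno into the causes listed above and report the current process limit alongside it. A minimal Python sketch, assuming a Unix system; `explain_fork_error` and `FORK_ERRORS` are hypothetical names, and the messages only paraphrase the manual page above.

```python
import errno
import resource

# Reasons paraphrased from the FreeBSD fork(2) ERRORS section quoted above.
FORK_ERRORS = {
    errno.EAGAIN: "a process limit was reached "
                  "(KERN_MAXPROC, KERN_MAXPROCPERUID or RLIMIT_NPROC)",
    errno.ENOMEM: "insufficient swap space for the new process",
}


def explain_fork_error(exc: OSError) -> str:
    """Turn a failed fork() into a message pointing at the likely cause."""
    name = errno.errorcode.get(exc.errno, str(exc.errno))
    reason = FORK_ERRORS.get(exc.errno, "an error not listed in fork(2)")
    # Soft and hard RLIMIT_NPROC limits of the calling process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return (f"fork() failed with {name}: {reason} "
            f"(RLIMIT_NPROC soft={soft}, hard={hard})")
```

It would be called from the `except OSError` branch around fork(). EAGAIN does not say which of the three limits was hit, so the message lists all of them.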
I think the problem is that your Nagios host has either
. insufficient memory and/or swap. Your host may simply be overcommitted
  with other applications; if you are running ntop, snort or an SQL
  database, you may have to get rid of them or upgrade your host, or
. restrictive resource limits for the unprivileged user Nagios runs as.
Some of these limits can be changed dynamically; others may require a
kernel rebuild.
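One limit that can be changed dynamically is the soft RLIMIT_NPROC. A minimal Python sketch, assuming a Unix system, of raising it to the hard limit: an unprivileged process may raise its soft limit up to its hard limit but not beyond, and the new value is inherited by child processes. `raise_nproc_soft_limit` is a hypothetical name; in practice the limit would have to be raised in whatever starts Nagios so that Nagios and its checks inherit it. System-wide limits such as KERN_MAXPROC are not affected.

```python
import resource


def raise_nproc_soft_limit() -> tuple:
    """Raise this process's soft RLIMIT_NPROC to its hard limit.

    Returns the (soft, hard) pair in effect afterwards. Going beyond the
    hard limit needs the super user.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_NPROC, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_NPROC)
```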
You should probably consult a local system administrator who is familiar
with tuning whatever OS Nagios is running on.
For comparison, this system (FreeBSD 4.7, 256 MB RAM and an 866 MHz
Celeron, running Nagios, Apache, smslink and sendmail) runs at a load
average of only 0.15.
Hope this helps.
Yours sincerely.
--
------------------------------------------------------------------------
Stanley Hopcroft
------------------------------------------------------------------------
'...No man is an island, entire of itself; every man is a piece of the
continent, a part of the main. If a clod be washed away by the sea,
Europe is the less, as well as if a promontory were, as well as if a
manor of thy friend's or of thine own were. Any man's death diminishes
me, because I am involved in mankind; and therefore never send to know
for whom the bell tolls; it tolls for thee...'
from Meditation 17, J Donne.
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users