RFC Proof of concept patch: Restarting embedded Perl Nagios periodically to halt memory consumption.
Stanley Hopcroft
Stanley.Hopcroft at IPAustralia.Gov.AU
Sat Sep 18 04:31:07 CEST 2004
Dear Ladies and Gentlemen,
Nag 2.x attempts unsuccessfully (on my bad advice) to limit the maximum
memory used by the embedded Perl Nag (ePN) process by periodically
deallocating the Perl interpreter and re-initialising it.
Since 1.2 is my Nag test bed, these changes were backported to it, and
the negative results were noted in an earlier letter.
However, changes to the reinit mechanism used by 2.x appear to deal with
the problem of increasing memory usage by an ePN by _restarting_ Nagios
periodically.
The changes are:
1 In utils.c/reinit_embedded_perl(void),
fork, and in the child process exec the Nag startup script with the
'restart' parameter.
int reinit_embedded_perl(void){
#ifdef EMBEDDEDPERL
    char buffer[MAX_INPUT_BUFFER];
    pid_t pid;

    /* log the restart before anything else happens */
    snprintf(buffer,sizeof(buffer),"Restarting Nagios (to re-initialize embedded Perl interpreter) after %d uses ...\n",embedded_perl_calls);
    buffer[sizeof(buffer)-1]='\x0';
    write_to_logs_and_console(buffer,NSLOG_INFO_MESSAGE,TRUE);

    pid=fork();
    if(pid==-1)
        /* fork failed */
        exit(STATE_UNKNOWN);
    else if(pid==0){
        /* child: replace this process with the startup script */
        execlp("/usr/local/etc/rc.d/nagios.sh",
               "/usr/local/etc/rc.d/nagios.sh","restart",(char *)NULL);
        /* only reached if execlp() itself failed */
        _exit(STATE_UNKNOWN);
    } else {
        /* parent (the forked check process) is finished */
        exit(STATE_OK);
    }
#endif
    return OK;
}
2 Make the Nag startup script suid root.
2.1 Make minor changes to the startup script (to remove the su) and have
it append debug output to a file.
As with the 2.x code, reinit_embedded_perl() is called in checks.c
whenever the number of calls to the embedded interpreter exceeds a
threshold value.
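
The threshold test described above can be sketched roughly as follows. This is only an illustration, not the code in checks.c: embedded_perl_calls is the counter used in the patch, while epn_restart_threshold and epn_note_call_and_check() are hypothetical names, and the value 100 is an assumption (it would match the "101 uses" in the log extract if the comparison is strict).

```c
/* Counter named in the patch; incremented once per embedded Perl call. */
int embedded_perl_calls = 0;

/* Hypothetical tunable: restart once this many calls have been exceeded. */
int epn_restart_threshold = 100;

/*
 * Hypothetical helper: record one use of the embedded interpreter and
 * report whether the caller should now invoke reinit_embedded_perl().
 * Returns 1 when the call count exceeds the threshold, otherwise 0.
 */
int epn_note_call_and_check(void)
{
    embedded_perl_calls++;
    return embedded_perl_calls > epn_restart_threshold;
}
```

A caller would do something like "if(epn_note_call_and_check()) reinit_embedded_perl();" after each embedded Perl check.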
It may well be that the restart is better done by the daemon process,
rather than in a child forked to perform a service check. (This way
seemed to me to be the fastest way to proceed, since there was already
2.x code with this structure.)
Here is an extract from the Nagios log showing some test results:
[1095429760] Restarting Nagios (to re-initialize embedded Perl interpreter) after 101 uses ...
[1095429760] Caught SIGTERM, shutting down...
[1095429760] Nagios 1.2 starting... (PID=83831)
[1095429760] Successfully shutdown... (PID=81306)
[1095429760] Finished daemonizing... (New PID=83832)
[1095430344] Restarting Nagios (to re-initialize embedded Perl interpreter) after 101 uses ...
[1095430344] Caught SIGTERM, shutting down...
[1095430344] Successfully shutdown... (PID=83832)
[1095430344] Nagios 1.2 starting... (PID=86358)
[1095430344] Finished daemonizing... (New PID=86359)
I am now testing my production Nag with this change and a threshold of
100_000 checks (which should take about a week, or correspond to a
memory usage of 40-60 MB).
Yours sincerely.
--
Stanley Hopcroft
Network specialist, IT Infrastructure
IP Australia
Ph: (02) 6283 3189 Fax: (02) 6281 1353
PO Box 200 Woden ACT 2606
http://www.ipaustralia.gov.au