Memory leak
Arno Lehmann
al at its-lehmann.de
Mon May 16 19:34:44 CEST 2005
Hi,
First, please excuse the cross-post: I asked on the users list some time
ago, but without a useful result.
Now, I've got a problem with Nagios 2.0b3 (with b2 as well, but I'm
trying b3 at the moment).
While Nagios runs, the amount of used memory rises without end. I was
able to find out the following:
- With only one host and one service, memory usage stays constant.
- As soon as I add a second service, it goes up.
- The more services or hosts, or the higher the check frequency, the
faster the memory usage rises.
- No tool I know of (such as top or ps) can tell me where the memory
goes, or rather, which process is using it.
- Memory usage does not go down when I kill the Nagios process; that
can take anything from a few hours to the next reboot. If I start a
process that requests more memory than is physically available, i.e. I
force the system to swap, the memory gets freed.
- If I simply let the system run, though, the kernel's out-of-memory
killer starts killing processes.
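Since top and ps only show per-process figures, a system-wide view may be more telling here. The sketch below is an illustration, not something from the original tests: it assumes a Linux /proc/meminfo made of "Field: value kB" lines (as on 2.6 kernels) and reports how much each counter changed between two snapshots. Counters such as Slab, Cached and Buffers belong to the kernel rather than to any process, so a steady rise there while MemFree shrinks would be consistent with the symptoms above.

```python
import time
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: number}.

    Most values are in kB; a few fields (e.g. HugePages_*) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def diff_meminfo(before, after):
    """Return {field: after - before} for fields present in both snapshots."""
    return {k: after[k] - before[k] for k in before if k in after}


def sample(path="/proc/meminfo", interval=60.0):
    """Take two snapshots `interval` seconds apart and return their difference."""
    first = parse_meminfo(Path(path).read_text())
    time.sleep(interval)
    second = parse_meminfo(Path(path).read_text())
    return diff_meminfo(first, second)
```

For example, calling the (hypothetical) helper sample(interval=600) while Nagios runs and printing only the non-zero entries shows which counters keep moving.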
I've got the following system:
> elf:~ # uname -a
> Linux elf 2.6.8-24.14-default #1 Tue Mar 29 09:27:43 UTC 2005 i686 athlon i386 GNU/Linux
> elf:/usr/local/nagios # bin/nagios etc/nagios-mini.cfg
>
> Nagios 2.0b3
> Copyright (c) 1999-2005 Ethan Galstad (www.nagios.org)
> Last Modified: 04-03-2005
> License: GPL
>
> Nagios 2.0b3 starting... (PID=20489)
> elf:~ # ldd /usr/local/nagios/bin/nagios
> linux-gate.so.1 => (0xffffe000)
> libm.so.6 => /lib/tls/libm.so.6 (0x4002e000)
> libnsl.so.1 => /lib/libnsl.so.1 (0x40051000)
> libpthread.so.0 => /lib/tls/libpthread.so.0 (0x40068000)
> libltdl.so.3 => /usr/lib/libltdl.so.3 (0x4007a000)
> libc.so.6 => /lib/tls/libc.so.6 (0x40081000)
> /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
> libdl.so.2 => /lib/libdl.so.2 (0x40197000)
So, no embedded Perl, I guess.
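That conclusion rests on the absence of any libperl entry in the ldd listing; a binary built with embedded Perl would normally be linked against libperl (an assumption about the build, not something verified here). A minimal sketch that parses ldd-style lines, with the leading "> " quote markers removed, and applies that heuristic:

```python
import re

# Matches lines such as "libm.so.6 => /lib/tls/libm.so.6 (0x4002e000)"
# and "linux-gate.so.1 => (0xffffe000)" (no path on the right-hand side).
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s*(\S*)\s*\((0x[0-9a-fA-F]+)\)")


def linked_libraries(ldd_output):
    """Return the library names (left-hand side) listed in ldd output."""
    names = []
    for line in ldd_output.splitlines():
        m = LDD_LINE.match(line)
        if m:
            names.append(m.group(1))
    return names


def has_embedded_perl(ldd_output):
    """Heuristic (assumption): any linked libperl* suggests embedded Perl."""
    return any(name.startswith("libperl") for name in linked_libraries(ldd_output))
```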
The system is a 500 MHz Athlon with 512 MB RAM and an IDE disk. It
serves as an all-purpose server and otherwise works fine, so I'm quite
sure the OS and the hardware are more or less OK.
You'll find the configuration I use for testing below.
Now, I assume there is some sort of memory leak, either in Nagios itself
or in the kernel.
I don't think it's the plugins. First, I tried several, including simple
shell scripts like 'echo OK; exit 0'; second, I verified them using
valgrind.
Using valgrind, I do get lots of output. Unfortunately, I'm not a
programmer, so it is more or less impossible for me to understand it.
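For leak hunting, the most useful part of valgrind's output is usually the LEAK SUMMARY printed at the end of a run, where "definitely lost" counts memory the program can no longer reach. A minimal sketch that pulls those numbers out of a saved log; the line format is assumed from recent valgrind releases and may differ in older ones:

```python
import re

# Matches leak-summary lines such as
#   "==20489==    definitely lost: 1,024 bytes in 3 blocks"
LEAK_LINE = re.compile(
    r"(definitely lost|indirectly lost|possibly lost|still reachable):\s*"
    r"([\d,]+) bytes in ([\d,]+) blocks"
)


def leak_summary(valgrind_output):
    """Return {category: (bytes, blocks)} from valgrind LEAK SUMMARY lines.

    If the log holds several summaries (for instance when child processes
    are traced too), the values are summed per category.
    """
    totals = {}
    for m in LEAK_LINE.finditer(valgrind_output):
        category = m.group(1)
        nbytes = int(m.group(2).replace(",", ""))
        nblocks = int(m.group(3).replace(",", ""))
        old_bytes, old_blocks = totals.get(category, (0, 0))
        totals[category] = (old_bytes + nbytes, old_blocks + nblocks)
    return totals
```

A non-zero "definitely lost" figure for the Nagios process itself, rather than for a plugin, would be the most relevant number to post.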
Seeing that Nagios is a very useful project, and given my good
experiences with version 1.x, I'd really like to be able to upgrade to
version 2, as well as help get it running on a wider range of systems.
Now, I assume that version 2.0b usually runs OK, because I see no other
problem reports. I'm wondering if anyone can give me some advice on how
to solve this problem.
Of course, I can supply log files etc. or do test runs with different
configurations.
Arno
----------
Here's my current configuration:
> elf:~ # cat /usr/local/nagios/etc/nagios-mini.cfg
> log_file=/usr/local/nagios/var/nagios.log
> cfg_file=/usr/local/nagios/etc/mini.cfg
> object_cache_file=/usr/local/nagios/var/objects.cache
> resource_file=/usr/local/nagios/etc/resource.cfg
> status_file=/usr/local/nagios/var/status.dat
> nagios_user=nagios
> nagios_group=nagios
> command_check_interval=30s
> command_file=/usr/local/nagios/var/rw/nagios-test.cmd
> comment_file=/usr/local/nagios/var/comments.dat
> downtime_file=/usr/local/nagios/var/downtime.dat
> lock_file=/usr/local/nagios/var/nagios.lock
> temp_file=/usr/local/nagios/var/nagios.tmp
> log_rotation_method=d
> log_archive_path=/usr/local/nagios/var/archives
> use_syslog=0
> log_notifications=1
> log_service_retries=1
> log_host_retries=1
> log_event_handlers=1
> log_initial_states=1
> log_external_commands=1
> log_passive_checks=1
> service_inter_check_delay_method=s
> max_service_check_spread=60
> service_interleave_factor=s
> host_inter_check_delay_method=s
> max_host_check_spread=60
> max_concurrent_checks=20
> service_reaper_frequency=2
> service_check_timeout=30
> host_check_timeout=60
> event_handler_timeout=30
> notification_timeout=60
> ocsp_timeout=5
> perfdata_timeout=5
> retain_state_information=1
> state_retention_file=/usr/local/nagios/var/retention.dat
> retention_update_interval=120
> use_retained_program_state=1
> use_retained_scheduling_info=1
> interval_length=2
> use_aggressive_host_checking=0
> execute_service_checks=1
> accept_passive_service_checks=0
> execute_host_checks=1
> accept_passive_host_checks=0
> enable_notifications=1
> enable_event_handlers=0
> process_performance_data=0
> obsess_over_services=0
> obsess_over_hosts=0
> check_for_orphaned_services=0
> check_service_freshness=0
> service_freshness_check_interval=300
> check_host_freshness=0
> host_freshness_check_interval=1500
> aggregate_status_updates=0
> status_update_interval=24
> enable_flap_detection=0
> low_service_flap_threshold=5.0
> high_service_flap_threshold=20.0
> low_host_flap_threshold=5.0
> high_host_flap_threshold=20.0
> date_format=strict-iso8601
> illegal_object_name_chars=`~!$%^&*|'"<>?,()=
> illegal_macro_output_chars=`~$&|'"<>
> use_regexp_matching=0
> use_true_regexp_matching=0
> admin_email=its-admin at its-lehmann.de
> admin_pager=<not available>
> daemon_dumps_core=0
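Note that interval_length=2 in this file is far below the usual 60. Nagios counts object intervals such as normal_check_interval and check_interval in units of interval_length seconds, so this setting makes every check run thirty times as often as the same object definitions would with the default. A minimal sketch that reads key=value settings from a file like this one and converts an interval into seconds; the helper names are hypothetical:

```python
def parse_main_config(text):
    """Parse Nagios main-config style 'key=value' lines into a dict.

    Blank lines and '#' comments are skipped. Directives such as cfg_file
    may legitimately repeat; here the last value wins, which is enough for
    looking up single-valued settings like interval_length.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # values may themselves contain '='
        settings[key.strip()] = value.strip()
    return settings


def interval_seconds(units, settings):
    """Convert an object interval (in time units) to seconds."""
    length = int(settings.get("interval_length", "60"))  # Nagios default: 60 s
    return units * length
```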
> elf:~ # cat /usr/local/nagios/etc/mini.cfg
> define command{
> command_name check-host-alive
> # command_line sudo -u root $USER1$/check_icmp -H $HOSTADDRESS$ -w 300.0,30% -c 500.0,70% -p 10
> command_line $USER1$/check_dummy 0 Always_ok_I_make_sure_of_that
> }
>
> define command{
> command_name check_dhcp
> command_line sudo -u root $USER1$/check_dhcp --serverip=$ARG1$
> }
>
> define command{
> command_name check_local_disk
> command_line $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
> }
>
>
> define command{
> command_name nix
> command_line /bin/true
> }
>
> define host{
> host_name Elf
> alias Elf
> address 192.168.0.4
> check_command check-host-alive
> max_check_attempts 2
> check_interval 12
> check_period 24x7
> contact_groups admins
> notification_interval 22
> notification_period 24x7
> notification_options d,u,r,f
> }
>
> define hostgroup{
> hostgroup_name alles
> alias Alles
> members Elf
> }
>
> #Using both checks results in increasing memory usage.
>
> #If I use this service alone there's no increase in MemUsage
> #define service{
> # host_name Elf
> # service_description DHCP
> # check_command check_dhcp!192.168.0.4
> # max_check_attempts 2
> # normal_check_interval 1
> # retry_check_interval 1
> # check_period 24x7
> # notification_interval 22
> # notification_period 24x7
> # notification_options w,u,c,r,f
> # contact_groups admins
> #}
>
> #This one alone is ok.
> define service{
> host_name Elf
> service_description DISK
> check_command check_local_disk!10%!5%!/
> max_check_attempts 2
> normal_check_interval 1
> retry_check_interval 1
> check_period 24x7
> notification_interval 22
> notification_period 24x7
> notification_options w,u,c,r,f
> contact_groups admins
> }
>
> define service{
> host_name Elf
> service_description DISK2
> check_command check_local_disk!10%!5%!/tmp
> max_check_attempts 2
> normal_check_interval 1
> retry_check_interval 1
> check_period 24x7
> notification_interval 22
> notification_period 24x7
> notification_options w,u,c,r,f
> contact_groups admins
> }
>
>
> define contactgroup{
> contactgroup_name admins
> alias Administrators
> members admin
> }
>
>
> define contact{
> contact_name admin
> alias Admins
> email admin at elf
> host_notification_period 24x7
> service_notification_period 24x7
> host_notification_options d,u,r,f,n
> service_notification_options w,u,c,r,f,n
> service_notification_commands nix
> host_notification_commands nix
> }
>
> define timeperiod{
> timeperiod_name 24x7
> alias Always
> sunday 00:00-24:00
> monday 00:00-24:00
> tuesday 00:00-24:00
> wednesday 00:00-24:00
> thursday 00:00-24:00
> friday 00:00-24:00
> saturday 00:00-24:00
> }
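To put numbers on the check frequency: with interval_length=2, the two DISK services (normal_check_interval 1) fall due every 2 seconds and the host (check_interval 12) every 24 seconds. The arithmetic below assumes every check runs exactly on schedule and that the host is checked regularly; retries are ignored:

```python
SECONDS_PER_HOUR = 3600


def checks_per_hour(check_interval_units, interval_length=60):
    """Number of scheduled checks per hour for one object.

    check_interval_units: the object's (normal_)check_interval, in time units.
    interval_length: seconds per time unit, as set in the main config.
    """
    period = check_interval_units * interval_length
    return SECONDS_PER_HOUR / period


# Values taken from the test configuration (interval_length=2).
objects = {
    "service DISK": 1,   # normal_check_interval 1
    "service DISK2": 1,  # normal_check_interval 1
    "host Elf": 12,      # check_interval 12
}
rates = {name: checks_per_hour(units, interval_length=2) for name, units in objects.items()}
total = sum(rates.values())  # checks per hour across all objects
```

That comes to 3750 scheduled checks per hour, compared with 125 for the same definitions at the default interval_length of 60.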
--
IT-Service Lehmann al at its-lehmann.de
Arno Lehmann http://www.its-lehmann.de
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users