Distributed monitoring problem
Rob Hassing
nagios-users at redcap.nl
Wed Dec 21 12:08:00 CET 2005
Hello all,
I'm trying to set up a distributed monitoring system.
At first everything looked fine to me, but now I'm having a problem:
not all passive check results from the other hosts are being received.
The machine is an Intel(R) Xeon(TM) CPU 2.40GHz system with 512 MB of RAM.
The load is minimal. The only strange thing I can see is the memory usage:
nagios:/etc/nagios # cat /proc/meminfo
MemTotal: 514264 kB
MemFree: 30192 kB
Buffers: 44568 kB
Cached: 328004 kB
SwapCached: 8 kB
Active: 264908 kB
Inactive: 137824 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 514264 kB
LowFree: 30192 kB
SwapTotal: 1028120 kB
SwapFree: 1028020 kB
Dirty: 780 kB
Writeback: 0 kB
Mapped: 46188 kB
Slab: 75556 kB
Committed_AS: 100992 kB
PageTables: 1104 kB
VmallocTotal: 507896 kB
VmallocUsed: 7264 kB
VmallocChunk: 499760 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 4096 kB
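One common reading of these figures: Linux keeps otherwise idle RAM as buffer and page cache (the Buffers and Cached lines), and the kernel can usually reclaim that memory when processes need it. A low MemFree on its own is therefore not necessarily a shortage. Below is a minimal Python sketch that parses text in the `/proc/meminfo` format shown above, adds MemFree, Buffers and Cached as a rough estimate, and reports swap in use. The function names are illustrative only.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB where a unit is given)."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def memory_summary(text):
    """Return (MemFree + Buffers + Cached in kB, swap in use in kB)."""
    m = parse_meminfo(text)
    # Rough estimate: free RAM plus buffer and page cache, which the kernel can usually reclaim.
    free_incl_cache = m["MemFree"] + m["Buffers"] + m["Cached"]
    swap_used = m["SwapTotal"] - m["SwapFree"]
    return free_incl_cache, swap_used


# Figures copied from the /proc/meminfo output above.
SAMPLE = """MemTotal: 514264 kB
MemFree: 30192 kB
Buffers: 44568 kB
Cached: 328004 kB
SwapTotal: 1028120 kB
SwapFree: 1028020 kB"""

if __name__ == "__main__":
    free_kb, swap_kb = memory_summary(SAMPLE)
    print(f"free + buffers + cached: {free_kb} kB")  # 402764 kB
    print(f"swap in use: {swap_kb} kB")              # 100 kB
```

On the Nagios host itself, `memory_summary(open("/proc/meminfo").read())` would give the live figures instead of the copied sample.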
The process info tells me this:
Time Frame           Checks Completed
<= 1 minute:          51 (16.6%)
<= 5 minutes:        221 (71.8%)
<= 15 minutes:       255 (82.8%)
<= 1 hour:           260 (84.4%)
Since program start: 261 (84.7%)
So it's receiving less than 85% of all checks :(
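The percentages in that table are relative to the total number of checked items. The output does not state that total, but it can be worked out from the counts. The Python sketch below searches for every total that reproduces all five rounded percentages. Only 308 fits, which would leave 47 without a completed check since program start. The function name is illustrative, and the code assumes the percentages are rounded to one decimal place.

```python
# (time frame, checks completed, percentage shown) copied from the table above.
ROWS = [
    ("<= 1 minute", 51, 16.6),
    ("<= 5 minutes", 221, 71.8),
    ("<= 15 minutes", 255, 82.8),
    ("<= 1 hour", 260, 84.4),
    ("Since program start", 261, 84.7),
]


def consistent_totals(rows, search_limit=1000):
    """Return every total N for which each count/N, rounded to one decimal, matches the shown percentage."""
    return [
        n
        for n in range(1, search_limit + 1)
        if all(round(100 * count / n, 1) == pct for _, count, pct in rows)
    ]


if __name__ == "__main__":
    totals = consistent_totals(ROWS)
    print("totals consistent with the table:", totals)
    total = totals[0]
    for frame, count, _ in ROWS:
        print(f"{frame}: {total - count} of {total} not completed in this window")
```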
More passive checks will be sent to this Nagios server in the future.
Do we need different hardware?
Where should I look to solve this problem?
The machines sending the passive check results are not very busy doing this;
the checks are spread over three different servers.
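Each passive result ends up on the central server as one PROCESS_SERVICE_CHECK_RESULT external command, which is what appears in the log excerpt below. The Python sketch below builds such a line in the `[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output` form. The helper name `passive_service_result` is hypothetical. How the line reaches the central server depends on the transport in use, which this post does not describe.

```python
import time

# Standard plugin return codes used by Nagios.
RETURN_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def passive_service_result(host, service, state, output, timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line (hypothetical helper)."""
    if timestamp is None:
        timestamp = int(time.time())
    code = RETURN_CODES[state]
    # Fields are separated by semicolons, so reject host or service names containing one.
    for field in (host, service):
        if ";" in field:
            raise ValueError(f"field contains ';': {field!r}")
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{output}"


if __name__ == "__main__":
    print(passive_service_result(
        "cat29-w11-backup", "PING", "OK",
        "PING OK - Packet loss = 0%, RTA = 0.89 ms", timestamp=1135162484))
```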
One example, from /var/log/nagios/nagios.log:
[1135162484] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;cat29-w11-backup;PING;0;PING OK - Packet loss = 0%, RTA = 0.89 ms
[1135162491] SERVICE ALERT: cat29-w11-backup;PING;OK;HARD;3;PING OK - Packet loss = 0%, RTA = 0.89 ms
[1135162491] SERVICE NOTIFICATION: nagios;cat29-w11-backup;PING;OK;notify-by-epager;PING OK - Packet loss = 0%, RTA = 0.89 ms
[1135162491] SERVICE NOTIFICATION: nagios;cat29-w11-backup;PING;OK;notify-by-email;PING OK - Packet loss = 0%, RTA = 0.89 ms
[1135162941] Warning: The results of service 'PING' on host 'cat29-w11-backup' are stale by 32 seconds (threshold=425 seconds). I'm forcing an immediate check of the service.
[1135162951] SERVICE ALERT: cat29-w11-backup;PING;CRITICAL;SOFT;1;CRITICAL: Service results are stale!
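The numbers in this excerpt are internally consistent. Assume Nagios measures the age of the result from the last passive result, the EXTERNAL COMMAND entry at 1135162484. The freshness check at 1135162941 then saw a result 457 seconds old. That exceeds the 425-second threshold by exactly the 32 seconds reported, which suggests no newer result for this service was processed in between. The Python sketch below does that arithmetic on the log lines. The function names are illustrative only.

```python
import re

# Log lines copied from the nagios.log excerpt above.
LAST_RESULT = ("[1135162484] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;"
               "cat29-w11-backup;PING;0;PING OK - Packet loss = 0%, RTA = 0.89 ms")
STALE_WARNING = ("[1135162941] Warning: The results of service 'PING' on host "
                 "'cat29-w11-backup' are stale by 32 seconds (threshold=425 seconds). "
                 "I'm forcing an immediate check of the service.")


def log_timestamp(line):
    """Return the Unix timestamp in the leading [...] of a nagios.log line."""
    match = re.match(r"\[(\d+)\]", line)
    if match is None:
        raise ValueError(f"no timestamp: {line!r}")
    return int(match.group(1))


def staleness(last_result_line, warning_line):
    """Return (age, threshold, seconds over threshold), assuming age counts from the last result."""
    threshold = int(re.search(r"threshold=(\d+) seconds", warning_line).group(1))
    age = log_timestamp(warning_line) - log_timestamp(last_result_line)
    return age, threshold, age - threshold


if __name__ == "__main__":
    age, threshold, over = staleness(LAST_RESULT, STALE_WARNING)
    print(f"age {age} s, threshold {threshold} s, stale by {over} s")
```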
It looks like it goes stale again too fast?
Can somebody please help me :)
Best regards,
Rob Hassing
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null