Nagios scheduling queue seems to be lagging behind real time
tom.welsh at bt.com
Fri May 28 14:26:55 CEST 2004
Hi,
Only the main nagios process is running, but lots of child processes appear while checks are being executed, e.g.:
ps -eaf | grep nagios
pms 21318 1 1 08:50:08 ? 3:40 /opt/pms/nagios/bin/nagios -d /opt/pms/nagios/etc/nagios.cfg
pms 822 344 0 13:24:25 pts/18 0:00 grep nagios
pms 806 805 0 13:24:22 ? 0:00 sh -c /opt/pms/nagios/libexec/check_ping -H 172.16.118.103 -w 1000,70% -c 1000,
pms 683 1 0 13:24:17 ? 0:00 /opt/pms/nagios/bin/nagios -d /opt/pms/nagios/etc/nagios.cfg
pms 691 1 0 13:24:17 ? 0:00 /opt/pms/nagios/bin/nagios -d /opt/pms/nagios/etc/nagios.cfg
pms 766 1 0 13:24:18 ? 0:00 /opt/pms/nagios/bin/nagios -d /opt/pms/nagios/etc/nagios.cfg
pms 807 806 0 13:24:22 ? 0:00 /opt/pms/nagios/libexec/check_ping -H 172.16.118.103 -w 1000,70% -c 1000,80% -p
pms 538 1 0 13:24:15 ? 0:00 /opt/pms/nagios/bin/nagios -d /opt/pms/nagios/etc/nagios.cfg
pms 547 1 0 13:24:15 ? 0:00 /opt/pms/nagios/bin/nagios -d /opt/pms/nagios/etc/nagios.cfg
pms 754 1 0 13:24:17 ? 0:00 /opt/pms/nagios/bin/nagios -d /opt/pms/nagios/etc/nagios.cfg
pms 805 21318 0 13:24:22 ? 0:00 /opt/pms/nagios/bin/nagios -d /opt/pms/nagios/etc/nagios.cfg
pms 707 1 0 13:24:17 ? 0:00 /opt/pms/nagios/bin/nagios -d /opt/pms/nagios/etc/nagios.cfg
pms 555 1 0 13:24:15 ? 0:00 /opt/pms/nagios/bin/nagios -d /opt/pms/nagios/etc/nagios.cfg
pms 602 1 0 13:24:16 ? 0:00 /opt/pms/nagios/bin/nagios -d /opt/pms/nagios/etc/nagios.cfg
pms 610 1 0 13:24:16 ? 0:00 /opt/pms/nagios/bin/nagios -d /opt/pms/nagios/etc/nagios.cfg
I restarted the nagios service at 08:50:00, so I believe that I have only one main process running.
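
A minimal sketch of how the listing above could be checked mechanically. It assumes the Solaris `ps -ef` column order (UID PID PPID C STIME TTY TIME CMD) and uses a heuristic that is an assumption, not documented Nagios behaviour: a `nagios -d` entry owned by init (PPID 1) that has accumulated no CPU time is taken to be a short-lived check child rather than a second main daemon. The function names are hypothetical, and the sample lines are a subset copied from the output above.

```python
# Path prefix of the daemon command line, taken from the ps output above.
NAGIOS_DAEMON = "/opt/pms/nagios/bin/nagios -d"

# A subset of the ps lines quoted in this message.
SAMPLE = """\
pms 21318 1 1 08:50:08 ? 3:40 /opt/pms/nagios/bin/nagios -d /opt/pms/nagios/etc/nagios.cfg
pms 822 344 0 13:24:25 pts/18 0:00 grep nagios
pms 683 1 0 13:24:17 ? 0:00 /opt/pms/nagios/bin/nagios -d /opt/pms/nagios/etc/nagios.cfg
pms 805 21318 0 13:24:22 ? 0:00 /opt/pms/nagios/bin/nagios -d /opt/pms/nagios/etc/nagios.cfg
pms 806 805 0 13:24:22 ? 0:00 sh -c /opt/pms/nagios/libexec/check_ping -H 172.16.118.103 -w 1000,70% -c 1000,
"""


def parse_ps(text):
    """Parse `ps -ef` style lines into dicts (assumed column order, see above)."""
    procs = []
    for line in text.splitlines():
        fields = line.split(None, 7)  # the 8th field keeps the full command line
        if len(fields) < 8:
            continue
        _uid, pid, ppid, _c, stime, _tty, cpu_time, cmd = fields
        procs.append({"pid": int(pid), "ppid": int(ppid),
                      "stime": stime, "time": cpu_time, "cmd": cmd})
    return procs


def init_owned_daemons(procs):
    """All `nagios -d` entries whose parent is init (PPID 1)."""
    return [p for p in procs
            if p["ppid"] == 1 and p["cmd"].startswith(NAGIOS_DAEMON)]


def likely_main(procs):
    """Heuristic (assumption): a real main daemon has accumulated CPU time."""
    return [p for p in init_owned_daemons(procs) if p["time"] != "0:00"]


procs = parse_ps(SAMPLE)
for p in init_owned_daemons(procs):
    print(p["pid"], p["stime"], p["time"])
print("likely main daemon PIDs:", [p["pid"] for p in likely_main(procs)])
```

On the sample lines only PID 21318, started at 08:50:08, passes the heuristic, which fits the restart time mentioned above. Taking a second `ps` snapshot a few seconds later and checking which PPID 1 entries persist would be a stronger test.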
Cheers
Tom
-----Original Message-----
From: Thales Maia [mailto:tchagas at uolinc.com]
Sent: 28 May 2004 13:05
To: Welsh,T,Tom,XJH2A C
Cc: nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] Nagios scheduling queue seems to be lagging behind real time
Another trick: Maybe more than 1 main nagios process is running.
On Fri, 2004-05-28 at 04:49, tom.welsh at bt.com wrote:
> Hi All
>
> I'm new to this list so try to be kind :)
>
> OS: Solaris with current patches applied
>
> Hardware: Sun E220R, 2 x 450 MHz UltraSPARC-II CPUs, 1 GB RAM, 2 x 18 GB SCSI drives
>
> Monitored hosts: 64
>
> Monitored services: 2094
>
> Monitoring interval: 5 mins
>
> Nagios version: 1.0 (yes, I know there is a newer version, but an upgrade is not an option for us just now :( )
>
> Problem:
> Nagios appears to be running fine, but when I look at my scheduling queue the checks at the top of the queue are constantly 16-17 minutes behind real time. The current time on my box is 08:40, yet the next entry to run should have been executed at 08:24.
>
> I have read the section in the docs regarding scheduling and adjusted my max_concurrent_checks accordingly. My service_reaper_frequency is still set to 10.
>
> Can anyone point out where I can make changes to bring this system back into line with real time?
>
> Here is the output from nagios -s ../etc/nagios.cfg
>
>
> SERVICE SCHEDULING INFORMATION
> -------------------------------
> Total services: 2094
> Total hosts: 64
>
> Command check interval: -1 sec
> Check reaper interval: 10 sec
>
> Inter-check delay method: SMART
> Average check interval: 341.117 sec
> Inter-check delay: 0.163 sec
>
> Interleave factor method: SMART
> Average services per host: 32.719
> Service interleave factor: 33
>
> Initial service check scheduling info:
> --------------------------------------
> First scheduled check: 1085729805 -> Fri May 28 08:36:45 2004
> Last scheduled check: 1085730148 -> Fri May 28 08:42:28 2004
>
> Rough guidelines for max_concurrent_checks value:
> -------------------------------------------------
> Absolute minimum value: 62
> Recommend value: 186
>
>
> From nagios.cfg...
>
> inter_check_delay_method=s
> service_interleave_factor=s
> max_concurrent_checks=186
> service_reaper_frequency=10
> sleep_time=1
> service_check_timeout=63
> host_check_timeout=30
> event_handler_timeout=30
> notification_timeout=30
> ocsp_timeout=5
> perfdata_timeout=5
>
>
> Thanks for your help
>
> Regards,
>
>
> Tom
>
>
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by: Oracle 10g
> Get certified on the hottest thing ever to hit the market... Oracle 10g.
> Take an Oracle 10g class now, and we'll give you the exam FREE.
> http://ads.osdn.com/?ad_id149&alloc_id66&op=click
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
--
THALES MAIA CHAGAS
Sysadmin - UOL S/A
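
For reference, the figures in the quoted `nagios -s` output can be reproduced with a little arithmetic. The sketch below assumes the "smart" inter-check delay is the average check interval divided by the number of services, and that the interleave factor is the average number of services per host rounded up. The max_concurrent_checks guideline is inferred from the numbers only (reaper interval divided by inter-check delay, rounded up, then tripled for the recommended value) and should be read as an assumption about how Nagios 1.0 arrives at them. Variable names are illustrative.

```python
import math

# Values copied from the quoted scheduling output.
total_services = 2094
total_hosts = 64
avg_check_interval = 341.117  # seconds
reaper_interval = 10          # seconds (service_reaper_frequency)

# "Smart" inter-check delay: spread all checks evenly over one average interval.
inter_check_delay = avg_check_interval / total_services

# "Smart" interleave factor: average services per host, rounded up.
avg_services_per_host = total_services / total_hosts
interleave_factor = math.ceil(avg_services_per_host)

# Assumed guideline: checks that can start between two reaper runs, then x3.
min_concurrent = math.ceil(reaper_interval / inter_check_delay)
recommended_concurrent = min_concurrent * 3

print(f"Inter-check delay:         {inter_check_delay:.3f} sec")
print(f"Average services per host: {avg_services_per_host:.3f}")
print(f"Service interleave factor: {interleave_factor}")
print(f"Absolute minimum value:    {min_concurrent}")
print(f"Recommend value:           {recommended_concurrent}")
```

These match the quoted output (0.163 sec, 32.719, 33, 62 and 186), so the configured max_concurrent_checks=186 agrees with what nagios -s itself suggests.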