Is my nagios in good shape?
Cook, Garry
GWCOOK at mactec.com
Mon Aug 2 17:22:32 CEST 2004
Neil wrote:
> Hi Garry,
>
> Thank you very much. Btw, here is my new performance info after the
> changes:
>
> Time Frame            Checks Completed
> <= 1 minute:          59 (20.6%)
> <= 5 minutes:         208 (72.7%)
> <= 15 minutes:        277 (96.9%)
> <= 1 hour:            286 (100.0%)
> Since program start:  286 (100.0%)
>
> Metric                 Min.     Max.    Average
> Check Execution Time:  < 1 sec  16 sec  2.245 sec
> Check Latency:         20 sec   81 sec  49.769 sec
> Percent State Change:  0.00%    14.80%  0.07%
>
> What do you think about the new result?
Well, you've trimmed the check latency quite a bit since your original
post (the average is down from about 97 seconds to about 50). However,
your checks are still running almost 1 minute behind schedule on average.
It looks like you are running about 300 checks. My implementation is
currently checking about 700 services, and my Performance Info looks
like this:
Metric                 Min.     Max.    Average
Check Execution Time:  < 1 sec  23 sec  2.534 sec
Check Latency:         < 1 sec  9 sec   0.153 sec
Percent State Change:  0.00%    5.86%   0.01%
On average, my checks run less than 1 second behind schedule. A minute
behind, like you show above, may not seem like much, but it can
really be a factor when dealing with mission-critical hosts/services.
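
To put the two latency figures from Neil's posts in perspective, here is a minimal Python sketch. It uses only numbers quoted in this thread. Comparing latency against the average check interval is my own yardstick for illustration, not something Nagios reports, and the function names are made up for this example.

```python
# Average check latency (seconds) from Neil's two Performance Info posts.
LATENCY_BEFORE = 97.312
LATENCY_AFTER = 49.769

# Average check interval (seconds) from Neil's 'nagios -s' output.
# Using it as a yardstick is an assumption of this sketch.
AVG_CHECK_INTERVAL = 319.130


def percent_reduction(before, after):
    """Return how much 'after' is below 'before', as a percentage."""
    return 100.0 * (before - after) / before


def latency_share(latency_sec, interval_sec=AVG_CHECK_INTERVAL):
    """Return latency as a percentage of the average check interval."""
    return 100.0 * latency_sec / interval_sec


print(f"Latency reduced by {percent_reduction(LATENCY_BEFORE, LATENCY_AFTER):.1f}%")
print(f"Before: {latency_share(LATENCY_BEFORE):.1f}% of the average check interval")
print(f"After:  {latency_share(LATENCY_AFTER):.1f}% of the average check interval")
```

So the change roughly halved the average latency, but checks still start a noticeable fraction of an interval late.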
The latency could also be caused by problems on the Nagios server itself.
Perhaps the load on the box is too high, or you're out of memory and
swapping a lot? Maybe the Nagios server is too far away from the boxes
you are checking, and actual link latency is a problem? If you don't
think any of these are getting in the way, then I would continue to
read through the docs; there are quite a few options in nagios.cfg
that can help to trim check latency. Keep tweaking until you get it
right, and after every tweak run 'nagios -s nagios.cfg' again to see
what Nagios thinks of your changes.
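
If you rerun 'nagios -s' after every tweak, it can be handy to pull out just the guideline numbers. The following is a rough Python sketch that assumes the output looks like the text quoted later in this thread; other Nagios versions may word these lines differently, and the function name is hypothetical.

```python
import re

# Sample copied from the 'nagios -s' output quoted in this thread.
SAMPLE = """\
Rough guidelines for max_concurrent_checks value:
-------------------------------------------------
Absolute minimum value: 9
Recommend value: 27
"""


def parse_guidelines(text):
    """Extract the max_concurrent_checks guideline numbers, if present."""
    patterns = {
        "minimum": r"Absolute minimum value:\s*(\d+)",
        # Accept "Recommended" as well, in case the wording differs.
        "recommended": r"Recommend(?:ed)? value:\s*(\d+)",
    }
    found = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            found[key] = int(match.group(1))
    return found


print(parse_guidelines(SAMPLE))
```

Any line the patterns do not match is simply left out of the result, so an empty dictionary means the output format was not what this sketch expects.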
If you don't have any issues with the Nagios server, you might want to
try running Nagios with the max_concurrent_checks value set to 0, which
removes the limit on the number of checks that can run at one time.
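
To double-check what max_concurrent_checks is currently set to, something like the Python sketch below could work. It assumes the usual key=value lines with '#' comments, and the sample text is a made-up excerpt, not Neil's actual nagios.cfg. Both function names are hypothetical.

```python
def read_directive(cfg_text, name):
    """Return the value of a key=value directive, or None if it is absent.

    If the directive appears more than once, this sketch keeps the last one;
    that choice is an assumption, not documented Nagios behaviour.
    """
    value = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, val = line.partition("=")
        if sep and key.strip() == name:
            value = val.strip()
    return value


def describe_limit(value):
    """Explain what a max_concurrent_checks setting means."""
    if value is None:
        return "max_concurrent_checks is not set in this text"
    if int(value) == 0:
        return "no limit on concurrent checks"
    return f"at most {int(value)} checks may run at once"


# Hypothetical excerpt for illustration; not Neil's actual nagios.cfg.
SAMPLE_CFG = """\
# value taken from the guideline in the nagios -s output
max_concurrent_checks=27
"""

print(describe_limit(read_directive(SAMPLE_CFG, "max_concurrent_checks")))
```

To check a real file, read its contents into a string and pass that in place of SAMPLE_CFG.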
> Cook, Garry writes:
>
>> Resending message as per your request:
>>
>> You should take a look at the max_concurrent_checks value in your
>> Nagios config. The output below recommends some values to use. After
>> making the change and restarting Nagios, give it some time to run
>> some checks and have a look at your performance info again.
>>
>> Also, have a look at the following docs:
>>
>> http://nagios.sourceforge.net/docs/1_0/configmain.html#max_concurrent_checks
>> http://nagios.sourceforge.net/docs/1_0/checkscheduling.html
>>
>> HTH
>>
>> -g
>>
>> -----Original Message-----
>> From: Neil [mailto:neil-on-nagios at restricted.dyndns.org]
>> Sent: Wed 7/28/2004 2:38 PM
>> To: Cook, Garry
>> Cc: nagios-users at lists.sourceforge.net
>> Subject: Re: Is my nagios in good shape?
>>
>>
>>
>> Hi Garry,
>>
>> This is the result:
>> SERVICE SCHEDULING INFORMATION
>> -------------------------------
>> Total services: 276
>> Total hosts: 115
>>
>> Command check interval: 5 sec
>> Check reaper interval: 10 sec
>>
>> Inter-check delay method: SMART
>> Average check interval: 319.130 sec
>> Inter-check delay: 1.156 sec
>>
>> Interleave factor method: SMART
>> Average services per host: 2.400
>> Service interleave factor: 3
>>
>> Initial service check scheduling info:
>> --------------------------------------
>> First scheduled check: 1091047230 -> Wed Jul 28 13:40:30 2004
>> Last scheduled check: 1091047548 -> Wed Jul 28 13:45:48 2004
>>
>> Rough guidelines for max_concurrent_checks value:
>> -------------------------------------------------
>> Absolute minimum value: 9
>> Recommend value: 27
>>
>> So what are the changes I need to make?
>>
>> Thanks :)
>>
>> Cook, Garry writes:
>>
>>> nagios-users-admin at lists.sourceforge.net wrote:
>>>> Hey guys,
>>>>
>>>> Here is the output of our production nagios' performance info. Is
>>>> my box ok?
>>>>
>>>> Program-Wide Performance Information
>>>> Active Checks:
>>>> Time Frame            Checks Completed
>>>> <= 1 minute:          71 (25.7%)
>>>> <= 5 minutes:         220 (79.7%)
>>>> <= 15 minutes:        262 (94.9%)
>>>> <= 1 hour:            276 (100.0%)
>>>> Since program start:  276 (100.0%)
>>>>
>>>> Metric                 Min.     Max.     Average
>>>> Check Execution Time:  < 1 sec  16 sec   2.192 sec
>>>> Check Latency:         70 sec   130 sec  97.312 sec
>>>> Percent State Change:  0.00%    8.95%    0.17%
>>>
>>> It doesn't appear to be in 'bad' shape, although your checks appear
>>> to be running a little behind.
>>> Run /<path to nagios>/bin/nagios -s /<path to nagios>/etc/nagios.cfg
>>> to see what Nagios thinks about your check latency, and this will
>>> also suggest some config changes for improvement.
>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Neil
>>>
>>>
>>> Garry W. Cook, CCNA
>>> Network Infrastructure Manager
>>> MACTEC, Inc. - http://www.mactec.com/
>>> 303.308.6228 (Office) - 720.220.1862 (Mobile)