Scheduled checks falling far behind
Litwin, Matthew
mlitwin at stubhub.com
Sat Oct 23 19:16:14 CEST 2010
For the Total Services, what do the three X / X / X values mean? Is it last 1/5/15 min?
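
As far as I can tell, the three-value lines in nagiostats (state change, latency, execution time) are Min / Max / Avg rather than time windows; only the lines labelled "Last 1/5/15 min" are windowed. A minimal sketch, assuming that reading, that pulls those triples out of the text output quoted below. The function name and the regular expression are my own and not part of Nagios:

```python
import re

# Matches lines such as "Active Service Latency: 0.034 / 526.244 / 351.201 sec".
# Assumption: the three numbers are minimum / maximum / average.
# Lines with four values or without a "sec"/"%" unit are deliberately skipped.
TRIPLE = re.compile(
    r"^[>\s]*(?P<label>[^:]+):\s*"
    r"(?P<min>[\d.]+) / (?P<max>[\d.]+) / (?P<avg>[\d.]+) (?P<unit>sec|%)\s*$"
)

def parse_triples(text):
    """Return {label: (min, max, avg, unit)} for every three-value line."""
    stats = {}
    for line in text.splitlines():
        m = TRIPLE.match(line)
        if m:
            stats[m.group("label").strip()] = (
                float(m.group("min")),
                float(m.group("max")),
                float(m.group("avg")),
                m.group("unit"),
            )
    return stats

sample = """\
Active Service Latency: 0.034 / 526.244 / 351.201 sec
Active Services Last 1/5/15/60 min: 205 / 1353 / 3568 / 4970
Active Host Latency: 0.000 / 487.501 / 216.928 sec
"""

for label, (lo, hi, avg, unit) in parse_triples(sample).items():
    print(f"{label}: min={lo} max={hi} avg={avg} {unit}")
```

Read that way, the average active service latency in the stats below is about 351 seconds, which matches the 5-10 minute delays.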
On Oct 23, 2010, at 9:48 AM, Litwin, Matthew wrote:
> Here are my stats... definitely have a problem if latencies are between 5-10 minutes!
>
> check_reaper_frequency was set at 10, which seems high. I am going to try 5 as used in the core nagios guide and see what that does.
>
> Nagios Stats 3.2.1
> Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
> Last Modified: 03-09-2010
> License: GPL
>
> CURRENT STATUS DATA
> ------------------------------------------------------
> Status File: /usr/local/nagios/var/status.dat
> Status File Age: 0d 0h 0m 29s
> Status File Version: 3.2.1
>
> Program Running Time: 0d 0h 4m 9s
> Nagios PID: 17295
> Used/High/Total Command Buffers: 0 / 0 / 4096
>
> Total Services: 4987
> Services Checked: 4987
> Services Scheduled: 4970
> Services Actively Checked: 4987
> Services Passively Checked: 0
> Total Service State Change: 0.000 / 16.970 / 0.007 %
> Active Service Latency: 0.034 / 526.244 / 351.201 sec
> Active Service Execution Time: 0.013 / 17.745 / 0.393 sec
> Active Service State Change: 0.000 / 16.970 / 0.007 %
> Active Services Last 1/5/15/60 min: 205 / 1353 / 3568 / 4970
> Passive Service Latency: 0.000 / 0.000 / 0.000 sec
> Passive Service State Change: 0.000 / 0.000 / 0.000 %
> Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
> Services Ok/Warn/Unk/Crit: 4969 / 11 / 1 / 6
> Services Flapping: 0
> Services In Downtime: 0
>
> Total Hosts: 241
> Hosts Checked: 241
> Hosts Scheduled: 241
> Hosts Actively Checked: 241
> Host Passively Checked: 0
> Total Host State Change: 0.000 / 0.000 / 0.000 %
> Active Host Latency: 0.000 / 487.501 / 216.928 sec
> Active Host Execution Time: 0.149 / 4.310 / 3.780 sec
> Active Host State Change: 0.000 / 0.000 / 0.000 %
> Active Hosts Last 1/5/15/60 min: 38 / 131 / 199 / 241
> Passive Host Latency: 0.000 / 0.000 / 0.000 sec
> Passive Host State Change: 0.000 / 0.000 / 0.000 %
> Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0
> Hosts Up/Down/Unreach: 241 / 0 / 0
> Hosts Flapping: 0
> Hosts In Downtime: 0
>
> Active Host Checks Last 1/5/15 min: 49 / 135 / 135
> Scheduled: 48 / 131 / 131
> On-demand: 1 / 4 / 4
> Parallel: 48 / 131 / 131
> Serial: 0 / 0 / 0
> Cached: 1 / 4 / 4
> Passive Host Checks Last 1/5/15 min: 0 / 0 / 0
> Active Service Checks Last 1/5/15 min: 313 / 1353 / 1353
> Scheduled: 313 / 1353 / 1353
> On-demand: 0 / 0 / 0
> Cached: 0 / 0 / 0
> Passive Service Checks Last 1/5/15 min: 0 / 0 / 0
>
> External Commands Last 1/5/15 min: 0 / 0 / 0
>
> On Oct 22, 2010, at 6:53 PM, Frost, Mark {PBC} wrote:
>
>> Matthew,
>>
>> You don't say, but my guess would be that you have high latencies. That is, for one of several reasons, Nagios is not able to run checks when it thinks it should. You can see this information and other stats by looking at the Performance item near the bottom of the Nav pane in the Nagios web interface.
>>
>> You can also run, if memory serves, the "nagiostats" command located in your Nagios "bin" directory to see this information as well. I actually use that nagiostats data in a custom check and graph a lot of those latencies and other Nagios performance related info.
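
A rough sketch of what such a custom check could look like. This is not Mark's actual check. The install path, the thresholds, and the function names are assumptions, as is the detail that nagiostats in MRTG mode prints one value per line and reports AVGACTSVCLAT in milliseconds:

```python
import subprocess

# Assumed install path, matching the /usr/local/nagios prefix seen in this thread.
NAGIOSTATS = "/usr/local/nagios/bin/nagiostats"

def read_avg_service_latency(binary=NAGIOSTATS):
    """Ask nagiostats for the average active service latency, in seconds.

    Assumes MRTG mode prints one value per line and that
    AVGACTSVCLAT is reported in milliseconds.
    """
    out = subprocess.run(
        [binary, "--mrtg", "--data=AVGACTSVCLAT"],
        capture_output=True, text=True, check=True,
    ).stdout
    return float(out.split()[0]) / 1000.0

def evaluate(latency_sec, warn=30.0, crit=120.0):
    """Map a latency to a Nagios plugin exit code and status line.

    The warn/crit thresholds are illustrative defaults, not recommendations.
    """
    if latency_sec >= crit:
        code, word = 2, "CRITICAL"
    elif latency_sec >= warn:
        code, word = 1, "WARNING"
    else:
        code, word = 0, "OK"
    perfdata = f"latency={latency_sec:.3f}s;{warn};{crit}"
    message = f"LATENCY {word} - avg active service latency {latency_sec:.3f}s | {perfdata}"
    return code, message

# Demo with the average from the stats posted in this thread (351.201 sec).
code, line = evaluate(351.201)
print(code, line)
```

In a real plugin you would call read_avg_service_latency(), pass the result to evaluate(), print the message, and exit with the returned code so the perfdata after the "|" can be graphed.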
>>
>> From my own experience, I found that I did not pay attention to this information when I started using Nagios, then read about it, made a few tweaks to make it better, and then forgot about it. Then, as our installation grew and grew, I found that some things got worse again and I had to consider different tuning options.
>>
>> I would recommend that you first read the "Tuning Nagios For Maximum Performance" section of the docs:
>>
>> http://nagios.sourceforge.net/docs/3_0/tuning.html
>>
>> If nothing else, this will give you an idea of some things that can affect latencies.
>>
>> Additionally, you may find that your average latencies look reasonable, but then see something with a whopping huge max latency. It can be hard to track down what that is in the UI. I've just looked up that max latency and then quickly searched the status.dat file for the service with that same latency and dug into that. You could, for example, have a few checks that aren't really timing out, so a check may take 10 minutes or more to complete, which would really screw up your overall latencies: those checks wouldn't have finished before the next time they were supposed to run.
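
The status.dat lookup described above can be scripted. A minimal sketch, assuming status.dat stores each service as a "servicestatus { ... }" block of key=value lines with host_name, service_description and check_latency fields. The host and service names in the sample are made up:

```python
def service_blocks(status_text):
    """Yield one dict of key=value fields per servicestatus block."""
    block = None
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("servicestatus") and line.endswith("{"):
            block = {}
        elif line == "}" and block is not None:
            yield block
            block = None
        elif block is not None and "=" in line:
            key, value = line.split("=", 1)
            block[key] = value

def worst_latencies(status_text, top=5):
    """Return the `top` services with the highest check_latency, worst first."""
    rows = [
        (
            float(b.get("check_latency", 0.0)),
            b.get("host_name", "?"),
            b.get("service_description", "?"),
        )
        for b in service_blocks(status_text)
    ]
    return sorted(rows, reverse=True)[:top]

# Hypothetical excerpt; real files carry many more fields per block.
sample = """\
hoststatus {
	host_name=web01
	check_latency=0.250
	}

servicestatus {
	host_name=web01
	service_description=CPU
	check_latency=612.633
	}

servicestatus {
	host_name=db01
	service_description=Disk
	check_latency=0.034
	}
"""

for latency, host, service in worst_latencies(sample):
    print(f"{latency:10.3f}s  {host}  {service}")
```

To run it against a live install, read the file first, for example with open("/usr/local/nagios/var/status.dat").read(), and pass the text to worst_latencies().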
>>
>> Mark
>>
>> ________________________________________
>> From: Litwin, Matthew [mlitwin at stubhub.com]
>> Sent: Friday, October 22, 2010 8:29 PM
>> To: nagios-users at lists.sourceforge.net
>> Subject: [Nagios-users] Scheduled checks falling far behind
>>
>> I have been chasing my tail trying to figure out why my RRD files were very sparsely populated, and I am realizing that my checks are falling behind their scheduled times by up to 3 times their set check interval. For example, take a service that should be checked every 5 minutes. In the example below, the time is 00:19:02, the last check was at 00:10:30, and the next scheduled check time is 00:13:28. This means it is almost 6 minutes behind schedule and almost 9 minutes since the last check!
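
Spelling out the arithmetic for that example, using the three timestamps from the service detail below. The function and variable names are only for illustration:

```python
from datetime import datetime

# Timestamp layout used by the Nagios service detail page in this thread.
FMT = "%m-%d-%Y %H:%M:%S"

def schedule_lag(now, last_check, next_check, fmt=FMT):
    """Return (seconds behind schedule, seconds since the last check)."""
    t_now = datetime.strptime(now, fmt)
    t_last = datetime.strptime(last_check, fmt)
    t_next = datetime.strptime(next_check, fmt)
    behind = (t_now - t_next).total_seconds()
    since_last = (t_now - t_last).total_seconds()
    return behind, since_last

behind, since_last = schedule_lag(
    now="10-23-2010 00:19:02",
    last_check="10-23-2010 00:10:30",
    next_check="10-23-2010 00:13:28",
)

interval = 5 * 60  # the 5-minute check interval from the example
print(f"behind schedule: {behind:.0f}s")
print(f"since last check: {since_last:.0f}s ({since_last / interval:.2f}x the interval)")
```

That gives 334 seconds (5m 34s) behind schedule and 512 seconds (8m 32s) since the last check.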
>>
>> I find that even if I shorten the check interval to, say, 3 minutes, it still behaves about the same. The server has very low load and Nagios is hardly working at all (usually below 4% CPU). I haven't touched any of the tuning on this, and from what I have read the default settings appear unthrottled. Is there any way to make it "work harder"?
>>
>> --Service information--
>> Last Updated: Sat Oct 23 00:19:02 UTC 2010
>>
>> --Service State Information--
>> Current Status: OK (for 7d 16h 14m 46s)
>> Status Information: CPU STATISTICS OK : user=0.12% system=0.00% iowait=0.00% idle=99.88%
>> Performance Data: 0.12;0.00;0.00;99.88;80;90
>> Current Attempt: 1/3 (HARD state)
>> >>> Last Check Time: 10-23-2010 00:10:30 <<<
>> Check Type: ACTIVE
>> Check Latency / Duration: 612.633 / 2.052 seconds
>> >>> Next Scheduled Check: 10-23-2010 00:13:28 <<<
>> Last State Change: 10-15-2010 08:04:16
>> Last Notification: N/A (notification 0)
>> Is This Service Flapping? NO (0.00% state change)
>> In Scheduled Downtime? NO
>> Last Update: 10-23-2010 00:18:33 ( 0d 0h 0m 29s ago)
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Nokia and AT&T present the 2010 Calling All Innovators-North America contest
>> Create new apps & games for the Nokia N8 for consumers in U.S. and Canada
>> $10 million total in prizes - $4M cash, 500 devices, nearly $6M in marketing
>> Develop with Nokia Qt SDK, Web Runtime, or Java and Publish to Ovi Store
>> http://p.sf.net/sfu/nokia-dev2dev
>> _______________________________________________
>> Nagios-users mailing list
>> Nagios-users at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
>> ::: Messages without supporting info will risk being sent to /dev/null