Critical Plugin Timed Out
Andreas Ericsson
ae at op5.se
Fri Aug 31 13:12:26 CEST 2007
Patrick M. wrote:
> Hi all,
>
> I've been running Nagios 2.6 for about 6 months now, and every now and
> then we get critical pages about a machine being down, or at least
> Nagios can't connect to it. It causes the CEO to freak out and believe
> something is up with our network.
>
> To me, it seems like the box is getting stressed out during the tests
> and is causing the plugins to time out.
>
> Here's some of the alerts from this morning:
>
> #######################################
> [08-30-2007 09:24:10] HOST ALERT: tu.xyz.com;DOWN;SOFT;1;CRITICAL -
> Plugin timed out after 10 seconds
> [08-30-2007 09:23:40] SERVICE ALERT:
> pule.xyz.com;PING;WARNING;SOFT;1;PING WARNING - Packet loss = 44%, RTA =
> 3.64 ms
> #######################################
>
Are you noticing any slowdown in normal network traffic while all this is
happening?
Most of the checks that have timed out are ICMP-based. If you're
doing some wonky QoS stuff (Windows has that built in...), it's
not too hard to guess that ICMP is probably right at the bottom of the
priority list.
>
> The machine is a p4 2.4 ghz with 1gb ram.
>
How many checks are you running per minute? That machine should be
capable of handling 500-800 per minute without any problems
at all.
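
If you don't have that number at hand, a rough estimate is enough: each
service contributes one check per check interval. A minimal Python
sketch, assuming you can list the normal check interval (in minutes) of
every service and host; `checks_per_minute` and the `intervals` data
are made up for illustration and not taken from any real config:

```python
def checks_per_minute(intervals_minutes):
    """Estimate the steady-state check rate.

    Each entry is the normal check interval, in minutes, of one service
    or host; a check scheduled every N minutes contributes 1/N checks
    per minute. Retries during SOFT states are ignored, so treat the
    result as a lower bound.
    """
    return sum(1.0 / n for n in intervals_minutes if n > 0)


# Hypothetical setup: 600 services checked every 5 minutes
# plus 50 checked every minute.
intervals = [5] * 600 + [1] * 50
rate = checks_per_minute(intervals)
print(f"about {rate:.0f} checks/minute")
```

Compare the result against the 500-800 per minute figure above before
blaming the hardware.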
> I'm not sure how to troubleshoot this - any ideas?
Check the QoS settings in the network. If it's not that, try
removing half the checks and see if that solves it. If it does,
you've got either a really bad network or undersized
hardware.
If checks other than the ICMP-based ones are acting up as well, and
you mostly see lots of false alarms within a short (10-30 second)
window, make sure your network card isn't set to auto-negotiate
transfer speed and duplex.
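
To see whether the alarms really do come in such short bursts, you can
group the alert lines by time. A minimal sketch, assuming the log lines
look like the excerpt quoted above (the raw nagios.log normally carries
epoch timestamps instead, so the pattern would need adjusting there);
`parse_alerts` and `find_bursts` are hypothetical helper names:

```python
import re
from datetime import datetime, timedelta

# Matches alert lines in the format quoted in this thread, e.g.
# "[08-30-2007 09:24:10] HOST ALERT: tu.xyz.com;DOWN;SOFT;1;..."
ALERT_RE = re.compile(
    r"\[(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\] (HOST|SERVICE) ALERT: ([^;]+);"
)


def parse_alerts(lines):
    """Return sorted (timestamp, kind, host) tuples for each alert line."""
    alerts = []
    for line in lines:
        m = ALERT_RE.search(line)
        if m:
            ts = datetime.strptime(m.group(1), "%m-%d-%Y %H:%M:%S")
            alerts.append((ts, m.group(2), m.group(3)))
    return sorted(alerts)


def find_bursts(alerts, window_seconds=30, min_hosts=2):
    """Group alerts separated by at most window_seconds and keep only
    the groups that involve at least min_hosts distinct hosts."""
    limit = timedelta(seconds=window_seconds)
    bursts, current = [], []
    for alert in alerts:
        if current and alert[0] - current[-1][0] > limit:
            if len({a[2] for a in current}) >= min_hosts:
                bursts.append(current)
            current = []
        current.append(alert)
    if current and len({a[2] for a in current}) >= min_hosts:
        bursts.append(current)
    return bursts


sample = [
    "[08-30-2007 09:24:10] HOST ALERT: tu.xyz.com;DOWN;SOFT;1;"
    "CRITICAL - Plugin timed out after 10 seconds",
    "[08-30-2007 09:23:40] SERVICE ALERT: pule.xyz.com;PING;WARNING;SOFT;1;"
    "PING WARNING - Packet loss = 44%, RTA = 3.64 ms",
]
for burst in find_bursts(parse_alerts(sample)):
    print(len(burst), "alerts between", burst[0][0], "and", burst[-1][0])
```

Several unrelated hosts failing inside the same window points at the
monitoring box or its network path rather than at the hosts themselves.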
I assume you haven't set the Nagios server to obtain its address via
DHCP, as renewing a lease can sometimes have a funny impact on monitoring,
but while you're at it, make sure (by triple-checking) that there's
only one machine with the IP of the monitoring machine.
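
One way to check for a duplicate address is to look for an IP that
answers with more than one MAC address. A minimal sketch, assuming you
have already collected (IP, MAC) pairs somehow, for example from ARP
replies or switch tables; `find_ip_conflicts` is a hypothetical helper
and the addresses below are made up:

```python
from collections import defaultdict


def find_ip_conflicts(observations):
    """Return {ip: sorted MACs} for every IP seen with more than one MAC.

    `observations` is an iterable of (ip, mac) pairs. MAC addresses are
    compared case-insensitively.
    """
    seen = defaultdict(set)
    for ip, mac in observations:
        seen[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}


# Hypothetical observations, for illustration only.
observed = [
    ("192.0.2.10", "00:11:22:33:44:55"),
    ("192.0.2.10", "00:11:22:33:44:AA"),
    ("192.0.2.20", "00:11:22:33:44:66"),
]
print(find_ip_conflicts(observed))
```

Any entry in the result for the monitoring machine's IP means a second
box is answering for that address.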
> What can I provide
> you folks in order to help me out?
>
Money, or evidence of having tried things on your own. Both are
hard currency when asking for help in a tech-savvy forum.
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231