critical soft state every 3 hours
Marki
jm+nagios-users at roth.lu
Fri Apr 20 16:31:47 CEST 2012
Hi,
We have a problem where all the services checked around 00:01, 03:01, 06:01,
and so on (i.e. every three hours, one minute past the hour) return a critical
soft state. Most of the time they go back to normal, but sometimes they also
end up in a hard state. You can imagine the rest...
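To confirm the three-hour pattern, the soft critical alerts in the Nagios log can be tallied by time of day. The following is only a minimal sketch: it assumes the usual `[epoch] SERVICE ALERT: host;service;STATE;TYPE;attempt;output` log line layout, and the host and service names in the sample lines are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Assumed log line layout:
# [epoch] SERVICE ALERT: host;service;STATE;TYPE;attempt;plugin output
ALERT = re.compile(r"^\[(\d+)\] SERVICE ALERT: [^;]*;[^;]*;CRITICAL;SOFT;")

def soft_critical_by_minute(lines, tz=None):
    """Count CRITICAL/SOFT service alerts per HH:MM (local time if tz is None)."""
    counts = Counter()
    for line in lines:
        m = ALERT.match(line)
        if m:
            when = datetime.fromtimestamp(int(m.group(1)), tz)
            counts[when.strftime("%H:%M")] += 1
    return counts

# Hypothetical sample lines; in practice these would be read from nagios.log.
sample = [
    "[1334707260] SERVICE ALERT: web01;HTTP;CRITICAL;SOFT;1;Socket timeout",
    "[1334718060] SERVICE ALERT: db01;PING;CRITICAL;SOFT;1;Packet loss",
    "[1334718120] SERVICE ALERT: db01;PING;CRITICAL;HARD;3;Packet loss",
    "[1334718180] SERVICE ALERT: db01;PING;OK;HARD;3;PING OK",
]

for minute, n in sorted(soft_critical_by_minute(sample, timezone.utc).items()):
    print(minute, n)
```

If the counts cluster at one minute past every third hour, the trigger is probably something scheduled on or around the VM rather than the individual checks.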
We are running Nagios in a virtualized environment (VMware), on a SLES10 VM with
3 GB of RAM and 4 vCPUs. The average load of the machine is about 5.
We did not succeed in reproducing network trouble when doing basic checks around
those times from and to other hosts. The VM running Nagios, however, does
experience some packet loss, even when it runs on completely different VMware
hosts:
Tue Apr 17 21:02:01 CEST 2012
5000 packets transmitted, 4990 received, 0% packet loss, time 3840ms
–
5000 packets transmitted, 4998 received, 0% packet loss, time 2979ms
5000 packets transmitted, 4994 received, 0% packet loss, time 6190ms
–
Wed Apr 18 09:02:01 CEST 2012
5000 packets transmitted, 4999 received, 0% packet loss, time 5230ms
–
5000 packets transmitted, 4999 received, 0% packet loss, time 3340ms
–
5000 packets transmitted, 4979 received, 0% packet loss, time 11298ms
–
Wed Apr 18 12:02:01 CEST 2012
5000 packets transmitted, 4978 received, 0% packet loss, time 12764ms
–
Wed Apr 18 15:01:01 CEST 2012
5000 packets transmitted, 4987 received, 0% packet loss, time 4037ms
–
Wed Apr 18 15:02:01 CEST 2012
5000 packets transmitted, 4987 received, 0% packet loss, time 9010ms
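Note that the summaries above report "0% packet loss" even though not all 5000 packets came back; the percentage is shown as a whole number, so losses below 1% are hidden. A small sketch that recomputes the exact loss from the transmitted and received counts (the function name `loss_percent` is just an illustrative choice):

```python
import re

# Matches the counts in a ping summary line such as
# "5000 packets transmitted, 4990 received, 0% packet loss, time 3840ms"
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) received")

def loss_percent(line):
    """Return the exact loss percentage for one ping summary line, or None."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    sent, received = int(m.group(1)), int(m.group(2))
    if sent == 0:
        return None
    return 100.0 * (sent - received) / sent

samples = [
    "5000 packets transmitted, 4990 received, 0% packet loss, time 3840ms",
    "5000 packets transmitted, 4978 received, 0% packet loss, time 12764ms",
]
for line in samples:
    print(f"{loss_percent(line):.2f}% lost")
```

For the two sample lines this gives 0.20% and 0.44%, i.e. 10 and 22 lost packets out of 5000.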
Do you think this is related to Nagios? What could that be?
Here are some Nagios metrics:
Services Actively Checked:
  <= 1 minute:            0   (0.0%)
  <= 5 minutes:        2096  (78.3%)
  <= 15 minutes:       2626  (98.1%)
  <= 1 hour:           2665  (99.5%)
  Since program start: 2666  (99.6%)

  Metric                 Min.       Max.        Average
  Check Execution Time:  0.00 sec   52.15 sec   1.133 sec
  Check Latency:         0.00 sec   3.03 sec    0.183 sec
  Percent State Change:  0.00%      64.54%      1.16%

Check Stats:
  Type                              Last 1 Min  Last 5 Min  Last 15 Min
  Active Scheduled Host Checks              54         282          602
  Active On-Demand Host Checks              25         123          405
  Parallel Host Checks                      56         290          614
  Serial Host Checks                         0           0            0
  Cached Host Checks                        23         115          387
  Passive Host Checks                        0           0            0
  Active Scheduled Service Checks          987        4203        12647
  Active On-Demand Service Checks            0           0            0
  Cached Service Checks                      0           0            0
  Passive Service Checks                     0           0            0
  External Commands                          0           0            0
Thanks
marki