critical soft state every 3 hours
Eduardo Silvestre
eduardosilvestre at me.com
Sat Apr 21 16:28:33 CEST 2012
Do you have any task/cron job running every 30 minutes?
What is the I/O wait of that VM?
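
In case it helps, here is a minimal sketch for measuring I/O wait from inside the VM, assuming a Linux guest that exposes /proc/stat with the usual aggregate "cpu" line (iowait as the fifth counter). The function names are made up for illustration and are not part of Nagios or any plugin.

```python
import time


def read_cpu_times(stat_text):
    """Return the first eight counters of the aggregate 'cpu' line.

    Order assumed: user, nice, system, idle, iowait, irq, softirq, steal.
    """
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, in percent."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart and return iowait in percent."""
    with open(path) as handle:
        before = read_cpu_times(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = read_cpu_times(handle.read())
    return iowait_percent(before, after)
```

Calling sample_iowait() in a loop that spans 00:01, 03:01 and so on, and logging the result with a timestamp, should show whether the critical soft states line up with an I/O wait spike.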
On 20/04/2012, at 15:31, Marki <jm+nagios-users at roth.lu> wrote:
> Hi,
>
> we have a problem where all the services checked around 00:01, 03:01, 06:01,
> ..., i.e. every three hours at one minute past the hour, return a critical soft
> state. Most of the time they go back to normal, but sometimes they also end
> up in a hard state. You can imagine the rest...
>
> We are running Nagios in a virtualized environment (VMware), on a SLES10 VM with
> 3GB of RAM and 4 vCPUs. The average load of the machine is about 5.
>
> We did not succeed in reproducing network trouble when doing basic checks around
> those times from and to other hosts. However, the VM running Nagios does
> experience some packet loss, even when run on completely different VMware hosts:
>
> Tue Apr 17 21:02:01 CEST 2012
> 5000 packets transmitted, 4990 received, 0% packet loss, time 3840ms
> --
> 5000 packets transmitted, 4998 received, 0% packet loss, time 2979ms
> 5000 packets transmitted, 4994 received, 0% packet loss, time 6190ms
> --
> Wed Apr 18 09:02:01 CEST 2012
> 5000 packets transmitted, 4999 received, 0% packet loss, time 5230ms
> --
> 5000 packets transmitted, 4999 received, 0% packet loss, time 3340ms
> --
> 5000 packets transmitted, 4979 received, 0% packet loss, time 11298ms
> --
> Wed Apr 18 12:02:01 CEST 2012
> 5000 packets transmitted, 4978 received, 0% packet loss, time 12764ms
> --
> Wed Apr 18 15:01:01 CEST 2012
> 5000 packets transmitted, 4987 received, 0% packet loss, time 4037ms
> --
> Wed Apr 18 15:02:01 CEST 2012
> 5000 packets transmitted, 4987 received, 0% packet loss, time 9010ms
>
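
A side note on the summaries quoted above: they all say "0% packet loss" even though fewer packets were received than sent, so the percentage appears to be rounded down to a whole number. The sketch below recomputes the loss from the transmitted/received counts. The regular expression and the function name are only illustrative and assume the summary line format shown in the quoted output.

```python
import re

# Matches the counts in a ping summary line such as
# "5000 packets transmitted, 4990 received, 0% packet loss, time 3840ms"
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) received")


def loss_percent(line):
    """Return packet loss in percent for one summary line, or None if it does not match."""
    match = SUMMARY.search(line)
    if match is None:
        return None
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        return None
    return 100.0 * (sent - received) / sent
```

For the first quoted sample (5000 sent, 4990 received) this gives about 0.2%, and for the worst one (4978 received) about 0.44%.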
> Do you think this is related to Nagios? What could that be?
>
> Here are some Nagios metrics:
>
> Services Actively Checked:
> <= 1 minute: 0 (0.0%)
> <= 5 minutes: 2096 (78.3%)
> <= 15 minutes: 2626 (98.1%)
> <= 1 hour: 2665 (99.5%)
> Since program start: 2666 (99.6%)
>
> Metric                 Min.       Max.        Average
> Check Execution Time:  0.00 sec   52.15 sec   1.133 sec
> Check Latency:         0.00 sec   3.03 sec    0.183 sec
> Percent State Change:  0.00%      64.54%      1.16%
>
> Check Stats:
> Type                             Last 1 Min  Last 5 Min  Last 15 Min
> Active Scheduled Host Checks             54         282          602
> Active On-Demand Host Checks             25         123          405
> Parallel Host Checks                     56         290          614
> Serial Host Checks                        0           0            0
> Cached Host Checks                       23         115          387
> Passive Host Checks                       0           0            0
> Active Scheduled Service Checks         987        4203        12647
> Active On-Demand Service Checks           0           0            0
> Cached Service Checks                     0           0            0
> Passive Service Checks                    0           0            0
> External Commands                         0           0            0
>
>
>
> Thanks
>
> marki
>
>
> ------------------------------------------------------------------------------
> For Developers, A Lot Can Happen In A Second.
> Boundary is the first to Know...and Tell You.
> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
> http://p.sf.net/sfu/Boundary-d2dvs2
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null