Service check goes HARD too quick if multiple service are in problem state

FTL Nagios ftlnagios at gmail.com
Wed Jan 16 10:34:45 CET 2013


My objects.cache file seems to contain all of the right information.


Output from ps -ef | grep nagios reads

nagios@SERVER:~$ ps -ef | grep nagios
nagios     984     1  0 Jan09 ?        00:12:54 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1889     1  0 Jan09 ?        00:00:00 /usr/bin/gnome-keyring-daemon --daemonize --login
nagios    1900  1654  0 Jan09 ?        00:00:00 gnome-session --session=gnome-classic
nagios    1941  1900  0 Jan09 ?        00:00:00 /usr/bin/ssh-agent /usr/bin/dbus-launch --exit-with-session gnome-session --session=gnome-classic
nagios    1944     1  0 Jan09 ?        00:00:00 /usr/bin/dbus-launch --exit-with-session gnome-session --session=gnome-classic
nagios    1945     1  0 Jan09 ?        00:00:06 //bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session
nagios    1961  1900  0 Jan09 ?        00:00:15 /usr/lib/gnome-settings-daemon/gnome-settings-daemon
nagios    1970     1  0 Jan09 ?        00:00:00 /usr/lib/gvfs/gvfsd
nagios    1972     1  0 Jan09 ?        00:00:00 /usr/lib/gvfs//gvfs-fuse-daemon -f /home/nagios/.gvfs
nagios    1981     1  0 Jan09 ?        00:00:00 /usr/lib/gnome-settings-daemon/gsd-printer
nagios    1985  1900  0 Jan09 ?        00:00:03 metacity
nagios    1999     1  0 Jan09 ?        00:00:00 /usr/lib/i386-linux-gnu/gconf/gconfd-2
nagios    2001  1900  0 Jan09 ?        00:00:18 gnome-panel
nagios    2005     1  0 Jan09 ?        00:00:00 /usr/lib/dconf/dconf-service
nagios    2010  1900  0 Jan09 ?        00:00:00 bluetooth-applet
nagios    2011  1900  0 Jan09 ?        00:00:00 /usr/lib/policykit-1-gnome/polkit-gnome-authentication-agent-1
nagios    2012  1900  0 Jan09 ?        00:00:10 nautilus -n
nagios    2017  1900  0 Jan09 ?        00:00:00 /usr/lib/gnome-settings-daemon/gnome-fallback-mount-helper
nagios    2019  1900  0 Jan09 ?        00:00:00 nm-applet
nagios    2025     1  0 Jan09 ?        00:00:00 /usr/lib/gvfs/gvfs-gdu-volume-monitor
nagios    2038     1  0 Jan09 ?        00:00:00 /usr/lib/gnome-applets/trashapplet
nagios    2042     1  0 Jan09 ?        00:00:09 /usr/lib/gvfs/gvfs-afc-volume-monitor
nagios    2045     1  0 Jan09 ?        00:00:00 /usr/lib/gvfs/gvfs-gphoto2-volume-monitor
nagios    2053     1  0 Jan09 ?        00:00:26 /usr/lib/indicator-applet/indicator-applet-complete
nagios    2058     1  0 Jan09 ?        00:00:00 /usr/lib/gvfs/gvfsd-trash --spawner :1.7 /org/gtk/gvfs/exec_spaw/0
nagios    2085     1  0 Jan09 ?        00:00:00 /usr/lib/indicator-session/indicator-session-service
nagios    2087     1  0 Jan09 ?        00:00:00 /usr/lib/indicator-application/indicator-application-service
nagios    2092     1  0 Jan09 ?        00:00:00 /usr/lib/indicator-printers/indicator-printers-service
nagios    2095     1  0 Jan09 ?        00:00:00 /usr/lib/indicator-sound/indicator-sound-service
nagios    2097     1  0 Jan09 ?        00:00:00 /usr/lib/indicator-datetime/indicator-datetime-service
nagios    2100     1  0 Jan09 ?        00:00:00 /usr/lib/indicator-messages/indicator-messages-service
nagios    2119     1  0 Jan09 ?        00:00:00 /usr/lib/geoclue/geoclue-master
nagios    2124     1  0 Jan09 ?        00:00:00 /usr/lib/ubuntu-geoip/ubuntu-geoip-provider
nagios    2139     1  0 Jan09 ?        00:00:00 /usr/lib/gvfs/gvfsd-burn --spawner :1.7 /org/gtk/gvfs/exec_spaw/1
nagios    2143     1  0 Jan09 ?        00:00:00 /usr/lib/gvfs/gvfsd-metadata
nagios    2173  1900  0 Jan09 ?        00:00:00 /usr/lib/gnome-disk-utility/gdu-notification-daemon
nagios    2239  1900  0 Jan09 ?        00:00:00 telepathy-indicator
nagios    2245     1  0 Jan09 ?        00:00:00 /usr/lib/telepathy/mission-control-5
nagios    2250     1  0 Jan09 ?        00:00:00 /usr/lib/gnome-online-accounts/goa-daemon
nagios    2265  1900  0 Jan09 ?        00:00:15 gnome-screensaver
nagios    2266  1900  0 Jan09 ?        00:00:05 zeitgeist-datahub
nagios    2274     1  0 Jan09 ?        00:00:00 /usr/bin/zeitgeist-daemon
nagios    2280     1  0 Jan09 ?        00:00:00 /usr/lib/zeitgeist/zeitgeist-fts
nagios    2288  2280  0 Jan09 ?        00:00:00 /bin/cat
nagios    2294  2293  0 Jan09 ?        00:00:00 /bin/sh -c /usr/local/nagios/libexec/check_SMS_daemon.sh >/dev/null 2>&1 >/dev/null 2>&1 # JOB_ID_2
nagios    2295  2294  0 Jan09 ?        00:00:00 /bin/sh /usr/local/nagios/libexec/check_SMS_daemon.sh
nagios    2297  2295  1 Jan09 ?        02:13:35 /usr/bin/gammu-smsd
nagios    2460  1900  0 Jan09 ?        00:00:05 update-notifier
nagios    2702     1  0 Jan09 ?        00:00:00 /usr/lib/at-spi2-core/at-spi-bus-launcher
nagios    2762  1900  0 Jan09 ?        00:00:00 /usr/lib/deja-dup/deja-dup/deja-dup-monitor
nagios    4005     1  0 Jan09 ?        00:00:02 gnome-terminal
nagios    4010  4005  0 Jan09 ?        00:00:00 gnome-pty-helper
nagios   14027  4005  0 Jan15 pts/3    00:00:00 bash
nagios   18084     1  0 09:24 ?        00:00:00 /usr/bin/pulseaudio --start --log-target=syslog
nagios   18085 18084  0 09:24 ?        00:00:00 /usr/lib/pulseaudio/pulse/gconf-helper
nagios   18659 14027  0 09:26 pts/3    00:00:00 ps -ef
nagios   18660 14027  0 09:26 pts/3    00:00:00 grep --color=auto nagios
nagios   24025     1  0 07:57 ?        00:00:04 /usr/bin/python /usr/bin/update-manager --no-focus-on-map
nagios   29231     1  0 Jan15 ?        00:00:00 /usr/lib/notify-osd/notify-osd
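
Because every process on this box runs as the nagios user, grepping for "nagios" matches the whole desktop session, and the daemon is easy to lose in the noise. A minimal Python sketch for picking out only the Nagios daemon from `ps -ef` output follows. It assumes one process per line (unwrapped output) and the usual eight `ps -ef` columns; the function name `nagios_daemon_pids` and the three-line sample are illustrative only.

```python
import os


def nagios_daemon_pids(ps_output, binary="nagios"):
    """Return the PIDs whose executable basename equals `binary`.

    Assumes `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # CMD (8th field) keeps its spaces
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line, blank line, or unexpected format
        executable = fields[7].split()[0]
        if os.path.basename(executable) == binary:
            pids.append(int(fields[1]))
    return pids


# Three lines taken from the listing above.
SAMPLE = """\
nagios     984     1  0 Jan09 ?        00:12:54 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    2297  2295  1 Jan09 ?        02:13:35 /usr/bin/gammu-smsd
nagios   18660 14027  0 09:26 pts/3    00:00:00 grep --color=auto nagios
"""

daemon_pids = nagios_daemon_pids(SAMPLE)
print(daemon_pids)
```

More than one PID in the result would point to multiple daemons running at once.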


What will deleting those files you mentioned do?

Thanks in advance



-----Original Message-----
From: Justin T Pryzby [mailto:justinp at norchemlab.com] 
Sent: 15 January 2013 18:02
To: nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] Service check goes HARD too quick if multiple
service are in problem state

You could check that the check intervals show up right in objects.cache.
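
A rough Python sketch of that check follows. It assumes objects.cache uses `define service {` blocks with one whitespace-separated `directive value` pair per line; the function name `parse_service_intervals` and the sample block are illustrative, with values taken from the settings described in the quoted message below.

```python
import re

WANTED = {
    "host_name",
    "service_description",
    "check_interval",
    "retry_interval",
    "max_check_attempts",
}

# Matches "define service {" but not "define servicegroup {" etc.
SERVICE_START = re.compile(r"define\s+service\s*\{")


def parse_service_intervals(text):
    """Return one dict per service block, holding only the WANTED directives."""
    services = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if SERVICE_START.match(line):
            current = {}
        elif line == "}" and current is not None:
            services.append(current)
            current = None
        elif current is not None and line:
            parts = line.split(None, 1)
            if len(parts) == 2 and parts[0] in WANTED:
                current[parts[0]] = parts[1].strip()
    return services


# Illustrative block only, not copied from a real objects.cache.
SAMPLE = (
    "define service {\n"
    "\thost_name\tSERVER\n"
    "\tservice_description\tMEMORY USAGE\n"
    "\tcheck_interval\t5.000000\n"
    "\tretry_interval\t3.000000\n"
    "\tmax_check_attempts\t3\n"
    "\t}\n"
)

for service in parse_service_intervals(SAMPLE):
    print(service)
```

To run it against a real file, pass the file's contents, for example `parse_service_intervals(open(path).read())`, where `path` is whatever `object_cache_file` is set to in nagios.cfg.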

You could also try stopping nagios (check with ps that you don't have
multiple daemons running), removing the generated files and restarting (note
that this will cause notifications to be sent from scratch; you may want to
disable them first).

/var/cache/nagios3/
objects.cache  status.dat

/var/lib/nagios3/
retention.dat
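
A cautious variant is to move the generated files aside rather than delete them, so they can be put back if needed. The Python sketch below only illustrates that idea: `move_generated_files` is a hypothetical helper, the demo runs against a throwaway temporary directory, and Nagios should be stopped before doing this for real. The paths listed above are for the packaged nagios3 layout; an install under /usr/local/nagios may keep these files elsewhere (see `object_cache_file`, `status_file` and `state_retention_file` in nagios.cfg).

```python
import shutil
import tempfile
from pathlib import Path


def move_generated_files(paths, backup_dir):
    """Move each existing file in `paths` into `backup_dir`.

    Returns the names of the files that were moved; missing files are skipped.
    """
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in map(Path, paths):
        if path.is_file():
            shutil.move(str(path), str(backup / path.name))
            moved.append(path.name)
    return moved


# Demo on a temporary directory so that no real Nagios file is touched.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache"
    cache.mkdir()
    (cache / "objects.cache").write_text("dummy")
    (cache / "status.dat").write_text("dummy")
    candidates = [
        cache / "objects.cache",
        cache / "status.dat",
        cache / "retention.dat",  # deliberately missing in this demo
    ]
    moved = move_generated_files(candidates, Path(tmp) / "backup")
    remaining = sorted(f.name for f in cache.iterdir())

print(moved, remaining)
```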

On Tue, Jan 15, 2013 at 05:51:35PM +0000, Andrew Thompson wrote:
> Hi,
> 
> I have had this problem previously and posted here but got nowhere with it.
> 
> I'll have another bash...
> 
> Basically my Nagios machine is checking too frequently and firing out 
> alerts too quickly.
> 
> It's ignoring the retry_interval value, the max_check_attempts value and
> the notification_interval value in the escalations.
> 
> I have:
> - a check interval of 5 minutes in the OK state
> - a retry interval of 3 minutes when in a problem state
> - a notification interval of 3 minutes
> 
> I believe the problem is shown below, and that multiple service checks
> being in a problem state at the same time is causing this.
> 
> 
> I've just seen this on one of my hosts:
> 
> It appears it's accumulating the service checks (even though they are
> different checks) into a final HARD state.
> 
> Prior to 17:18 all was fine on this host!
> 
> 
> Then at 17:18 a SQL check went to warning state and to SOFT 1.
> 
> It checked again at 17:21, which is the 3-minute interval I have told it 
> to use when in a problem state, and it was still warning, so on to SOFT 2.
> 
> Then a different service check on that host went critical, but for 
> the first time.
> 
> At 17:22 the memory usage check failed and Nagios put it straight to
> HARD 3, even though this check for memory should have been SOFT 1.
> 
> An alert then got sent straight out for the memory check even though 
> it was actually only check 1/3 on that particular service.
> 
> Here is the copy and paste from the history of the host:
> 
> [01-15-2013 17:18:24]
> SERVICE ALERT: SERVER;SQL LOCK TIMEOUTS;WARNING;SOFT;1;WARNING - 
> 2.3067 lock timeouts / sec for _Total, 2.0667 lock timeouts / sec for 
> Key, 0.0000 lock timeouts / sec for RID, 0.2400 lock timeouts / sec 
> for Page, 0.0000 lock timeouts / sec for Object, 0.0000 lock timeouts 
> / sec for Metadata, 0.0000 lock timeouts / sec for HoBT, 0.0000 lock 
> timeouts / sec for File, 0.0000 lock timeouts / sec for Extent, 0.0000 
> lock timeouts / sec for Database, 0.0000 lock timeouts / sec for 
> Application, 0.0000 lock timeouts / sec for AllocUnit
> [01-15-2013 17:21:24]
> SERVICE ALERT: SERVER;SQL LOCK TIMEOUTS;WARNING;SOFT;2;WARNING - 
> 1.3056 lock timeouts / sec for _Total, 1.1833 lock timeouts / sec for 
> Key, 0.0000 lock timeouts / sec for RID, 0.1222 lock timeouts / sec 
> for Page, 0.0000 lock timeouts / sec for Object, 0.0000 lock timeouts 
> / sec for Metadata, 0.0000 lock timeouts / sec for HoBT, 0.0000 lock 
> timeouts / sec for File, 0.0000 lock timeouts / sec for Extent, 0.0000 
> lock timeouts / sec for Database, 0.0000 lock timeouts / sec for 
> Application, 0.0000 lock timeouts / sec for AllocUnit
> 
> [01-15-2013 17:22:04]
> SERVICE ALERT: SERVER;MEMORY USAGE;CRITICAL;HARD;3;CRITICAL: physical 
> memory: Total: 10G - Used: 9.81G (98%) - Free: 192M (2%) > critical
> 
> 
> 
> Does anybody have any idea why my server is checking too frequently and
> alerting too frequently, and why it's totting up different service checks?
> 
> This machine has not worked right since it was loaded a couple of months
> ago.
> I'm using the same config files on it as I did on the previous box I had.
> The only difference is that the previous box was running 3.3.1, and I had
> none of these problems on that install.
> 
> 
> This is a Nagios 3.4.1 install on an Ubuntu 12.04 desktop 32-bit OS.
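
The history lines quoted above follow the pattern `SERVICE ALERT: host;service;state;state type;attempt;output`. As a rough Python sketch, they can be screened for services that reach a HARD problem state with no SOFT alert logged first. The names `parse_alerts` and `hard_without_soft` are illustrative, the plugin output of the two SQL entries is shortened, and the check is only a heuristic: a service that moves from one HARD problem state to another would also be flagged.

```python
import re
from collections import defaultdict

ALERT_RE = re.compile(
    r"SERVICE ALERT: (?P<host>[^;]+);(?P<service>[^;]+);(?P<state>[^;]+);"
    r"(?P<state_type>SOFT|HARD);(?P<attempt>\d+);(?P<output>.*)"
)


def parse_alerts(lines):
    """Turn SERVICE ALERT lines into dicts; other lines are ignored."""
    alerts = []
    for line in lines:
        match = ALERT_RE.search(line)
        if match:
            alert = match.groupdict()
            alert["attempt"] = int(alert["attempt"])
            alerts.append(alert)
    return alerts


def hard_without_soft(alerts):
    """Return HARD problem alerts with no SOFT alert since the last OK."""
    soft_count = defaultdict(int)
    flagged = []
    for alert in alerts:
        key = (alert["host"], alert["service"])
        if alert["state"] == "OK":
            soft_count[key] = 0  # recovery resets the count
        elif alert["state_type"] == "SOFT":
            soft_count[key] += 1
        elif soft_count[key] == 0:
            flagged.append(alert)
    return flagged


# Entries from the host history above (SQL plugin output shortened).
HISTORY = [
    "[01-15-2013 17:18:24] SERVICE ALERT: SERVER;SQL LOCK TIMEOUTS;WARNING;SOFT;1;"
    "WARNING - 2.3067 lock timeouts / sec for _Total",
    "[01-15-2013 17:21:24] SERVICE ALERT: SERVER;SQL LOCK TIMEOUTS;WARNING;SOFT;2;"
    "WARNING - 1.3056 lock timeouts / sec for _Total",
    "[01-15-2013 17:22:04] SERVICE ALERT: SERVER;MEMORY USAGE;CRITICAL;HARD;3;"
    "CRITICAL: physical memory: Total: 10G - Used: 9.81G (98%) - Free: 192M (2%) > critical",
]

flagged = hard_without_soft(parse_alerts(HISTORY))
for alert in flagged:
    print(alert["host"], alert["service"], alert["state_type"], alert["attempt"])
```

Each service is tracked under its own host and service key, so the two SOFT entries for the SQL check do not count towards the memory check.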

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

