service checks randomly disappear

aloclarit at aol.com aloclarit at aol.com
Thu Aug 11 08:57:01 CEST 2005


Well, I'll be damned! You were right!
Question is, how the hell did that happen? I never used anything but
the provided init script to restart nagios.
I had to kill -9 a whole bunch of those procs. The thing is, I always
assumed it was totally normal for the nagios proc to spawn a few
child procs when it gets busy, but I guess that was really the problem. I
tried /etc/init.d/nagios stop and then ps -ef | grep nagios.
No proc shows up now, and even a restart, which I've been doing all along,
seems to kill all procs properly. So how did I end up with 5
nagios procs in the first place... hmm.
Well, at least I know what to look for now. Thanks, but I don't like it.
A killall -9 nagios should do the trick, though, or pkill -U
nagios... time to modify the init script.
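A minimal Python sketch of the by-hand check above (`ps -ef | grep nagios`), made a little stricter: it only counts command lines that look like a daemonized Nagios, i.e. a `nagios` binary started with `-d`. That matching rule is an assumption about how the init script launches Nagios, and the function names are hypothetical. Check children forked by the daemon can briefly show up with the same command line, so a count above 1 that persists across several runs is the real warning sign. Also note that `pkill -U nagios` matches every process whose real user is `nagios`, including running plugins, not only the daemon.

```python
import os
import subprocess
from typing import List


def list_processes() -> List[str]:
    """Return `ps -eo pid,args` output, one line per process (first line is a header)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


def find_nagios_daemons(ps_lines: List[str]) -> List[int]:
    """Return the PIDs whose command line looks like `<path>/nagios ... -d ...`.

    Assumption: the init script starts the daemon as `nagios -d <nagios.cfg>`.
    """
    pids = []
    for line in ps_lines:
        fields = line.split()
        # Skip blank lines and the "PID COMMAND" header.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        pid, argv = int(fields[0]), fields[1:]
        # Compare the program name only, so `grep nagios` or plugin paths
        # under /usr/local/nagios/ are not counted.
        if os.path.basename(argv[0]) == "nagios" and "-d" in argv[1:]:
            pids.append(pid)
    return pids
```

For example, `find_nagios_daemons(list_processes())` run right after `/etc/init.d/nagios stop` should return an empty list, and after a start it should return exactly one PID.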

BTW, I still don't get why only 2 services out of over 100 were
affected.


Alex

-----Original Message-----
From: Subhendu Ghosh <sghosh at sghosh.org>
To: aloclarit at aol.com
Cc: nagios-users at lists.sourceforge.net
Sent: Wed, 10 Aug 2005 22:07:31 -0400 (EDT)
Subject: Re: [Nagios-users] service checks randomly disappear


 You probably have multiple daemons updating the same file.

 Only run one nagios daemon at a time...
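A toy Python model of what "multiple daemons updating the same file" can look like from the web interface's side. It assumes each daemon periodically rewrites the whole status file with only the services from the config it loaded, and that a stray daemon was still running with the config from before the two database services were added; the thread does not confirm either detail, and the daemon names are hypothetical. Under those assumptions the CGI sees 15 or 17 services depending on which daemon wrote last, and only the services missing from one daemon's view come and go.

```python
from typing import Dict, List


def simulate_status_file(writers: Dict[str, List[str]], schedule: List[str]) -> List[int]:
    """Toy model of several daemons sharing one status file.

    `writers` maps a (hypothetical) daemon name to the services it knows about;
    `schedule` is the order in which the daemons happen to rewrite the file.
    Returns how many services a reader would see right after each write.
    """
    status_file: List[str] = []
    counts = []
    for daemon in schedule:
        # Each write replaces the whole file: the last writer wins.
        status_file = list(writers[daemon])
        counts.append(len(status_file))
    return counts


def flickering_services(writers: Dict[str, List[str]]) -> List[str]:
    """Services known to some daemons but not all: these come and go."""
    views = [set(services) for services in writers.values()]
    return sorted(set.union(*views) - set.intersection(*views))
```

With an `old` daemon knowing 15 services and a `new` one knowing those 15 plus the two database checks, a schedule of new, old, new gives counts of 17, 15, 17, and `flickering_services` returns just the two database services.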

 On Wed, 10 Aug 2005 aloclarit at aol.com wrote:

 > Did some more digging, and indeed it appears the status.dat file does
 > not get updated most of the time.
 > Whenever I get the error below, I check status.dat and there is no
 > entry for the service. If I keep refreshing till it shows up and then
 > check status.dat, the service appears in there. Makes sense, I guess,
 > because the web interface relies on status.dat to display the stuff, but
 > why is status.dat not properly updated?
 > BTW, just tried latest CVS: same prob.
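A small Python sketch for the status.dat check described above: parse the file into `key=value` blocks and ask whether a given host/service pair is present. The block layout (a header like `service {`, indented `key=value` lines, a closing `}`) is assumed from Nagios 2.x-style status files; later releases are believed to name the block `servicestatus`, so the sketch accepts both. The host name `db1` is hypothetical, and the file's location depends on the `status_file` setting in nagios.cfg.

```python
from typing import Dict, List, Optional


def parse_status_blocks(text: str) -> List[Dict[str, str]]:
    """Split status.dat-style text into blocks of key=value pairs.

    Assumed layout:
        service {
            host_name=db1
            service_description=CONCURRENT TRANSACTIONS
            }
    The block header (e.g. "service") is stored under the "__type__" key.
    """
    blocks: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        if current is None and line.endswith("{"):
            current = {"__type__": line[:-1].strip()}
        elif line == "}" and current is not None:
            blocks.append(current)
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key] = value  # split on the first "=" only
    return blocks


def has_service(text: str, host: str, description: str) -> bool:
    """True if the status text contains an entry for host/description."""
    for block in parse_status_blocks(text):
        if (
            block["__type__"] in ("service", "servicestatus")
            and block.get("host_name") == host
            and block.get("service_description") == description
        ):
            return True
    return False
```

Calling `has_service` on the file contents every few seconds would show an entry that keeps appearing and disappearing, which is the symptom reported here.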
 >
 > -----Original Message-----
 > From: aloclarit at aol.com
 > To: aloclarit at aol.com; nagios-users at lists.sourceforge.net
 > Sent: Wed, 10 Aug 2005 20:27:06 -0400
 > Subject: Re: [Nagios-users] service checks randomly disappear
 >
 > Forgot to mention that whenever the service DOES show up and I click
 > on it, I usually get this error:
 >
 > Error: Service Status Not Found!
 >
 > However, if I keep refreshing, the proper page will show up and then
 > change back to the error again. Looks like it keeps losing
 > the service status info for these 2 services.
 >
 > -----Original Message-----
 > From: aloclarit at aol.com
 > To: nagios-users at lists.sourceforge.net
 > Sent: Wed, 10 Aug 2005 20:11:57 -0400
 > Subject: [Nagios-users] service checks randomly disappear
 >
 > guys
 >
 > I have a really weird problem. I've been running nagios 2.0b3 for
 > months without hiccups (except that it never remembers
 > host acknowledgements/disabled host notifications after a
 > restart... and yes, I have that set in nagios.cfg!)
 >
 > But here is the big problem:
 > I added 2 services for 2 databases, and for some reason nagios has
 > trouble being consistent with those 2 services.
 > After I first added them, the web interface didn't show them. I had to
 > refresh the page a few times before they showed up, only for them to
 > disappear again a second later. It keeps doing that. One moment it has
 > 15 services, and after a page refresh it shows 17 services.
 > Sometimes it shows 16, then 17 or 15 again. This is really driving me
 > nuts.
 > I upgraded to 2.0b4 today, but no luck. The funny thing is that this
 > ONLY happens to those 2 services, but I cannot see how they
 > are different. I'm also checking these 2 services on another box with
 > no problem at all. Then again, all other services on those 2 boxes run
 > fine too.
 > I tried defining them separately per box as well as in a hostgroup,
 > with no difference.
 > Here are the 2 services:
 >
 > define service{
 > use generic-service
 > hostgroup_name sdatabases
 > service_description CONCURRENT TRANSACTIONS IN WAIT QUEUE
 > check_command check_oracle_queue!--service-waits!bspstg!nagios!nagios!20!30
 > max_check_attempts 5
 > normal_check_interval 5
 > contact_groups admins,managers
 > }
 >
 > define service{
 > use generic-service
 > hostgroup_name sdatabases
 > service_description CONCURRENT TRANSACTIONS
 > check_command check_oracle_requests!--requests!bspstg!nagios!nagios!1!700!1000
 > max_check_attempts 5
 > normal_check_interval 5
 > contact_groups admins,managers
 > }
 >
 > As I said, these 2 are also defined on a dev box, so instead of
 > hostgroup_name it says host_name and instead of 'bspstg' it says
 > 'bspdev', but none of that should matter. There the services are
 > totally consistent, whereas in the sdatabases group they keep
 > disappearing and reappearing.
 >
 > Anyone seen this ??
 >
 > -------------------------------------------------------
 > SF.Net email is Sponsored by the Better Software Conference & EXPO
 > September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
 > Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
 > Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
 > _______________________________________________
 > Nagios-users mailing list
 > Nagios-users at lists.sourceforge.net
 > https://lists.sourceforge.net/lists/listinfo/nagios-users
 > ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
 > ::: Messages without supporting info will risk being sent to /dev/null
