service checks randomly disappear
aloclarit at aol.com
Thu Aug 11 08:57:01 CEST 2005
Well, I'll be damned! You were right!
The question is how the hell that happened. I never used anything but
the provided init script to restart Nagios.
I had to kill -9 a whole bunch of those procs. The thing is, I always
assumed it was totally normal for the nagios process to spawn a few
child procs when it gets busy, but I guess that's really a problem. I
tried /etc/init.d/nagios stop and then ps -ef | grep nagios, and
no proc shows up now. Even a restart, which I've been doing all along,
seems to kill all procs properly, so how did I end up with 5
nagios procs in the first place... hmm.
Well, at least I know what to look for now. Thanks, but I don't like it.
A killall -9 nagios should do the trick, though, or pkill -U
nagios... time to modify the init script.
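Before changing the init script, it helps to confirm what is actually running. The sketch below is a hypothetical Python helper, not part of Nagios: it parses `ps -ef` output and lists the PIDs whose program name is exactly `nagios`, skipping the `grep nagios` line that a plain `ps -ef | grep nagios` also matches. It assumes the usual eight-column `ps -ef` layout, and it assumes you compare the result against the PID the daemon records in the file named by the `lock_file` setting in nagios.cfg. The sample `ps` output is made up. Nagios also forks short-lived children to run checks, so treat an extra PID as a hint and re-check before killing anything.

```python
from __future__ import annotations


def nagios_pids(ps_output: str, name: str = "nagios") -> list[int]:
    """Return PIDs from `ps -ef` text whose program name is exactly `name`.

    Assumes the usual columns: UID PID PPID C STIME TTY TIME CMD.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:  # first line is the header
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # malformed or truncated line
        program = fields[7].split()[0].rsplit("/", 1)[-1]
        if program == name:  # "grep nagios" has program "grep", so it is skipped
            pids.append(int(fields[1]))
    return pids


def stray_pids(pids: list[int], lock_pid: int) -> list[int]:
    """PIDs other than the one recorded in the (assumed) Nagios lock file."""
    return [pid for pid in pids if pid != lock_pid]


# Made-up sample output for illustration only.
SAMPLE = """\
UID        PID  PPID  C STIME TTY          TIME CMD
nagios    1201     1  0 Aug10 ?        00:00:12 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1377     1  0 Aug10 ?        00:00:11 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root      2290  2100  0 08:50 pts/0    00:00:00 grep nagios
"""

if __name__ == "__main__":
    found = nagios_pids(SAMPLE)
    print(found)                    # [1201, 1377]
    print(stray_pids(found, 1201))  # [1377]
```

Killing only the listed stray PIDs is narrower than `pkill -U nagios`, which matches every process owned by the nagios user.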
BTW, I still don't get why only 2 services out of over 100 were
affected.
Alex
-----Original Message-----
From: Subhendu Ghosh <sghosh at sghosh.org>
To: aloclarit at aol.com
Cc: nagios-users at lists.sourceforge.net
Sent: Wed, 10 Aug 2005 22:07:31 -0400 (EDT)
Subject: Re: [Nagios-users] service checks randomly disappear
You probably have multiple daemons updating the same file.
Only run one nagios daemon at a time...
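A toy model of why that produces flickering service counts: each daemon rewrites the whole status file from its own in-memory view, so whichever wrote last decides what the web interface sees. The Python below is a simplified simulation, not Nagios code, and its block layout only loosely imitates status.dat. It assumes (hypothetically) that the stray daemon was started before the two new database services were added, so its view has 15 services while the current daemon's view has 17; if that is what happened, it would also explain why only those 2 services flickered.

```python
import os
import tempfile

# Hypothetical service lists: the stale daemon loaded its config before the
# two database checks existed; the current daemon loaded the new config.
OLD_VIEW = [("dbhost", f"SERVICE {i}") for i in range(15)]
NEW_VIEW = OLD_VIEW + [
    ("dbhost", "CONCURRENT TRANSACTIONS"),
    ("dbhost", "CONCURRENT TRANSACTIONS IN WAIT QUEUE"),
]


def write_status(path: str, services) -> None:
    """Replace the whole file with one daemon's view of its services."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        for host, desc in services:
            f.write(f"service {{\n\thost_name={host}\n\tservice_description={desc}\n\t}}\n")
    os.replace(tmp, path)  # the last writer wins; earlier content is gone


def count_services(path: str) -> int:
    """Count service blocks, roughly what a page refresh would display."""
    with open(path) as f:
        return sum(1 for line in f if line.strip() == "service {")


def simulate(path: str, rounds: int = 4) -> list:
    """Two daemons take turns rewriting the same file; record each refresh."""
    seen = []
    for r in range(rounds):
        write_status(path, NEW_VIEW if r % 2 == 0 else OLD_VIEW)
        seen.append(count_services(path))
    return seen


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(simulate(os.path.join(d, "status.dat")))  # [17, 15, 17, 15]
```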
On Wed, 10 Aug 2005 aloclarit at aol.com wrote:
> Did some more digging, and indeed it appears the status.dat file does
> not get updated most of the time.
> Whenever I get the error below, I check status.dat and there is no
> entry for the service. If I keep refreshing until it shows up and then
> check status.dat, the service appears in there. Makes sense, I guess,
> because the web interface relies on status.dat to display the stuff, but
> why is status.dat not properly updated?
> BTW, I just tried the latest CVS: same problem.
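The manual check described above (looking for the service's entry in status.dat) can be scripted so it is easy to repeat between refreshes. The sketch below is a hypothetical Python helper, assuming status.dat consists of `name {` blocks holding `key=value` lines and closed by `}`. It accepts both `service` and `servicestatus` as block names on the assumption that the name differs between Nagios versions; check your own file. The sample text is made up.

```python
def parse_blocks(text: str) -> list:
    """Split status.dat-style text into dicts, one per `name { ... }` block."""
    blocks, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            if line.endswith("{"):
                current = {"_type": line[:-1].strip()}
        elif line == "}":
            blocks.append(current)
            current = None
        elif "=" in line:
            key, _, value = line.partition("=")  # values may contain '='
            current[key] = value
    return blocks


def has_service(text: str, host: str, description: str) -> bool:
    """True if a service block for this host/description is present."""
    return any(
        b["_type"] in ("service", "servicestatus")
        and b.get("host_name") == host
        and b.get("service_description") == description
        for b in parse_blocks(text)
    )


# Made-up sample for illustration only.
SAMPLE = """\
info {
\tcreated=1123743421
\t}

service {
\thost_name=dbhost
\tservice_description=CONCURRENT TRANSACTIONS
\tcurrent_state=0
\t}
"""

if __name__ == "__main__":
    print(has_service(SAMPLE, "dbhost", "CONCURRENT TRANSACTIONS"))                # True
    print(has_service(SAMPLE, "dbhost", "CONCURRENT TRANSACTIONS IN WAIT QUEUE"))  # False
```

On a live system you would pass in the file's contents instead; its location is set by the `status_file` option in nagios.cfg.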
>
> -----Original Message-----
> From: aloclarit at aol.com
> To: aloclarit at aol.com; nagios-users at lists.sourceforge.net
> Sent: Wed, 10 Aug 2005 20:27:06 -0400
> Subject: Re: [Nagios-users] service checks randomly disappear
>
> Forgot to mention that whenever the service DOES show up and I click
> on it, I usually get this error:
>
> Error: Service Status Not Found!
>
> However, if I keep refreshing, the proper page will show up and then
> change back to the error again. Looks like it keeps losing
> the service status info for these 2 services.
>
> -----Original Message-----
> From: aloclarit at aol.com
> To: nagios-users at lists.sourceforge.net
> Sent: Wed, 10 Aug 2005 20:11:57 -0400
> Subject: [Nagios-users] service checks randomly disappear
>
> Guys,
>
> I have a really weird problem. I've been running Nagios 2.0b3 for
> months without hiccups (except that it never remembers
> host acknowledgements/disabled host notifications after a
> restart... and yes, I have that set in nagios.cfg!).
>
> But here is the big problem:
> I added 2 services for 2 databases, and for some reason Nagios has
> trouble being consistent with those 2 services.
> After I first added them, the web interface didn't show them. I had to
> refresh the page a few times before they showed up, only to disappear
> again a second later. It keeps doing that: one moment it has 15
> services, and after a page refresh it shows 17 services.
> Sometimes it shows 16, then 17 or 15 again. This is really driving me
> nuts.
> I upgraded to 2.0b4 today, but no luck. The funny thing is that this
> ONLY happens to those 2 services, but I cannot see how they
> are different. I'm also checking these 2 services on another box:
> no problem at all. Then again, all other services on those 2 boxes run
> fine too.
> I tried defining them separately by box as well as in a hostgroup:
> no difference.
> Here are the 2 services:
>
> define service{
>         use                     generic-service
>         hostgroup_name          sdatabases
>         service_description     CONCURRENT TRANSACTIONS IN WAIT QUEUE
>         check_command           check_oracle_queue!--service-waits!bspstg!nagios!nagios!20!30
>         max_check_attempts      5
>         normal_check_interval   5
>         contact_groups          admins,managers
>         }
>
> define service{
>         use                     generic-service
>         hostgroup_name          sdatabases
>         service_description     CONCURRENT TRANSACTIONS
>         check_command           check_oracle_requests!--requests!bspstg!nagios!nagios!1!700!1000
>         max_check_attempts      5
>         normal_check_interval   5
>         contact_groups          admins,managers
>         }
>
> As I said, these 2 are also defined on a dev box, so instead of
> hostgroup_name it says host_name, and instead of 'bspstg' it says
> 'bspdev', but none of that should matter. There the services are
> totally consistent, whereas in the sdatabases group they keep
> disappearing and reappearing.
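On why the host-specific values should not matter: in a `check_command` line, Nagios treats the text before the first `!` as the command name and passes each following `!`-separated field to the command definition as `$ARG1$`, `$ARG2$`, and so on. The hypothetical Python helper below mimics only that splitting (it ignores any escaping rules) to show that the staging and dev definitions of the queue check differ in a single argument, `$ARG2$`. The dev command string is reconstructed from the description above, not copied from the actual config.

```python
def split_check_command(check_command: str):
    """Split 'name!a!b' into ('name', {'$ARG1$': 'a', '$ARG2$': 'b'})."""
    name, *args = check_command.split("!")
    return name, {f"$ARG{i}$": arg for i, arg in enumerate(args, start=1)}


def differing_args(a: str, b: str) -> dict:
    """Macros whose values differ between two check_command strings."""
    _, args_a = split_check_command(a)
    _, args_b = split_check_command(b)
    # Sort "$ARG10$" after "$ARG9$" by the numeric part of the macro name.
    keys = sorted(set(args_a) | set(args_b), key=lambda k: int(k[4:-1]))
    return {k: (args_a.get(k), args_b.get(k)) for k in keys if args_a.get(k) != args_b.get(k)}


if __name__ == "__main__":
    stg = "check_oracle_queue!--service-waits!bspstg!nagios!nagios!20!30"
    dev = "check_oracle_queue!--service-waits!bspdev!nagios!nagios!20!30"
    print(differing_args(stg, dev))  # {'$ARG2$': ('bspstg', 'bspdev')}
```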
>
> Anyone seen this??
>
> -------------------------------------------------------
> SF.Net email is Sponsored by the Better Software Conference & EXPO
> September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
> Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
> Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null