Multiple Nagios processes running.
Andreas Ericsson
ae at op5.se
Thu Aug 11 15:24:46 CEST 2005
Chris Wilson wrote:
> Hi Andreas,
>
>
>>Rather than exiting if it finds a process running with the same pid, it
>>should try and kill it (using SIGTERM, sleeping 5 seconds and then
>>issuing SIGKILL). This is because we can't be sure WHAT process it
>>found, just that it has the same pid as the one that used to be nagios,
>>and on a restart attempt where the previous daemon failed to exit the
>>logical thing to do is to re-read the configuration.
>
>
> You're right that we can't identify what the other process is, but
> killing it sounds much worse than just aborting! What if the user is
> running several daemons as the same UID (e.g. nobody, daemon) and
> another one gets the PID that Nagios was using before?
>
True. For a proper fix, the lockfile would be locked against writing by
the old process. If there is no such process *AND* the file isn't
locked, it's fairly safe to assume that no other nagios daemon is
running. If the lock is held but the pid is wrong, some process is
running and holding the lock but has failed to update the pid in the
file (a bug in its own right). If a process with that pid exists but no
lock is held, it's safe to assume that the running process is not
another nagios daemon. However, that leaves us with the old checking
system pretty much in place, and your patch becoming something of an
extra clarification: filelock held = nagios running; no filelock =
nagios possibly not running, or running with some weird permissions, or
some such.
However, in this scenario the filelock should always be attempted as
root (or at least as the most privileged user nagios starts as), because
root can sometimes (always, but sometimes silently) override filelocks
held by processes with lesser privileges.
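
To make the cases above concrete, here is a rough sketch in Python (not
Nagios code). It assumes the daemon keeps an fcntl-style write lock on
its lockfile for as long as it runs, which is the proposal above rather
than confirmed current behaviour. The function names and the wording of
the verdicts are made up for illustration.

```python
import errno
import fcntl
import os


def pid_is_alive(pid):
    """Return True if some process with this pid exists (signal 0 only probes)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def lock_is_held(path):
    """Return True if another process holds a conflicting lock on the file."""
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return True
            raise
        # We got the lock, so nobody else had it; give it back at once.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)


def classify(pid_alive, lock_held):
    """Map the two observations onto the four cases discussed above."""
    if lock_held and pid_alive:
        return "nagios running"
    if lock_held:
        return "nagios running, stale pid in lockfile"
    if pid_alive:
        return "pid belongs to an unrelated process"
    return "nagios not running"
```

Keeping classify() free of system calls means the decision table can be
checked on its own. Note that the probe only works if the daemon and the
checker use the same locking mechanism; an fcntl lock does not conflict
with a flock() lock on Linux.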
> Surely it's safer to abort so that the user finds out something is
> wrong, checks for and removes the old Nagios process, and then deletes
> the lockfile?
This assumes user intervention, which I assumed was what you were trying
to move away from.
> It's at least better than the current behaviour (on Linux
> at least) of silently carrying on :-)
>
Indeed, but that behaviour is flawed on its own merit.
> But if you insist that killing the other process is the right thing to
> do, I will implement it.
>
I don't. It's only the right thing if Nagios is running as a dedicated
pseudo-user, which it won't necessarily be. One could of course in such
cases submit a RELOAD command to the external pipe. I'm not sure how
many hoops one should jump through though, or even whether that's the
right one to jump through next.
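
For illustration only, submitting a command to the external pipe might
look roughly like the Python sketch below. It assumes the usual
"[timestamp] COMMAND" line format of the external command file and uses
RESTART_PROGRAM as the assumed equivalent of the "RELOAD" mentioned
above. The pipe path in the usage note and the function names are
hypothetical.

```python
import os
import time


def format_external_command(name, now=None):
    """Build one external command line: '[<unix time>] <COMMAND>' plus newline."""
    ts = int(time.time() if now is None else now)
    return "[%d] %s\n" % (ts, name)


def submit_external_command(path, name):
    """Write a single command line to the external command pipe."""
    line = format_external_command(name).encode("ascii")
    # O_NONBLOCK makes open() fail (ENXIO) instead of hanging forever
    # when the path is a FIFO and no daemon has it open for reading.
    fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    try:
        os.write(fd, line)
    finally:
        os.close(fd)
```

Usage would be something like
submit_external_command("/path/to/nagios.cmd", "RESTART_PROGRAM"), with
the path taken from command_file in the main config. A failed open is
itself a hint that no daemon is listening on the pipe.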
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Lead Developer
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users