check_nagios isn't very smart.
jeff vier
jeff.vier at tradingtechnologies.com
Tue Sep 30 19:59:11 CEST 2003
That ignores the possibility of zombies, which is the same as my first
point in the original message.
If check_procs had an argument that matched on the pid (not the ppid),
it would be useful, because I could cat in the pid from nagios.lock. But
it doesn't.
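
For what it's worth, here is a rough sketch of the kind of pid-based check
I mean. It is not an existing plugin: check_pid_alive is a made-up helper
name, and it assumes nagios.lock holds nothing but the daemon's pid and that
the local ps accepts "-o stat= -p PID" (procps and the BSDs do). It reads the
pid, asks ps for that process's state, and treats a missing process or a
state starting with Z (zombie) as CRITICAL.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Nagios plugins): check_pid_alive LOCKFILE
# Prints a Nagios-style status line; returns 0 for OK, 2 for CRITICAL.
check_pid_alive() {
    lockfile=$1

    if [ ! -r "$lockfile" ]; then
        echo "CRITICAL: cannot read $lockfile"
        return 2
    fi

    # Assumption: the lock file contains only the pid, possibly with whitespace.
    pid=$(tr -d ' \n' < "$lockfile")
    case $pid in
        ''|*[!0-9]*)
            echo "CRITICAL: no valid pid in $lockfile"
            return 2
            ;;
    esac

    # "-o stat=" prints only the state column, with no header line.
    # Empty output means no process with that pid exists.
    state=$(ps -o stat= -p "$pid" 2>/dev/null | tr -d ' ')
    case $state in
        '')
            echo "CRITICAL: pid $pid is not running"
            return 2
            ;;
        Z*)
            echo "CRITICAL: pid $pid is a zombie"
            return 2
            ;;
        *)
            echo "OK: pid $pid is running (state $state)"
            return 0
            ;;
    esac
}
```

Called as "check_pid_alive /path/to/nagios.lock" (the path depends on the
install), it could sit behind nagios_check_command or a cron job. It only
proves the process exists and isn't a zombie, not that it's actually
scheduling checks.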
On Tue, 2003-09-30 at 12:41, Williams, P. Lane wrote:
> use one of the check_procs plugins.
>
> -----Original Message-----
> From: jeff vier [mailto:jeff.vier at tradingtechnologies.com]
> Sent: Tuesday, September 30, 2003 12:51 PM
> To: nagios-users
> Subject: [Nagios-users] check_nagios isn't very smart.
>
>
> Okay, I tuned our nagios system, here.
>
> With an increase in efficiency and "intelligence" there are far fewer
> false alerts.
>
> However, that in itself is causing another problem.
>
> Since check_nagios depends on the log being updated to figure out if
> nagios is running, it often thinks it's dead. We can easily go an hour
> without an update to the log file.
>
> I fixed this by setting log_service_retries=1, but that seems
> ridiculous: turning on what amounts to debugging just to trick another
> element of nagios.
>
> So, my question is, is there another way to watch nagios that doesn't
> cause me to have to pile tons of garbage into my filesystem?
>
> Some things I was considering, and the reasons I haven't [yet?]:
>
> option 1 - cron once per 1 min (and have a 2 min nagios_check max):
> if [ "`ps -ef | grep nagios | grep -v grep | wc -l`" -gt 2 ]; then
>   echo "[`date +%s`] Heartbeat" >> nagios.log; fi
>
> problem - What about zombied processes? I'm falsely assuming 1 or
> more nagios processes means it's okay.
>
> option 2 - change the nagios_check_command in cgi.cfg to use a script
> with a bunch more logic, but basically use
> 'lynx -head -dump -auth=user:pwd \
> "http://localhost/nagios/cgi-bin/extinfo.cgi?type=1&host=hostname"'
>
> problem - I'm depending on http, which I guess is okay, since if http
> is failing, I'd be updating the nagios.log anyway with that error and
> sending out alerts. Also, I'd have to re-invent the check, and so far I
> don't know whether that's feasible. I don't have much time to waste if
> this turns out to be a bad idea for reasons I didn't think of (hence my
> asking).
>
> Thoughts? If I do end up figuring out a new way to do it, I'll
> certainly post it.
>
>
>
> -------------------------------------------------------
> This sf.net email is sponsored by:ThinkGeek
> Welcome to geek heaven.
> http://thinkgeek.com/sf
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting
> any issue.
> ::: Messages without supporting info will risk being sent to /dev/null