Nagios to monitor job progress?
Frost, Mark {PBG}
mark.frost1 at pepsi.com
Fri Nov 13 15:00:19 CET 2009
We use Splunk for exactly this sort of log file monitoring and feed the results into Nagios, and we get very rapid alerting this way. Splunk runs one or more scheduled searches that look for certain patterns over a short trailing window (e.g. "in the last 2 minutes, did you see X?"). A match then triggers a script that returns passive check results to Nagios. It's worked quite well for us.
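For illustration only, here is a minimal Python sketch of what such a triggered script might look like. It assumes the script runs on the Nagios host and writes a PROCESS_SERVICE_CHECK_RESULT external command to the Nagios command file. The command file path, the host name "winbatch01" and the service name "Monthly batch job" are assumptions made for the example, not details of our setup.

```python
import time

# Nagios return codes for service states.
STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def format_passive_result(host, service, state, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time() if now is None else now)
    # A newline would end the command early, so flatten the plugin output.
    clean = output.replace("\n", " ")
    return "[{}] PROCESS_SERVICE_CHECK_RESULT;{};{};{};{}\n".format(
        timestamp, host, service, STATES[state], clean
    )


def submit(line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write the command to the Nagios command file (path is an assumption)."""
    with open(command_file, "w") as fh:
        fh.write(line)


if __name__ == "__main__":
    # Hypothetical host and service names; they must match the Nagios config.
    print(format_passive_result(
        "winbatch01", "Monthly batch job", "OK", "Phase 1 complete"
    ), end="")
```

The alerting script would call submit() with the formatted line. If the script runs on a different machine from Nagios, the result would have to be sent over the network instead (for example with send_nsca).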
Mark
________________________________________
From: Jim Avery [jim at jimavery.me.uk]
Sent: Friday, November 13, 2009 8:54 AM
To: Todd Mcneill
Cc: nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] Nagios to monitor job progress?
2009/11/9 Todd Mcneill <todd.mcneill at pmigroup.com>:
> Hi-
>
> I have a batch process that runs on a Windows box that I’d like to monitor
> through Nagios. Currently, the job writes all status output to a log file,
> and this log file is parsed by another monitoring agent (NetIQ) in order to
> alert when problems occur. This job only runs once a month, and is not
> scheduled on the Windows server, but triggered by another process running on
> an iSeries box, so we have no idea when the job will start. In order to
> gain awareness of job progress, NetIQ is also sending non-error alerts when
> the job starts and completes, as well as alerts when certain phases of the
> job complete, also based upon the job log output.
>
> NetIQ is completely unreliable, and we’re getting these alerts hours after
> the events occur. I currently have Nagios running in our enterprise
> monitoring Unix and database services, but nothing running for Windows. I
> can probably use Splunk to perform the log parsing and alerting, but that
> means adding another alert source into the mix, which can get confusing.
>
> Does anyone have any suggestions on how I can do this through Nagios? Is
> there a way to configure a Nagios service check to send out an alert for a
> non-error condition (i.e. no state change, just a status update change)?
You can install the send_nsca binary for Windows on your server, then
get your job to run the binary, sending the appropriate passive check
result to the Nagios server whenever a particular part of the job
completes. It can send OK check results for steps which finish okay
and WARNING or CRITICAL check results if any part fails.
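As a rough sketch of that step (in Python here, though a batch file could do the same), the job could build the tab-separated line that send_nsca reads on standard input and pipe it to the binary. The host name, service name, server name and file locations below are placeholders I have made up for the example.

```python
import subprocess


def nsca_line(host, service, return_code, output):
    """Build one send_nsca input line: host, service, code, output (tab-separated)."""
    return "\t".join([host, service, str(return_code), output]) + "\n"


def send_result(line, nagios_server, send_nsca="send_nsca", config="send_nsca.cfg"):
    """Pipe the line to send_nsca; the binary and config paths are assumptions."""
    subprocess.run(
        [send_nsca, "-H", nagios_server, "-c", config],
        input=line,
        text=True,
        check=True,  # raise if send_nsca exits with a non-zero status
    )


if __name__ == "__main__":
    # Hypothetical names; 0 = OK, 1 = WARNING, 2 = CRITICAL.
    print(nsca_line("winbatch01", "Monthly batch job", 0, "Phase 2 complete"), end="")
```

The job would call send_result() after each phase, with return code 0 when the phase finishes okay and 1 or 2 when it fails. The host and service names have to match a service defined on the Nagios server.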
You will of course need to configure the nsca daemon on your Nagios
server to receive the checks.
You can configure the passive service with freshness checking so that
if you don't receive any check at all for a given amount of time,
Nagios will alert you.
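To show which directives are involved, here is a small Python sketch that prints a passive service definition with freshness checking turned on. The template name "generic-service", the command name "job_is_stale", the host and service names and the threshold are all assumptions for the example. You would need to define the stale command yourself, typically as something that simply returns CRITICAL.

```python
def passive_service_definition(host, service, freshness_seconds):
    """Return a Nagios service definition for a passive-only check."""
    directives = [
        ("use", "generic-service"),          # assumed template name
        ("host_name", host),
        ("service_description", service),
        ("active_checks_enabled", "0"),      # results arrive passively only
        ("passive_checks_enabled", "1"),
        ("check_freshness", "1"),
        ("freshness_threshold", str(freshness_seconds)),
        ("check_command", "job_is_stale"),   # hypothetical; run when results go stale
    ]
    body = "\n".join("    {:<24}{}".format(key, value) for key, value in directives)
    return "define service {\n" + body + "\n}\n"


if __name__ == "__main__":
    # Hypothetical names; 3600 seconds is only an example threshold.
    print(passive_service_definition("winbatch01", "Monthly batch job", 3600), end="")
```

When no passive result has arrived within freshness_threshold seconds, Nagios runs the check_command for the service, which is how the alert gets raised. Pick a threshold that suits how often you expect results to arrive.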
hth,
Jim
------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now. http://p.sf.net/sfu/bobj-july
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null