Overloaded master
Mike Lindsey
mike-nagios at 5dninja.net
Tue Jan 26 02:02:26 CET 2010
A typical first-tier notification goes to 20 people. One of those is a
pager notification, which is very simple.
The rest are fairly complex.
Notifications include a link to existing and recent tickets in our
ticketing system (this also lets me skip the ticket-opening
notification if a ticket already exists). I populate the notification
with links to Cacti graphs and to wiki documentation for the event, and
I also fire off a secondary notification handler that adds more
information based on the host, service, and state.
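A minimal sketch of that enrichment step, in Python, might look like the following. The base URLs, the Event fields and the way open tickets are passed in are hypothetical placeholders, not the actual scripts described here or any ticketing system's API; the point is only that one ticket lookup both feeds the links and decides whether the ticket-opening notification goes out.

```python
from dataclasses import dataclass
from urllib.parse import quote

# Hypothetical base URLs: substitute your own ticketing system, Cacti and wiki.
TICKET_SEARCH_URL = "https://tickets.example.com/search?q="
GRAPH_URL = "https://cacti.example.com/graphs?host="
WIKI_URL = "https://wiki.example.com/Nagios/"


@dataclass
class Event:
    host: str
    service: str
    state: str              # e.g. "CRITICAL"
    notification_type: str  # e.g. "PROBLEM" or "RECOVERY"


def should_open_ticket(event, open_tickets):
    """Send the ticket-opening notification only when no ticket exists yet.

    open_tickets is the list of ticket IDs already found for this
    host/service; the lookup itself (hypothetical) would query the
    ticketing system.
    """
    return event.notification_type == "PROBLEM" and not open_tickets


def build_notification(event):
    """Build the message body with ticket, graph and documentation links."""
    key = quote(f"{event.host} {event.service}")
    return "\n".join([
        f"{event.notification_type}: {event.service} on {event.host} is {event.state}",
        f"Tickets: {TICKET_SEARCH_URL}{key}",
        f"Graphs:  {GRAPH_URL}{quote(event.host)}",
        f"Docs:    {WIKI_URL}{quote(event.service.replace(' ', '_'))}",
    ])
```

The secondary handler that adds host/service/state-specific detail would slot in after build_notification; it is left out here because nothing in this thread says what it looks up.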
The first notification of the cycle does all the heavy lifting and
takes about 6 seconds. The other 19 finish relatively quickly.
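One way to get that split, where the first contact's run pays for the expensive lookups and the rest reuse them, is a small on-disk cache keyed on host, service and state. This is an assumed design, sketched in Python: Nagios starts the notification command as a separate process for each contact, so an in-memory cache would not survive between runs, and the sketch further assumes those runs happen one after another rather than concurrently. The function name, key format and 60-second TTL are hypothetical.

```python
import hashlib
import json
import os
import tempfile
import time


def cached_enrichment(cache_dir, key, compute, ttl=60.0):
    """Return compute() for key, reusing a result written by an earlier run.

    key would be something like "host/service/state"; compute is the
    expensive ticket/graph/wiki lookup and must return JSON-serialisable data.
    """
    name = hashlib.sha1(key.encode()).hexdigest() + ".json"
    path = os.path.join(cache_dir, name)
    try:
        # Reuse the cached result if it is younger than ttl seconds.
        if time.time() - os.path.getmtime(path) < ttl:
            with open(path) as f:
                return json.load(f)
    except (OSError, ValueError):
        pass  # missing or unreadable cache entry: fall through and recompute
    result = compute()
    # Write to a temp file and rename it, so readers never see a partial file.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "w") as f:
        json.dump(result, f)
    os.replace(tmp, path)
    return result
```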
I've been thinking of building a notification server, so I could have
separate, discrete notification escalations for different service
states. It would also let me fire off a single notification carrying
just the contents of $ENV{NAGIOS_*}. Perhaps that's my best option?
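A rough sketch of the notification-server idea: the command Nagios runs only snapshots the NAGIOS_* environment macros (Nagios 3.x exports these when enable_environment_macros is on) and drops them into a spool directory, so it returns almost immediately, while a separate long-running process drains the spool and handles escalations and enrichment. The spool layout and the function names are hypothetical, and Python stands in for whatever the real handler would be written in.

```python
import json
import os
import tempfile
import time


def snapshot_nagios_env(environ=None):
    """Collect the NAGIOS_* macros exported to the notification command."""
    environ = os.environ if environ is None else environ
    return {k: v for k, v in environ.items() if k.startswith("NAGIOS_")}


def enqueue(spool_dir, macros):
    """Drop one event into the spool directory and return its final path.

    The file is written under a ".tmp-" name and renamed when complete, so
    the server never reads a half-written event. The nanosecond timestamp
    prefix makes a plain name sort give oldest-first order.
    """
    fd, tmp = tempfile.mkstemp(dir=spool_dir, prefix=".tmp-")
    with os.fdopen(fd, "w") as f:
        json.dump({"queued_at": time.time(), "macros": macros}, f)
    uniq = os.path.basename(tmp)[len(".tmp-"):]
    final = os.path.join(spool_dir, f"{time.time_ns():020d}-{uniq}.json")
    os.replace(tmp, final)
    return final


def drain(spool_dir):
    """Server side: yield queued events oldest-first, deleting each one."""
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(".json"):
            continue  # skip in-progress ".tmp-" files
        path = os.path.join(spool_dir, name)
        with open(path) as f:
            event = json.load(f)
        os.remove(path)
        yield event
```

With this split the blocking part inside Nagios is a single small file write, and the per-state escalation logic lives entirely in the server loop that consumes drain().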
Martin Melin wrote:
> What kind of notifications are you doing and how many are you sending
> out? Why does a notification cycle take 9 seconds to complete?
>
> On Sat, Jan 23, 2010 at 12:13 AM, Mike Lindsey <mike-nagios at 5dninja.net
> <mailto:mike-nagios at 5dninja.net>> wrote:
>
> What kind of options does one have, if your master nagios server is
> getting overloaded?
>
> I have half a dozen slaves doing polling, submitting passive check
> results back via send_nsca. The master does no active polling, just
> event processing, notifications, and web ui.
>
> Under normal circumstances, it works alright. But after a restart it
> can take up to half an hour before the master catches up; and if there
> are a lot of events, the act of sending out notifications can cause it
> to fall behind.
>
> I'm pre-caching my object file, I'm skipping circular dependency checks,
> and I've gotten a notification cycle down to 9 seconds. I tried
> modifying nagios to fork before notifications, but that failed pretty
> spectacularly; so that 9 seconds is a time where 900 or so passive check
> submissions block until the notifications are done.
>
> Are there any options for running a dual-master setup, or other ways to
> spread the load across multiple machines?
>
> Has anyone patched nsca to submit check results into the checkresults
> directory, instead of via the nagios.cmd pipe? What kind of improvement
> can one expect from that?
>
> Any other advice?
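On the checkresults question: here is a sketch of what writing a passive result straight into the check result directory could look like, instead of going through nagios.cmd. It assumes the Nagios 3.x reaper conventions as I understand them (a seven-character file name starting with "c", key=value lines, check_type=1 for passive, newlines in the output escaped as "\n", and an empty companion .ok file created once the result is complete); verify these against the source of the Nagios version you run before relying on it. The function name is hypothetical, and this says nothing about how much improvement to expect.

```python
import os
import secrets
import string
import time

_ALNUM = string.ascii_letters + string.digits


def write_check_result(checkresults_dir, host, service, return_code, output, now=None):
    """Write one passive service check result file plus its .ok marker.

    Assumes Nagios 3.x conventions (see the lead-in); returns the file path.
    """
    now = time.time() if now is None else now
    body = (
        "### Passive Check Result File ###\n"
        f"file_time={int(now)}\n\n"
        "### Nagios Service Check Result ###\n"
        f"host_name={host}\n"
        f"service_description={service}\n"
        "check_type=1\n"  # 1 = passive, 0 = active
        "check_options=0\n"
        "scheduled_check=0\n"
        "reschedule_check=0\n"
        "latency=0.0\n"
        f"start_time={now:.6f}\n"
        f"finish_time={now:.6f}\n"
        "early_timeout=0\n"
        "exited_ok=1\n"
        f"return_code={return_code}\n"
        # Multi-line plugin output is stored with literal "\n" escapes.
        f"output={output.replace(chr(10), chr(92) + 'n')}\n"
    )
    # Claim a unique "cXXXXXX" name; O_EXCL fails if it already exists.
    while True:
        name = "c" + "".join(secrets.choice(_ALNUM) for _ in range(6))
        path = os.path.join(checkresults_dir, name)
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
            break
        except FileExistsError:
            continue
    with os.fdopen(fd, "w") as f:
        f.write(body)
    # The reaper ignores the result until the .ok marker exists.
    open(path + ".ok", "w").close()
    return path
```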
--
Mike Lindsey
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users