[PATCH] reduce notification load; fix $NOTIFICATIONRECIPIENTS$ macro #98
Michael Friedrich
michael.friedrich at univie.ac.at
Tue Nov 22 01:02:18 CET 2011
On 21.11.2011 20:56, Andreas Ericsson wrote:
> On 11/01/2011 02:05 PM, Michael Friedrich wrote:
>> hi,
>>
>> recently we've been debugging on team icinga in the middle of
>> notifications and macros, and while investigating on a users problem,
>> we've digged a bit deeper into the notification viability checks,
>> resulting in deeper analysis of an Opsview patch to reduce the
>> notification load significantly by moving the viability checks from
>> the actual notification into the creation of the contacts notified,
>> passing only a list of 'qualified' contacts to the actual
>> notification logic. the only thing to remark over here is that the
>> checks against the valid notification_period now happen sooner, and
>> not actually when the notification is sent to each contact.
>>
>> while implementing that patch into current code (needs some macro
>> passing with current code), we did remember nagios bug #98 where the
>> $NOTIFICATIONRECEIPIENTS$ macro is demanded to be only populated with
>> the actual contacts to be notified, but not all of those assigned to
>> the host/service. while this is considered to be a real bug, further
>> investigation showed that thanks to the viability checks before
>> calling add_notification(), contacts won't be added to that macro as
>> the macro logic happens within that function too.
>>
>> so by applying the attached git patch, you will a. reduce
>> notification load and b. fix the $NOTIFICATIONRECEIPIENTS$ macro
>> holding all contacts, but not the viable contacts.
>>
>> since the code remains actually the same on icinga and nagios in this
>> stage, the tests can be found at the icinga dev tracker as usual.
>> https://dev.icinga.org/issues/1744
>> https://dev.icinga.org/issues/2023
>>
>
> I've started looking into this patch right now. It's good to get that
> issue (#98) fixed, but I fail to see any noticeable performance
> improvement. All contacts potentionally viable for being contacted are
> still looked at, but the difference with this patch is that it checks
> the viability before shipping it off to add_notification(), which does
> fix issue 98 but at the expense of quite a lot of code duplication.
normally, all contacts would have been added to the notification_list in
memory, even those not actually passing the viability checks. but at
this stage of the code, nobody is aware of that so the list gets
populated either way by calling add_notification().
/* add all individual contacts for this host */
^^^
having that notification_list created, this remains fully linked in
memory. let's say, you have a bunch of some 1k contacts for that
service, and actually the alarm would hit only those in the nonworkhours
or workhours timeperiod and only on critical, for the ops team e.g.
so by looping through the notification_list, you will encounter *all*
contacts for that host, only the duplicates have been removed.
/* notify each contact (duplicates have been removed) */
then you'll fire up the actual notification with calling
notify_contact_of_host - and actually in there, the current core checks
the viability for the contact to be notified.
you are right, if each contact gets notified 24x7 on all
notification_options, the algorithm stays the same. but if you happen to
have a lot of different contacts assigned to hosts and services, not
getting notified each time a notification is triggered, the overall
amount of looping through notification_list will be shorter and save
some cpu cycles, and probably on larger systems, a bit more than just
some as this means a reduction of the looping for each contact to be
checked to be notified on the actual end-of-the-line.
furthermore, where do you get the idea of code duplication from? the
only changes made by this patch is actually moving the viability checks
and therefore passing an additional function parameter which makes the
diff a bit more bloated than it should be.
>
> I'll see if I can improve on that a bit.
>
>> kudos to Opsview Team for their initial patch as well as Icinga
>> Development Team for the further analysis on the macro bug.
>>
>> feel free to apply, matches against latest HEAD.
>>
>
> Not anymore, but I get the gist of it so I'll do something with it that
> keeps the spirit of the patch, as it were.
>
--
DI (FH) Michael Friedrich
Vienna University Computer Center
Universitaetsstrasse 7 A-1010 Vienna, Austria
email: michael.friedrich at univie.ac.at
phone: +43 1 4277 14359
mobile: +43 664 60277 14359
fax: +43 1 4277 14338
web: http://www.univie.ac.at/zid
http://www.aco.net
Lead Icinga Core Developer
http://www.icinga.org
------------------------------------------------------------------------------
All the data continuously generated in your IT infrastructure
contains a definitive record of customers, application performance,
security threats, fraudulent activity, and more. Splunk takes this
data and makes sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-novd2d
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
More information about the Developers
mailing list