[naemon-users] unsubscribe
Paul M Dubuc
work at paul.dubuc.org
Tue Sep 6 00:15:29 CEST 2022
On 9/5/22 10:53 AM, Steve Traylen wrote:
> Hi,
>
> I have been running a naemon/thruk instance for a few years now.
>
> For the last 2 or 3 days my instance has been sending notifications
> for services and hosts that are in scheduled downtime.
>
> I am not aware of having changed anything.
>
> The logs are confusing, however, since they look correct and do not
> show the problem.
>
> # A downtime was entered from 1 second ago to 1 year later.
> [1662374724] EXTERNAL COMMAND: SCHEDULE_SVC_DOWNTIME;my.example.org;lbd;1662374723;1693910723;1;0;0;User lxman;roger appstatus destroy : Machine had been created more than 5184000 seconds ago
> [1662374724] SERVICE DOWNTIME ALERT: my.example.org;lbd;STARTED; Service has entered a period of scheduled downtime
>
> # As expected, the log reports that notifications for the service are
> now suppressed
> [1662374724] SERVICE NOTIFICATION SUPPRESSED: my.example.org;lbd;Notifications about SCHEDULED DOWNTIME events blocked for this object.
>
> # Service goes bad.
> [1662375882] SERVICE ALERT: my.example.org;lbd;CRITICAL;SOFT;1;SNMP CRITICAL - *-1*
> [1662376182] SERVICE ALERT: my.example.org;lbd;CRITICAL;SOFT;2;SNMP CRITICAL - *-1*
> [1662376482] SERVICE ALERT: my.example.org;lbd;CRITICAL;HARD;3;SNMP CRITICAL - *-1*
>
> # The service is now in a HARD state and the notification is logged as
> suppressed, as you would expect.
> [1662376482] SERVICE NOTIFICATION SUPPRESSED: my.example.org;lbd;Notification blocked for object currently in a scheduled downtime.
>
> 1662376482 = Monday, 5 September 2022 13:14:42
>
>
> However, a notification was still sent out at this time.
>
> ```
> :fire: __PROBLEM__ my.example.org/lbd is
> CRITICAL for 0d 0h 10m 1s. , lnodes/login, production, S513-C-VM960.
>
> SNMP CRITICAL - *-1* [Nag :link:](https://cernnag.example.ch/thruk/cgi-bin/extinfo.cgi?type=2&host=my.example.org&service=lbd),
> [Monit :link:](https://monit-grafana.example.ch/d/RwtmMDXmz/single-host-metrics?orgId=1&var-hostname=my.example.org),
> [Fore :link:](https://judy.example.ch/hosts/my.example.org),
> [SSH :link:](ssh://root@my.example.org)
> ```
>
> The service appears as in downtime in the thruk interface. There is no
> naemon.log entry for the notification that went out.
>
> The only recent action was that, during some network instability of the
> naemon server itself, I hit 'Disable all notifications' and then
> 'Enable all notifications'.
> That said, I have tried to remove all history in the service by stopping
> naemon and cleaning up all these files:
> * /var/lib/naemon/ objects.cache, retention.cache, status.dat.
> * /var/log/naemon naemon.log archives/*
>
> Any idea why notifications might still be sent?
>
> Versions on CentOS 7:
> rpm -q naemon thruk naemon-livestatus
> naemon-1.3.1-0.noarch
> thruk-2.48.3-11458.1.x86_64
> naemon-livestatus-1.3.1-0.x86_64
>
> Many Thanks
>
> Steve.
>
> --
> Steve Traylen
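
The timestamps quoted above can be sanity-checked with a short script. This is only a sketch: it assumes the usual SCHEDULE_SVC_DOWNTIME field order (host, service, start, end, fixed, trigger id, duration, author, comment) and a fixed UTC+2 offset for CEST. The helper names parse_svc_downtime and in_fixed_downtime are made up for illustration and are not part of naemon.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the log times were read in CEST (UTC+2). A fixed offset
# avoids depending on a system time zone database.
CEST = timezone(timedelta(hours=2), "CEST")

# Assumed field order of the external command, after the command name.
FIELDS = ("host", "service", "start", "end", "fixed",
          "trigger_id", "duration", "author", "comment")


def parse_svc_downtime(command: str) -> dict:
    """Split a SCHEDULE_SVC_DOWNTIME command string into named fields."""
    name, *args = command.split(";", len(FIELDS))
    if name != "SCHEDULE_SVC_DOWNTIME" or len(args) != len(FIELDS):
        raise ValueError("not a complete SCHEDULE_SVC_DOWNTIME command")
    downtime = dict(zip(FIELDS, args))
    # Convert the numeric fields from strings to integers.
    for key in ("start", "end", "fixed", "trigger_id", "duration"):
        downtime[key] = int(downtime[key])
    return downtime


def in_fixed_downtime(timestamp: int, downtime: dict) -> bool:
    """True if the timestamp lies inside a fixed downtime window."""
    return bool(downtime["fixed"]) and downtime["start"] <= timestamp <= downtime["end"]


# Command and alert timestamps copied from the log excerpt above.
command = ("SCHEDULE_SVC_DOWNTIME;my.example.org;lbd;1662374723;1693910723;"
           "1;0;0;User lxman;roger appstatus destroy : Machine had been "
           "created more than 5184000 seconds ago")
alerts = [1662375882, 1662376182, 1662376482]

downtime = parse_svc_downtime(command)
for ts in alerts:
    local = datetime.fromtimestamp(ts, CEST).isoformat()
    print(ts, local, "in downtime:", in_fixed_downtime(ts, downtime))
```

All three alerts fall inside the downtime window, which agrees with the SUPPRESSED entries in naemon.log, so the log excerpt itself does not explain where the notification came from.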