retention issue

Tedman Eng teng at dataway.com
Fri Nov 18 21:51:39 CET 2005


I don't think "current check attempt #" is retained.

Since soft states are not considered 'real' errors yet, when a Nagios
restart occurs it must count up from the beginning again.

This is my understanding, though only from personal experience, not from
docs I've read somewhere.  I agree it would be useful to retain soft states
as well.

(This may have changed in 2.0; I haven't migrated yet, so I don't know.)
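
The soft/hard behaviour described above can be sketched roughly as follows.
This is a simplified model written for illustration, not Nagios source code:
the class ServiceState and its methods record_check and restored_after_restart
are hypothetical names, and the restart rule only mirrors what was observed in
this thread on Nagios 1.2 (a soft problem state comes back as OK, attempt 1).

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    """Toy model of a service's soft/hard state (hypothetical, not Nagios code)."""
    max_check_attempts: int
    status: str = "OK"
    attempt: int = 1
    state_type: str = "HARD"

    def record_check(self, result: str) -> None:
        """Apply one check result and update the attempt counter and state type."""
        if result == "OK":
            # Simplification: any OK result resets the service completely.
            self.status, self.attempt, self.state_type = "OK", 1, "HARD"
            return
        if self.status == "OK":
            self.attempt = 1          # first failing check
        elif self.state_type == "SOFT":
            self.attempt += 1         # still retrying
        self.status = result
        reached_max = self.attempt >= self.max_check_attempts
        self.state_type = "HARD" if reached_max else "SOFT"

    def restored_after_restart(self) -> "ServiceState":
        """Assumed retention rule: only hard states survive a restart."""
        if self.state_type == "HARD":
            return ServiceState(self.max_check_attempts, self.status,
                                self.attempt, "HARD")
        # A soft problem state is dropped and counting starts over.
        return ServiceState(self.max_check_attempts)


svc = ServiceState(max_check_attempts=4)
svc.record_check("CRITICAL")
print(svc.status, svc.attempt, svc.state_type)   # CRITICAL 1 SOFT
back = svc.restored_after_restart()
print(back.status, back.attempt, back.state_type)  # OK 1 HARD
```

With max_check_attempts set to 4, the service only becomes a hard CRITICAL on
the fourth consecutive failing check, so in this model a restart before that
point loses the problem state.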


-----Original Message-----
From: Lori Adams [mailto:ladams at cloudmark.com]
Sent: Friday, November 18, 2005 11:41 AM
To: Tedman Eng; nagios-users at lists.sourceforge.net
Subject: RE: [Nagios-users] retention issue


Are you saying that soft states are not retained?  Is this in the docs?
Everything I read says status/states are retained.

Soft states are just as important as hard states, in my opinion.

-Lori

> -----Original Message-----
> From: Tedman Eng [mailto:teng at dataway.com]
> Sent: Friday, November 18, 2005 11:25 AM
> To: Lori Adams; nagios-users at lists.sourceforge.net
> Subject: RE: [Nagios-users] retention issue
> 
> This is related to your max_check_attempts setting.
> 
> If the service hasn't reached the max_checks yet, it's still 'soft' state.
> Once it hits the max_checks, it'll be hard state (and will get retained
> between restarts)
> 
> 
> 
> -----Original Message-----
> From: Lori Adams [mailto:ladams at cloudmark.com]
> Sent: Friday, November 18, 2005 10:36 AM
> To: nagios-users at lists.sourceforge.net
> Subject: [Nagios-users] retention issue
> 
> 
> Nagios 1.2
> Linux
> 
> I'm using a couple of templates for this particular check.  There are many
> service checks using this template.  When one of these checks becomes
> critical, the status in status.log changes to say it's critical.  If I
> stop/start nagios, then the status saved in status.sav is incorrect, and
> says "No data yet (service was in a soft problem state during state
> retention)".
> 
> Here are the templates, before everyone tells me to turn on state
> retention:
> define service{
>         name                            generic-service-template
>         ...
>         retain_status_information       1       ; Retain status information across program restarts
>         retain_nonstatus_information    1       ; Retain non-status information across program restarts
>         ...
>         register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
>         }
> 
> define service {
>         use                             generic-service-template
>         name                            server-template
>         host_name                       server
>         contact_groups                  admins
>         register                        0
>         }
> 
> define service {
>         use                             server-template
>         name                            server-spool-template
>         normal_check_interval           60
>         retry_check_interval            30
>         check_period                    workhours_with_weekend
>         register                        0
>         }
> 
> define service {
>         use                             server-spool-template
>         service_description             check
>         check_command                   check_spool_nrpe!"-d /srv/smtp/Maildir/check -w 24hours -c 36hours -m 35000 -W 10000000 -C 20000000"
>         }
> 
> From nagios.cfg:
> retain_state_information=1
> retention_update_interval=60
> use_retained_program_state=1
> 
> I ran these commands all immediately one after the other, to show what is
> happening.
> 
> root at aspire(var)# date; grep check status.log; /etc/init.d/nagios-prod stop; date; grep check status.sav; /etc/init.d/nagios-prod start; date; grep check status.log
> Fri Nov 18 10:23:24 PST 2005
> [1132338202] SERVICE;server;spool-check;CRITICAL;1/4;SOFT;1132338029;1132339829;ACTIVE;1;1;1;1132338037;0;OK;4225413;0;0;0;0;0;1;3;0;1;0;0.00;0;1;1;1;/srv/smtp/Maildir/check last modified 11/14/05 16:49:00
> 
> Stopping network monitor: nagios
> Fri Nov 18 10:23:24 PST 2005
> Starting network monitor: nagios
> 21897 ?        00:00:00 nagios-prod
> 
> Fri Nov 18 10:23:26 PST 2005
> [1132338205] SERVICE;server;spool-check;OK;1/4;HARD;1132338029;1132338377;ACTIVE;1;1;1;1132338037;0;OK;4225581;0;0;0;0;0;1;0;0;1;0;0.00;0;1;1;1;No data yet (service was in a soft problem state during state retention)
> 
> This is only happening when the checks using server-spool-template are in a
> critical state.
> 
> Thanks,
> -Lori
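
To compare the two status.log entries quoted above field by field, a small
parser along these lines may help. It is only a sketch: the field positions
(status, attempt/max, state type, and the plugin output as the 32nd field)
are inferred from the two sample lines in this thread rather than taken from
the Nagios documentation, and the names ServiceStatus and parse_service_line
are hypothetical.

```python
from typing import NamedTuple


class ServiceStatus(NamedTuple):
    timestamp: int
    host: str
    description: str
    status: str
    attempt: int
    max_attempts: int
    state_type: str
    plugin_output: str


def parse_service_line(line: str) -> ServiceStatus:
    """Parse one '[timestamp] SERVICE;...' line (layout assumed from the samples)."""
    stamp, _, rest = line.strip().partition(" ")
    # Split at most 31 times so semicolons in the plugin output are preserved.
    fields = rest.split(";", 31)
    if fields[0] != "SERVICE" or len(fields) < 32:
        raise ValueError("not a SERVICE status line in the assumed layout")
    attempt, max_attempts = fields[4].split("/")
    return ServiceStatus(int(stamp.strip("[]")), fields[1], fields[2],
                         fields[3], int(attempt), int(max_attempts),
                         fields[5], fields[31])


# The two entries from the thread: before the stop and after the start.
BEFORE = ("[1132338202] SERVICE;server;spool-check;CRITICAL;1/4;SOFT;"
          "1132338029;1132339829;ACTIVE;1;1;1;1132338037;0;OK;4225413;"
          "0;0;0;0;0;1;3;0;1;0;0.00;0;1;1;1;"
          "/srv/smtp/Maildir/check last modified 11/14/05 16:49:00")
AFTER = ("[1132338205] SERVICE;server;spool-check;OK;1/4;HARD;"
         "1132338029;1132338377;ACTIVE;1;1;1;1132338037;0;OK;4225581;"
         "0;0;0;0;0;1;0;0;1;0;0.00;0;1;1;1;"
         "No data yet (service was in a soft problem state during state retention)")

for entry in (parse_service_line(BEFORE), parse_service_line(AFTER)):
    print(entry.status, f"{entry.attempt}/{entry.max_attempts}", entry.state_type)
```

Read this way, the entry changes from CRITICAL 1/4 SOFT before the restart to
OK 1/4 HARD afterwards, which is the discrepancy being reported.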

