Setting up a passive check problem
Lewis Getschel
lgetschel at denver.westerngeco.slb.com
Thu Apr 14 01:46:35 CEST 2005
Thanks to everyone for your input! (Ain't it great when we all help each
other!)
I've finally 'solved' my passive check issues.
To summarize the fixes:
1) _A_ big issue was that nagios.cfg had
"accept_passive_service_checks=0", so all _MY_ passive results were
being ignored. (bad computer! <smirk>)
(Even though my service had passive checks enabled, the "system" wasn't
accepting them because of that global setting.)
accept_passive_service_checks=1 Now it accepts them (good computer!)
2) State retention may very well have been an issue, but losing all that
data whenever I made a change to the configs and restarted (and getting
the 27 "down" notices) made me re-think turning it off, so I'm keeping it.
3) I can't explain _why_ nagios wanted to execute my command even though
the service was set to "active_checks_enabled 0", but setting
"check_period none" solved that.
check_period none
Now it doesn't schedule any checks (I just have that 'annoying' RED
"5 services disabled" on the Tactical Overview page, I can live with that)
My current working service definition is:
define service{
        use                             linux-service
        name                            ibm_diskarray_status
        service_description             ibm_diskarray_status
        active_checks_enabled           0
        passive_checks_enabled          1
        check_command                   check_dummy
        retain_status_information       1
        check_period                    none
        register                        0
        }
Now Nagios is doing what I want it to do. (YEA!!!)
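
For anyone following along, here is a rough sketch of the two pieces that had to line up: the global switch in nagios.cfg and the line a script writes into the command file. It is only an illustration under assumptions. The function names are made up for this example, the config excerpt is hypothetical, the timestamp is an arbitrary fixed value, and treating a missing directive as "not accepted" is a conservative choice of the sketch rather than a statement about Nagios defaults. The line format follows the EXTERNAL COMMAND entry quoted further down in this thread.

```python
import time


def passive_checks_accepted(main_cfg_text):
    """Return True if the nagios.cfg text sets accept_passive_service_checks=1."""
    accepted = False  # assumption: a missing directive counts as "not accepted"
    for raw in main_cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "accept_passive_service_checks":
            accepted = value.strip() == "1"  # a later occurrence overrides an earlier one
    return accepted


def passive_result_line(host, service, return_code, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT line for the command file."""
    timestamp = int(time.time() if now is None else now)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, output)


def submit(command_file, line):
    """Write the line to the command file (a named pipe on a real install)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)


# Hypothetical excerpt of nagios.cfg with passive service checks switched on
cfg = "check_external_commands=1\naccept_passive_service_checks=1\n"
line = passive_result_line("fs008", "ibm_diskarray_status", 0,
                           "OK - No errors reported", now=1113339435)
if passive_checks_accepted(cfg):
    print(line, end="")
    # submit("/usr/local/nagios/var/rw/nagios.cmd", line)  # needs a running Nagios
```

The submit() call is left commented out because opening the pipe blocks until something (normally the Nagios daemon) is reading from it.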
On a side note....
Can someone explain the idea of "register"? As far as I can tell, since
I have "register 0" in my templates, nothing is registered. When would
I want to register something, and what does it get me?
When I _try_ to register this service (as above, but with "register 1")
and reload, I get:
Error: Service description, host name, or check command is NULL
Error: Could not register service (config file
'/usr/local/nagios/etc/general/services.cfg', line 345)
When I change back to zero, it reloads fine...
... Just wondering what I'm "missing".
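
One way to picture it, as a toy model only (this is not how Nagios itself parses its config, and the contents of linux-service are assumed): a definition with "register 0" is just a template that other definitions pull in with "use", and the register setting itself is not inherited. A definition that is registered has to end up with a host name, service description and check command once its templates are merged in. The template above has no host_name, which would explain the error when it is switched to "register 1". The sketch below handles a single "use" parent only.

```python
REQUIRED = ("host_name", "service_description", "check_command")
NOT_INHERITED = ("name", "use", "register")  # these stay with the definition that sets them


def resolve(definition, templates):
    """Merge a definition with the chain of templates named by its 'use' key."""
    merged = {}
    parent = definition.get("use")
    if parent is not None:
        inherited = resolve(templates[parent], templates)
        merged.update({k: v for k, v in inherited.items() if k not in NOT_INHERITED})
    merged.update(definition)  # local values win over inherited ones
    return merged


def missing_fields(definition, templates):
    """List the required fields a definition lacks if it is to be registered."""
    merged = resolve(definition, templates)
    if merged.get("register", "1") == "0":
        return []  # a template is never registered, so nothing is required of it
    return [field for field in REQUIRED if field not in merged]


templates = {
    "linux-service": {"name": "linux-service", "register": "0"},  # contents assumed
    "ibm_diskarray_status": {
        "use": "linux-service",
        "name": "ibm_diskarray_status",
        "service_description": "ibm_diskarray_status",
        "check_command": "check_dummy",
        "register": "0",
    },
}
template = templates["ibm_diskarray_status"]
print(missing_fields(template, templates))                      # []
print(missing_fields(dict(template, register="1"), templates))  # ['host_name']
print(missing_fields({"use": "ibm_diskarray_status", "host_name": "fs004"},
                     templates))                                # []
```

In this model the last case, a definition that uses the template and supplies host_name, is the one that actually gets registered, even though every template behind it says "register 0".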
Thanks again to all.
Marc Powell wrote:
>
>
>>-----Original Message-----
>>From: Lewis Getschel [mailto:lgetschel at denver.westerngeco.slb.com]
>>Sent: Tuesday, April 12, 2005 5:08 PM
>>To: Marc Powell
>>Cc: Nagios Users
>>Subject: Re: [Nagios-users] Setting up a passive check problem
>>
>>Sorry to describe so much and then leave out my actual problem...
>>
>>Being an impatient person I've changed my services.cfg a little... now
>>they are:
>>
>>services.cfg:
>>define service{
>> use linux-service
>> name ibm_disk_array_status
>> service_description ibm_disk_array_status
>> active_checks_enabled 0
>> passive_checks_enabled 1
>> check_command check_dummy
>> check_freshness 0
>> register 0
>> }
>>
>>same config- hosts.cfg:
>># service definition
>>define service{
>> use ibm_disk_array_status
>> host_name fs004,fs005,fs006,fs007,fs008
>>}
>>
>>commands.cfg:
>># 'check_dummy' command definition
>>define command{
>> command_name check_dummy
>> command_line $USER1$/check_dummy 0
>> }
>>
>>
>
>Yup. Still looks ok.
>
>
>
>>Now, If I understand ...
>>the idea of "active_checks_enabled 0", means do NOT
>>actually check anything (don't run the command_line defined).
>>the idea of "passive_checks_enabled 1" means that nagios
>>will only get updates that I put into the command_file
>>("/usr/local/nagios/var/rw/nagios.cmd") through another script that is
>>
>>
>
>Correct. Freshness checking will ignore the value of
>active_checks_enabled I believe. That would only come into play if
>you've enabled freshness checking of course.
>
>
>
>>called. This much IS working because I see the following line in my
>>event log:
>>[04-12-2005 14:57:15] EXTERNAL COMMAND:
>>PROCESS_SERVICE_CHECK_RESULT;fs008;ibm_disk_array_status;0;OK - No
>>errors reported
>>
>>
>>
>>
>
>This indicates that nagios saw an external command, not necessarily that
>it accepted it. I'm going to guess it did as the next line would have
>been an error of some type if nagios rejected it.
>
>
>
>>When I look at the scheduling queue it shows that my service
>>"ibm_disk_array_status" is scheduled to be run!
>>fs004 ibm_disk_array_status 04-12-2005 14:34:16 04-12-2005
>>14:54:16 ENABLED
>>
>>When I view my fileserver services, it shows:
>>fs004 ibm_disk_array_status OK 04-12-2005 14:34:16 0d 1h 33m
>>37s 1/4 Status is OK
>>
>>The problem is that the "Status is OK" message is coming from the
>>check_dummy command, and it _SHOULD_ be "OK - No errors reported" as my
>>external command shows.
>>
>
>This could be explained if you have state retention enabled in
>nagios.cfg. See the notes on Retention at
>http://nagios.sourceforge.net/docs/1_0/xodtemplate.html.
>
>
>
>>------------I've done the following commands:---------------
>> $ sudo /etc/rc.d/init.d/nagios stop
>>Stopping network monitor: nagios
>>$ ps -ef | grep nagios | grep -v grep
>>$ sudo /etc/rc.d/init.d/nagios start
>>Starting network monitor: nagios
>> PID TTY TIME CMD
>>30767 ? 00:00:00 nagios
>>$ ps -ef | grep nagios | grep -v grep
>>nagios 30767 1 8 15:05 ? 00:00:00
>>/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
>>$
>>
>>
>>
>>-----------------------------------------------------------------------
>
>
>>So I don't have an extra copy of nagios running.
>>
>>
>
>Good thinking. It's a common problem.
>
>
>
>>Here is what I want to happen:
>>1) tell nagios to accept passive results for these 5 servers, display
>>the last known status value it had for the service
>>
>>
>
>Looks like you've got that configured properly.
>
>
>
>>2) don't perform any active checks for whatever I need to specify as a
>>command
>>
>>
>
>Again, it looks like you have that configured properly.
>
>
>
>>3) When my script places a status of OK, or CRITICAL (the only 2 cases),
>>accept that as the new status value, and notify as appropriate
>>until/unless the status is changed or the service is acknowledged.
>>
>>
>
>This will happen as a natural occurrence of submitting passive checks.
>
>
>
>>4) repeat
>>
>>After all this time, I thought I understood the basic operation of
>>Nagios, but it doesn't seem that I do.
>>
>>
>
>You're close. I'll bet it's state retention that's throwing you, based
>on the information so far.
>
>
>
>>(If someone has example configs for a passive service, could you please
>>post your file entries so I can see how someone else does it)
>>
>>
>
>Here's how I do it. Note that I have active checks enabled but the
>check_period set to none. That prevents the annoying X from being
>displayed in the GUI, but the command still never gets run as an active
>check.
>
># Generic service definition template
>define service{
> name                          generic-service
> active_checks_enabled         1       ; Active service checks are enabled
> passive_checks_enabled        1       ; Passive service checks are enabled/accepted
> parallelize_check             1       ; Active service checks should be parallelized
> obsess_over_service           0       ; We should obsess over this service (if necessary)
> check_freshness               0       ; Default is to NOT check service 'freshness'
> notifications_enabled         1       ; Service notifications are enabled
> event_handler_enabled         1       ; Service event handler is enabled
> flap_detection_enabled        1       ; Flap detection is enabled
> process_perf_data             0       ; Process performance data
> retain_status_information     1       ; Retain status information across program restarts
> retain_nonstatus_information  1       ; Retain non-status information across program restarts
> is_volatile                   0
> check_period                  none
> max_check_attempts            4
> normal_check_interval         5
> retry_check_interval          3
> notification_interval         10080
> notification_period           none
> notification_options          c,r
>
> register                      0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
> }
>
># Host definition
>define host {
> use generic-host
> host_name host-name
> alias The Renaissance Center
> address <ip address removed>
> }
>
>#Service definition
>define service {
> use generic-service
> host_name host-name
> service_description PING
> contact_groups tnops
> check_command check_ping
> }
>
># 'check_ping' command definition
>define command{
> command_name check_ping
> command_line $USER1$/check_ping $HOSTADDRESS$ 30 60 500.0 1000.0 -p 10 -t 30
> }
>
>--
>Marc
>
>
--
Lewis Getschel | Today is done...
WesternGeco | Today was fun...
1625 Broadway | Tomorrow is another one.
Denver, CO 80202 |
Direct Phone - 303-389-4407| -- Dr. Seuss --