Distributed nagios problem - service definition not found!

TIM MOORE MOORET10 at odjfs.state.oh.us
Fri Oct 8 14:02:11 CEST 2004


Thanks for your help, Jan.  I added the hosts to the central server with checks_enabled set to 0.  Then I added the two hosts to a service with passive_checks_enabled set to 1 and active_checks_enabled set to 0 (maybe I don't need both set).  And it worked.  Both devices were added and they now show green from the data received through the external command file.  I used "Check Host Alive" as the service description to match the service description on the distributed server.  Thanks for the help.  I will now try to add some hosts through a firewall and we will see if there are any other complications.
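
A quick way to confirm that the central server accepts a passive result for a newly defined service is to write one by hand into the external command file. The sketch below is only an illustration in Python. It assumes the standard "[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output" line format and that external commands are enabled on the central server. The function names, the fixed timestamp, and the command file path are hypothetical and need to be adapted to your installation.

```python
import time


def format_passive_result(host, service, return_code, output, timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT line for the external command file."""
    if timestamp is None:
        timestamp = int(time.time())
    # Semicolons delimit the fields, so host and service names must not contain them.
    for field in (host, service):
        if ";" in field:
            raise ValueError(f"field contains a semicolon: {field!r}")
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}\n")


def submit(command_line, command_file):
    """Write the command to the Nagios external command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(command_line)


# The timestamp is an arbitrary fixed value so the output is reproducible.
line = format_passive_result(
    "acdmz-inside-sw2", "Check Host Alive", 0,
    "PING OK - Packet loss = 0%, RTA = 0.83 ms", timestamp=1097236931)
print(line, end="")

# To send it for real (the path is an assumption; check command_file in nagios.cfg):
# submit(line, "/usr/local/nagios/var/rw/nagios.cmd")
```

If the host and service description match a definition on the central server, the result should show up in the web interface. If they do not, the "service could not be found" warning appears in the log instead.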
 
--------------------------------------
Tim Moore
DNS/Linux/Cisco Admin
ODJFS

>>> "Jan Scholten" <Jan.Scholten at iconz.net> 10/7/2004 5:15:06 PM >>>

As far as I know, every host and every service needs to be defined in the
central hosts.cfg/services.cfg (even those that are not actively checked) so
that they are displayed.

So you need the "Check Host Alive" service (which is named PING in the
default config) to be configured for acdmz-inside-sw2. Passive checks must be
enabled for this service check, and active checks should be disabled:
active_checks_enabled 0
passive_checks_enabled 1

You can easily see what is missing:

> EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;acdmz-inside-sw2;Check  
> Host Alive;0;PING OK - Packet loss = 0%, RTA = 0.83 ms
> Oct  6 15:02:44 noc-mon nagios: Warning:  Message queue contained  
> results for service 'Check Host Alive' on host 'acdmz-inside-sw2'.  The  
> service could not be found!

You need the service "Check Host Alive" defined for host
"acdmz-inside-sw2", which does not seem to work. Have you tried using
service descriptions without blanks? --> Change to Check_Host_Alive?
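
The matching rule can be illustrated with a small Python sketch. It assumes the central server looks up each passive result by the exact host name and service description, and it parses the "EXTERNAL COMMAND" log entries quoted in this thread (with the wrapped lines rejoined). The function names and the central_services set are hypothetical. The set stands in for whatever is defined in the central hosts.cfg/services.cfg, here guessed to contain only the localhost cpu service before the fix.

```python
import re

# Pattern for the "EXTERNAL COMMAND" syslog entry shown in the thread.
# Fields are separated by semicolons; the plugin output is the remainder.
COMMAND_RE = re.compile(
    r"PROCESS_SERVICE_CHECK_RESULT;(?P<host>[^;]+);(?P<service>[^;]+);"
    r"(?P<code>\d+);(?P<output>.*)$"
)


def parse_passive_result(line):
    """Return (host, service, return_code, output), or None if no match."""
    match = COMMAND_RE.search(line)
    if match is None:
        return None
    return (match.group("host"), match.group("service"),
            int(match.group("code")), match.group("output"))


def is_defined(result, defined_services):
    """True if the (host, service) pair exists in the central definitions."""
    host, service, _code, _output = result
    return (host, service) in defined_services


# Hypothetical snapshot of the central server's service definitions
# before the fix: only the localhost/cpu service is known.
central_services = {("localhost", "cpu")}

log_lines = [
    "Oct  6 15:02:30 noc-mon nagios: EXTERNAL COMMAND: "
    "PROCESS_SERVICE_CHECK_RESULT;localhost;cpu;0;"
    "OK - load average: 0.00, 0.00, 0.00",
    "Oct  6 15:02:40 noc-mon nagios: EXTERNAL COMMAND: "
    "PROCESS_SERVICE_CHECK_RESULT;acdmz-inside-sw2;Check Host Alive;0;"
    "PING OK - Packet loss = 0%, RTA = 0.83 ms",
]

for line in log_lines:
    result = parse_passive_result(line)
    if is_defined(result, central_services):
        status = "defined"
    else:
        status = "MISSING on central server"
    print(f"{result[0]} / {result[1]}: {status}")
```

With these inputs the localhost cpu result is reported as defined and the acdmz-inside-sw2 result as missing, which mirrors the warning in the log.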


YOU NEED TO DEFINE ALL HOSTS on the central server as well (you can use
check_dummy 0 as the host check or set check_period to none).

Jan

> Jan,
> Here are the lines from the services.cfg of the distributed server:
> define service {
> host_name                      localhost
> service_description            cpu
> check_command                  check_local_load!3!5
> use                            generic-service
> max_check_attempts             3
> normal_check_interval          3
> retry_check_interval           1
> check_period                   24x7
> notifications_enabled          0
> notification_interval          0
> notification_period            24x7
> notification_options           w,u,c,r
> contact_groups                 admins
> }
> define service {
> hostgroup_name                 ACDMZ_Switches,ACDMZ_Firewalls
> service_description            Check Host Alive
> check_command                  check-host-alive
> max_check_attempts             3
> normal_check_interval          5
> retry_check_interval           1
> check_period                   24x7
> notification_interval          0
> notification_period            24x7
> notification_options           w,u,c,r
> notifications_enabled          1
> contact_groups                 noc
> }
> My check_command is check-host-alive and not ping.  Funny thing is that  
> when the localhost cpu sends its checks, it seems to work.  Although, I  
> still don't know what to look for on the central server.  Should I see  
> some new hosts being added or does it only alarm when it fails?  Do I  
> also have to add the hosts to the central server?  I only have the hosts  
> in the ACDMZ_Switches defined on the distributed server.  Just curious  
> how we get notified of problems from the distributed server.  I have a  
> couple devices that I cannot reach via ping (check-host-alive) and they  
> still never show as down on the central server gui.
> Thanks for the help.
> --------------------------------------
> Tim Moore
> DNS/Linux/Cisco Admin
> ODJFS
>
>>>> "Jan Scholten" <Jan.Scholten at iconz.net> 10/6/2004 4:51:05 PM >>>
>
> Can you supply the relevant part of services.cfg?
>
> It seems you have a misconfiguration. Are you sure the service is Check
> Host Alive and not PING (like default)?
> I don't know whether Nagios likes a service name with a blank, so try it
> without!
> So the service name in the returned result ("Check Host Alive" in your case)
> must be the same as your service_description in the services.cfg for that host.
>
>
> Jan
>
>> I just recently set up distributed Nagios.  I followed the directions
>> very closely.  I first had a problem running the nsca daemon through
>> xinetd.  It just wouldn't listen for incoming on 5667.  I added the line
>> to /etc/services also.  Here is my config:
>> service nsca
>> {
>>         flags           = REUSE
>>         socket_type     = stream
>>         wait            = no
>>         user            = nagios
>>         group           = nagios
>>         server          = /usr/local/nagios/bin/nsca
>>         server_args     = -c /usr/local/nagios/etc/nsca.cfg
>>         log_on_failure  += USERID
>>         disable         = no
>>         only_from       = 10.12.225.50
>> }
>>
>> If I run it from command line in daemon mode it works fine.
>> My main problem is that when passive checks are sent to the central
>> server I keep getting this error:
>> Oct  6 15:02:28 noc-mon nsca[31620]: Connection from 10.12.225.50 port
>> 38784
>> Oct  6 15:02:28 noc-mon nsca[31620]: Host address checks out ok
>> Oct  6 15:02:28 noc-mon nsca[31620]: Handling the connection...
>> Oct  6 15:02:29 noc-mon nsca[31620]: SERVICE CHECK -> Host Name:
>> 'localhost', Service Description: 'cpu', Return Code: '0', Output: 'OK -
>> load average: 0.00, 0.00, 0.00'
>> Oct  6 15:02:29 noc-mon nsca[31620]: End of connection...
>> Oct  6 15:02:30 noc-mon nagios: EXTERNAL COMMAND:
>> PROCESS_SERVICE_CHECK_RESULT;localhost;cpu;0;OK - load average: 0.00,
>> 0.00, 0.00
>> Oct  6 15:02:39 noc-mon nsca[31817]: Connection from 10.12.225.50 port
>> 39040
>> Oct  6 15:02:39 noc-mon nsca[31817]: Host address checks out ok
>> Oct  6 15:02:39 noc-mon nsca[31817]: Handling the connection...
>> Oct  6 15:02:40 noc-mon nsca[31817]: SERVICE CHECK -> Host Name:
>> 'acdmz-inside-sw2', Service Description: 'Check Host Alive', Return
>> Code: '0', Output: 'PING OK - Packet loss = 0%, RTA = 0.83 ms'
>> Oct  6 15:02:40 noc-mon nsca[31817]: End of connection...
>> Oct  6 15:02:40 noc-mon nagios: EXTERNAL COMMAND:
>> PROCESS_SERVICE_CHECK_RESULT;acdmz-inside-sw2;Check Host Alive;0;PING OK
>> - Packet loss = 0%, RTA = 0.83 ms
>> Oct  6 15:02:44 noc-mon nagios: Warning:  Message queue contained
>> results for service 'Check Host Alive' on host 'acdmz-inside-sw2'.  The
>> service could not be found!
>>
>> The localhost check acts like it works, but the simple check-host-alive
>> service definition does not.  I know that that service definition is on
>> both servers.  They are both running v1.2.  Also, should I see something
>> on my central server's web gui showing these hosts down?  My host count
>> has not been affected at all by the hosts added to the distributed
>> server.  Am I missing something?  Is there something wrong with the
>> default check-host-alive service check?
>> Thanks for any help,
>> --------------------------------------
>> Tim Moore
>> DNS/Linux/Cisco Admin
>> ODJFS
>>
>
>
>



-- 
Jan Scholten
Research and Development Intern
Iconz.co.nz

