Distributed nagios problem - service definition not found!
Jan Scholten
Jan.Scholten at iconz.net
Thu Oct 7 23:15:06 CEST 2004
As far as I know, every host and every service needs to be defined in the
central hosts.cfg/services.cfg (even those that are not actively checked)
so that they are displayed.
So you need the "Check Host Alive" service (named PING in the default
config) to be configured for acdmz-inside-sw2 on the central server.
Passive checks must be enabled for this service check, and active checks
should be disabled:
    active_checks_enabled 0
    passive_checks_enabled 1
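For illustration, the full definition on the central server might look
roughly like the sketch below. It simply reuses the values from the
distributed server's definition quoted further down. I am assuming that the
check-host-alive command and the noc contact group are also defined on the
central server; the command is not run while active checks are disabled,
but a check_command is still expected in the definition.

```
# Sketch only: central-server services.cfg entry that accepts passive results.
# host_name and service_description must match the submitted result exactly.
# active_checks_enabled 0 / passive_checks_enabled 1: take results from NSCA
# and do not schedule the check locally.
define service {
    host_name               acdmz-inside-sw2
    service_description     Check Host Alive
    check_command           check-host-alive
    max_check_attempts      3
    normal_check_interval   5
    retry_check_interval    1
    check_period            24x7
    notification_interval   0
    notification_period     24x7
    notification_options    w,u,c,r
    contact_groups          noc
    active_checks_enabled   0
    passive_checks_enabled  1
}
```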
You can easily see what is missing:
> EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;acdmz-inside-sw2;Check Host Alive;0;PING OK - Packet loss = 0%, RTA = 0.83 ms
> Oct 6 15:02:44 noc-mon nagios: Warning: Message queue contained results for service 'Check Host Alive' on host 'acdmz-inside-sw2'. The service could not be found!
You need the service "Check Host Alive" defined for host
"acdmz-inside-sw2", which does not seem to work. Have you tried using
service descriptions without blanks, e.g. changing it to
Check_Host_Alive?
You need to define ALL HOSTS on the central server as well (you can use
check_dummy 0 as the host check, or set check_period to none).
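A rough sketch of such a host definition follows. The command name
check-dummy-ok, the alias, and the address are placeholders made up for
illustration, and $USER1$ is assumed to point at the plugin directory as in
the sample resource.cfg. As far as I remember, 1.x also expects every host
to be a member of a host group, which is not shown here.

```
# Sketch only: hypothetical command that always returns OK (state 0).
define command {
    command_name    check-dummy-ok
    command_line    $USER1$/check_dummy 0
}

# Sketch only: central-server hosts.cfg entry.
# alias and address are placeholders; use the real values for the switch.
define host {
    host_name               acdmz-inside-sw2
    alias                   ACDMZ inside switch 2
    address                 192.0.2.10
    check_command           check-dummy-ok
    max_check_attempts      3
    notification_interval   0
    notification_period     24x7
    notification_options    d,u,r
}
```

With a host check that always returns OK, the central server will not mark
the host as down by itself; problems then show up only through the passive
service results.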
Jan
> Jan,
> Here are the lines from the services.cfg of the distributed server:
> define service {
>     host_name               localhost
>     service_description     cpu
>     check_command           check_local_load!3!5
>     use                     generic-service
>     max_check_attempts      3
>     normal_check_interval   3
>     retry_check_interval    1
>     check_period            24x7
>     notifications_enabled   0
>     notification_interval   0
>     notification_period     24x7
>     notification_options    w,u,c,r
>     contact_groups          admins
> }
> define service {
>     hostgroup_name          ACDMZ_Switches,ACDMZ_Firewalls
>     service_description     Check Host Alive
>     check_command           check-host-alive
>     max_check_attempts      3
>     normal_check_interval   5
>     retry_check_interval    1
>     check_period            24x7
>     notification_interval   0
>     notification_period     24x7
>     notification_options    w,u,c,r
>     notifications_enabled   1
>     contact_groups          noc
> }
> My check_command is check-host-alive and not ping. Funny thing is that
> when the localhost cpu sends its checks, it seems to work. Although, I
> still don't know what to look for on the central server. Should I see
> some new hosts being added or does it only alarm when it fails? Do I
> also have to add the hosts to the central server? I only have the hosts
> in the ACDMZ_Switches defined on the distributed server. Just curious
> how we get notified of problems from the distributed server. I have a
> couple devices that I cannot reach via ping (check-host-alive) and they
> still never show as down on the central server gui.
> Thanks for the help.
> --------------------------------------
> Tim Moore
> DNS/Linux/Cisco Admin
> ODJFS
>
>>>> "Jan Scholten" <Jan.Scholten at iconz.net> 10/6/2004 4:51:05 PM >>>
>
> Can you supply the relevant part of services.cfg?
>
> It seems you have a misconfiguration. Are you sure the service is Check
> Host Alive and not PING (like the default)?
> I don't know whether Nagios likes a service name with a blank, so try it
> without!
> The service name in the submitted result ("Check Host Alive" in your
> case) must be the same as your service_description in the services.cfg
> for that host.
>
>
> Jan
>
>> I just recently set up distributed Nagios. I followed the directions
>> very closely. I first had a problem running the nsca daemon through
>> xinetd. It just wouldn't listen for incoming connections on 5667. I
>> added the line to /etc/services also. Here is my config:
>> service nsca
>> {
>>     flags           = REUSE
>>     socket_type     = stream
>>     wait            = no
>>     user            = nagios
>>     group           = nagios
>>     server          = /usr/local/nagios/bin/nsca
>>     server_args     = -c /usr/local/nagios/etc/nsca.cfg
>>     log_on_failure  += USERID
>>     disable         = no
>>     only_from       = 10.12.225.50
>> }
>>
>> If I run it from the command line in daemon mode it works fine.
>> My main problem is that when passive checks are sent to the central
>> server I keep getting this error:
>> Oct 6 15:02:28 noc-mon nsca[31620]: Connection from 10.12.225.50 port 38784
>> Oct 6 15:02:28 noc-mon nsca[31620]: Host address checks out ok
>> Oct 6 15:02:28 noc-mon nsca[31620]: Handling the connection...
>> Oct 6 15:02:29 noc-mon nsca[31620]: SERVICE CHECK -> Host Name: 'localhost', Service Description: 'cpu', Return Code: '0', Output: 'OK - load average: 0.00, 0.00, 0.00'
>> Oct 6 15:02:29 noc-mon nsca[31620]: End of connection...
>> Oct 6 15:02:30 noc-mon nagios: EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;localhost;cpu;0;OK - load average: 0.00, 0.00, 0.00
>> Oct 6 15:02:39 noc-mon nsca[31817]: Connection from 10.12.225.50 port 39040
>> Oct 6 15:02:39 noc-mon nsca[31817]: Host address checks out ok
>> Oct 6 15:02:39 noc-mon nsca[31817]: Handling the connection...
>> Oct 6 15:02:40 noc-mon nsca[31817]: SERVICE CHECK -> Host Name: 'acdmz-inside-sw2', Service Description: 'Check Host Alive', Return Code: '0', Output: 'PING OK - Packet loss = 0%, RTA = 0.83 ms'
>> Oct 6 15:02:40 noc-mon nsca[31817]: End of connection...
>> Oct 6 15:02:40 noc-mon nagios: EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;acdmz-inside-sw2;Check Host Alive;0;PING OK - Packet loss = 0%, RTA = 0.83 ms
>> Oct 6 15:02:44 noc-mon nagios: Warning: Message queue contained results for service 'Check Host Alive' on host 'acdmz-inside-sw2'. The service could not be found!
>>
>> The localhost check acts like it works, but the simple check-host-alive
>> service definition is not. I know that that service definition is on
>> both servers. They are both running v1.2. Also, should I see something
>> on my central server's web gui showing these hosts down? My host count
>> has not been affected at all by the hosts added to the distributed
>> server. Am I missing something? Is there something wrong with the
>> default check-host-alive service check?
>> Thanks for any help,
>> --------------------------------------
>> Tim Moore
>> DNS/Linux/Cisco Admin
>> ODJFS
>>
>
>
>
--
Jan Scholten
Research and Development Intern
Iconz.co.nz