What the...

Russell Scibetti russell at quadrix.com
Thu Oct 10 21:32:22 CEST 2002


Your logic is right: a service notification is never sent when the 
host has also failed.  You only get host notifications.

Also, can you include your logging settings from nagios.cfg 
(log_initial_states, log_service_retries, etc.)?  They may be part of 
the logging problem.  You might want to set them all to 1 for now, if 
they aren't already, to make sure everything gets logged.
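
For reference, here is a rough sketch of what that block of nagios.cfg 
could look like with everything switched on.  Only log_initial_states 
and log_service_retries are named above; the other option names are my 
assumption, taken from the stock Nagios 1.x sample config, so check 
them against your own file and version before copying anything.

```
# nagios.cfg logging options (1 = log the event, 0 = don't)
# Option names other than log_initial_states and log_service_retries
# are assumed from the Nagios 1.x sample config; verify locally.
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=1
log_external_commands=1
log_passive_service_checks=1
```

Nagios only reads nagios.cfg at startup, so the daemon needs a restart 
(or reload) before the new settings take effect.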

-Russell

Bishop, Dean wrote:

> you know...now that i am paying attention, i don't get many service 
> notifications.  Hardly any as a matter of fact.  Lots of Host 
> notifications though.
>
> why would this be?  i see services being checked.
>
> please, someone confirm for me:  a service fails, host check is done, 
> host fails, services are no longer checked for that host, so 
> max_check_attempts for the services is never reached, no service 
> notification is sent?
>
> almost all of my devices are either up or down.  very rarely does just 
> a service fail. services currently defined are pretty much just 
> port-23 checks for switches and port 135 checks for windoze servers.
>
> later,
>
> dean
>
>     -----Original Message-----
>     From: Jolet, John [mailto:John.Jolet at misyshealthcare.com]
>     Sent: Thursday, October 10, 2002 2:30 PM
>     To: 'nagios-users at lists.sourceforge.net'
>     Subject: RE: [Nagios-users] RE: What the...
>
>     Can you include the bit of the config that shows the parent-child
>     relationships?
>
>         -----Original Message-----
>         From: Bishop, Dean [mailto:dean.bishop at tcdsb.org]
>         Sent: Thursday, October 10, 2002 1:14 PM
>         To: Bishop, Dean; 'nagios-users at lists.sourceforge.net'
>         Subject: [Nagios-users] RE: What the...
>
>         First, sorry about the subject; i realize that it is
>         inappropriate.  it does, however, capture my initial response.
>
>         We are in the midst of many nightmares concurrently: smoking
>         servers, irreplaceable data lost, network latency, cold lunch,
>         sore finger, you know, the whole gamut at once.
>
>         apologies to all.
>
>         here is another entry from my logs.  Each host is dependent on
>         the previously numbered host (e.g.
>         Marshall-McLuhan-0561SW2A_4-HS7 is the parent of
>         Marshall-McLuhan-0561SW2A_5-HS7, which is the parent of
>         Marshall-McLuhan-0561SW2A_6-HS7, etc.).
>
>         why, once Marshall-McLuhan-0561SW2A_14-HS7 is determined to be
>         UNREACHABLE (due to the failure of
>         Marshall-McLuhan-0561SW2A_4-HS7), is the service checked on
>         Marshall-McLuhan-0561SW2A_14-HS7?
>
>
>
>         [1034172479] HOST ALERT: Marshall-McLuhan-0561SW2A_14-HS7;DOWN;SOFT;1;CRITICAL - Plugin timed out after 18 seconds
>         [1034172516] HOST ALERT: Marshall-McLuhan-0561SW2A_7-HS7;DOWN;SOFT;1;CRITICAL - Plugin timed out after 18 seconds
>         [1034172552] HOST ALERT: Marshall-McLuhan-0561SW2A_6-HS7;DOWN;SOFT;1;CRITICAL - Plugin timed out after 18 seconds
>         [1034172588] HOST ALERT: Marshall-McLuhan-0561SW2A_5-HS7;DOWN;SOFT;1;CRITICAL - Plugin timed out after 18 seconds
>         [1034172624] HOST ALERT: Marshall-McLuhan-0561SW2A_4-HS7;DOWN;SOFT;1;CRITICAL - Plugin timed out after 18 seconds
>         [1034172644] HOST ALERT: Marshall-McLuhan-0561SW2A_4-HS7;DOWN;HARD;2;CRITICAL - Plugin timed out after 18 seconds
>         [1034172644] HOST NOTIFICATION: nagiosadmin;Marshall-McLuhan-0561SW2A_4-HS7;DOWN;host-notify-by-email;CRITICAL - Plugin timed out after 18 seconds
>         [1034172645] HOST NOTIFICATION: Marco;Marshall-McLuhan-0561SW2A_4-HS7;DOWN;host-notify-by-email;CRITICAL - Plugin timed out after 18 seconds
>         [1034172645] HOST NOTIFICATION: Kevin-NonCritical;Marshall-McLuhan-0561SW2A_4-HS7;DOWN;notify-by-epager;CRITICAL - Plugin timed out after 18 seconds
>         [1034172645] HOST NOTIFICATION: Kevin;Marshall-McLuhan-0561SW2A_4-HS7;DOWN;host-notify-by-email;CRITICAL - Plugin timed out after 18 seconds
>         [1034172646] HOST NOTIFICATION: Keith-NonCritical;Marshall-McLuhan-0561SW2A_4-HS7;DOWN;notify-by-epager;CRITICAL - Plugin timed out after 18 seconds
>         [1034172646] HOST NOTIFICATION: Keith;Marshall-McLuhan-0561SW2A_4-HS7;DOWN;host-notify-by-email;CRITICAL - Plugin timed out after 18 seconds
>         [1034172646] HOST NOTIFICATION: Ben;Marshall-McLuhan-0561SW2A_4-HS7;DOWN;host-notify-by-email;CRITICAL - Plugin timed out after 18 seconds
>         [1034172647] HOST ALERT: Marshall-McLuhan-0561SW2A_5-HS7;UNREACHABLE;HARD;2;CRITICAL - Plugin timed out after 18 seconds
>         [1034172647] HOST ALERT: Marshall-McLuhan-0561SW2A_6-HS7;UNREACHABLE;HARD;2;CRITICAL - Plugin timed out after 18 seconds
>         [1034172647] HOST ALERT: Marshall-McLuhan-0561SW2A_7-HS7;UNREACHABLE;HARD;2;CRITICAL - Plugin timed out after 18 seconds
>         [1034172647] HOST ALERT: Marshall-McLuhan-0561SW2A_14-HS7;UNREACHABLE;HARD;2;CRITICAL - Plugin timed out after 18 seconds
>         [1034172647] SERVICE ALERT: Marshall-McLuhan-0561SW2A_14-HS7;Port Check-23;CRITICAL;HARD;1;Socket timeout after 10 seconds
>
>
>         -----Original Message-----
>         From: Bishop, Dean
>         Sent: Thursday, October 10, 2002 1:04 PM
>         To: 'nagios-users at lists.sourceforge.net'
>         Subject: What the *&#( !!
>         Importance: High
>
>
>         Can someone explain this to me??
>
>
>         why in the world is the service for testserver01.tcdsb.org
>         being checked after the host has been determined down?
>         also why is the host being checked before the service??
>
>
>
>
>         [root@NMS var]# tail nagios.log -n 3000 |grep testserver01
>
>         [1034266896] HOST ALERT: testserver01.tcdsb.org;UP;HARD;1;(Host assumed to be up)
>         [1034266896] SERVICE ALERT: testserver01.tcdsb.org;Misc Servers - Port Check 135;OK;HARD;1;TCP OK - 0 second response time on port 135
>         [1034267924] HOST ALERT: testserver01.tcdsb.org;DOWN;SOFT;1;CRITICAL - Plugin timed out after 8 seconds
>         [1034267933] HOST ALERT: testserver01.tcdsb.org;DOWN;HARD;2;CRITICAL - Plugin timed out after 8 seconds
>         [1034267933] HOST NOTIFICATION: nagiosadmin;testserver01.tcdsb.org;DOWN;host-notify-by-email;CRITICAL - Plugin timed out after 8 seconds
>         [1034267934] HOST NOTIFICATION: Keith;testserver01.tcdsb.org;DOWN;host-notify-by-email;CRITICAL - Plugin timed out after 8 seconds
>         [1034267934] SERVICE ALERT: testserver01.tcdsb.org;Misc Servers - Port Check 135;CRITICAL;HARD;1;Socket timeout after 2 seconds
>         [1034268938] HOST ALERT: testserver01.tcdsb.org;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.61 ms
>         [1034268938] HOST NOTIFICATION: nagiosadmin;testserver01.tcdsb.org;UP;host-notify-by-email;PING OK - Packet loss = 0%, RTA = 0.61 ms
>         [1034268938] HOST NOTIFICATION: Keith;testserver01.tcdsb.org;UP;host-notify-by-email;PING OK - Packet loss = 0%, RTA = 0.61 ms
>         [1034268938] SERVICE ALERT: testserver01.tcdsb.org;Misc Servers - Port Check 135;OK;HARD;1;TCP OK - 0 second response time on port 135
>
>         [root@NMS var]#
>

-- 
Russell Scibetti
Quadrix Solutions, Inc.
http://www.quadrix.com
(732) 235-2335, ext. 7038

