What the...

Bishop, Dean dean.bishop at tcdsb.org
Thu Oct 10 21:39:53 CEST 2002


yeah, everything is set to 1.  It seems (someone please correct me if i am
wrong) that only state changes are logged in nagios.log.
 
i have just got performance data working.
 
funny, i half-heartedly tried to get this working about a week ago and gave
up for more pressing issues (yes, process_performance_data=1 was set and the
service and host performance data commands were defined).  Then, when i was
poking around today, i noticed the link on the "Process Info" page to enable
performance data.  Hey, now it works!!??
 
whatever.
 
at least now i can see every check though.
 
later,
dean

-----Original Message-----
From: Russell Scibetti [mailto:russell at quadrix.com]
Sent: Thursday, October 10, 2002 3:32 PM
To: Bishop, Dean
Cc: 'nagios-users at lists.sourceforge.net'
Subject: Re: [Nagios-users] RE: What the...


Your logic is right...service notification will never occur when the host
also fails.  You will only get host notifications.

Also, can you include your log settings from nagios.cfg
(log_initial_states, log_service_retries, etc.)?  They may be part of the
logging problem.  You might want to set them all to 1 for now to make sure
everything gets logged, if they aren't already.
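
A quick way to run the check suggested above is to scan the config text for
the log options and list any that are not set to 1. This is only a minimal
sketch: the function name and the sample config excerpt are made up for
illustration, and only the two options named in this message are checked.

```python
def logging_options_not_enabled(cfg_text, wanted):
    """Return the options from `wanted` that are not set to 1 in cfg_text."""
    values = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return [name for name in wanted if values.get(name) != "1"]


# Hypothetical nagios.cfg excerpt, not taken from the poster's system.
sample_cfg = """
# logging options
log_initial_states=0
log_service_retries=1
"""

missing = logging_options_not_enabled(
    sample_cfg, ["log_initial_states", "log_service_retries"]
)
print(missing)
```

To use it on a real installation, pass the contents of nagios.cfg instead of
the sample string.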

-Russell

Bishop, Dean wrote:


you know...now that i am paying attention, i don't get many service
notifications.  Hardly any as a matter of fact.  Lots of Host notifications
though.
 
why would this be?  i see services being checked.
 
please, someone confirm for me:  a service fails, host check is done, host
fails, services are no longer checked for that host, so max_check_attempts
for the services is never reached, no service notification is sent?
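
The sequence asked about here can be written down as a small decision
function. This is a simplified model of the behaviour described in this
thread, not Nagios's actual code: the function name and return shape are
invented for illustration, and the choice to mark the service HARD as soon as
the host is down is an assumption read off the "CRITICAL;HARD;1" service
alerts in the logs further down.

```python
def handle_failed_service_check(host_up, attempt, max_check_attempts):
    """Model what follows one failed service check (illustrative only)."""
    if not host_up:
        # Assumption from the logs: the service goes HARD at attempt 1
        # and only the host notification is sent.
        return {"service_state_type": "HARD", "notify": "host"}
    if attempt >= max_check_attempts:
        # Host is fine and retries are exhausted: service notification.
        return {"service_state_type": "HARD", "notify": "service"}
    # Host is fine, retries remain: keep rechecking, no notification yet.
    return {"service_state_type": "SOFT", "notify": None}


print(handle_failed_service_check(host_up=False, attempt=1, max_check_attempts=3))
print(handle_failed_service_check(host_up=True, attempt=3, max_check_attempts=3))
```

If devices are almost always either fully up or fully down, only the first
branch is ever taken, which would match seeing many host notifications and
hardly any service notifications.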
 
almost all of my devices are either up or down.  very rarely does just a
service fail. services currently defined are pretty much just port-23 checks
for switches and port 135 checks for windoze servers.
 
later,
dean

-----Original Message-----
From: Jolet, John [mailto:John.Jolet at misyshealthcare.com]
Sent: Thursday, October 10, 2002 2:30 PM
To: 'nagios-users at lists.sourceforge.net'
Subject: RE: [Nagios-users] RE: What the...


Can you include the bit of the config that shows the parent-child
relationships?

-----Original Message-----
From: Bishop, Dean [mailto:dean.bishop at tcdsb.org]
Sent: Thursday, October 10, 2002 1:14 PM
To: Bishop, Dean; 'nagios-users at lists.sourceforge.net'
Subject: [Nagios-users] RE: What the...



First, sorry about the subject; i realize that it is inappropriate.  it
does, however, capture my initial response.

We are in the midst of many nightmares concurrently: smoking servers,
irreplaceable data lost, network latency, cold lunch, sore finger, you know,
the whole gamut at once.


apologies to all.

here is another entry from my logs.  Each host is dependent on the
previously numbered host (e.g. Marshall-McLuhan-0561SW2A_4-HS7 is the parent
of Marshall-McLuhan-0561SW2A_5-HS7, which is the parent of
Marshall-McLuhan-0561SW2A_6-HS7, etc.).
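
The parent chain described above, and the DOWN versus UNREACHABLE states in
the log below, can be modeled roughly as follows. This is a sketch under
stated assumptions, not how Nagios is implemented: each host has at most one
parent, the short host names are hypothetical stand-ins, and a host whose
check fails counts as DOWN only when its parent is still up (or it has no
parent).

```python
def classify(host, parents, failed):
    """Return "UP", "DOWN", or "UNREACHABLE" for one host.

    parents: dict mapping each host to its parent (None for the top host).
    failed:  set of hosts whose own check failed.
    """
    if host not in failed:
        return "UP"
    parent = parents.get(host)
    if parent is None or classify(parent, parents, failed) == "UP":
        # Nothing upstream is broken, so this host itself is the problem.
        return "DOWN"
    # The parent is already DOWN or UNREACHABLE, so this host is cut off.
    return "UNREACHABLE"


# Hypothetical chain: sw4 -> sw5 -> sw6
parents = {"sw4": None, "sw5": "sw4", "sw6": "sw5"}
failed = {"sw4", "sw5", "sw6"}

states = {host: classify(host, parents, failed) for host in parents}
print(states)
```

With every check in the chain timing out, only the topmost failing host comes
out DOWN and the ones behind it come out UNREACHABLE, which is the pattern in
the log.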

why, once Marshall-McLuhan-0561SW2A_14-HS7 is determined to be UNREACHABLE
(due to the failure of Marshall-McLuhan-0561SW2A_4-HS7), is the service
checked on Marshall-McLuhan-0561SW2A_14-HS7?



[1034172479] HOST ALERT: Marshall-McLuhan-0561SW2A_14-HS7;DOWN;SOFT;1;CRITICAL - Plugin timed out after 18 seconds
[1034172516] HOST ALERT: Marshall-McLuhan-0561SW2A_7-HS7;DOWN;SOFT;1;CRITICAL - Plugin timed out after 18 seconds
[1034172552] HOST ALERT: Marshall-McLuhan-0561SW2A_6-HS7;DOWN;SOFT;1;CRITICAL - Plugin timed out after 18 seconds
[1034172588] HOST ALERT: Marshall-McLuhan-0561SW2A_5-HS7;DOWN;SOFT;1;CRITICAL - Plugin timed out after 18 seconds
[1034172624] HOST ALERT: Marshall-McLuhan-0561SW2A_4-HS7;DOWN;SOFT;1;CRITICAL - Plugin timed out after 18 seconds
[1034172644] HOST ALERT: Marshall-McLuhan-0561SW2A_4-HS7;DOWN;HARD;2;CRITICAL - Plugin timed out after 18 seconds
[1034172644] HOST NOTIFICATION: nagiosadmin;Marshall-McLuhan-0561SW2A_4-HS7;DOWN;host-notify-by-email;CRITICAL - Plugin timed out after 18 seconds
[1034172645] HOST NOTIFICATION: Marco;Marshall-McLuhan-0561SW2A_4-HS7;DOWN;host-notify-by-email;CRITICAL - Plugin timed out after 18 seconds
[1034172645] HOST NOTIFICATION: Kevin-NonCritical;Marshall-McLuhan-0561SW2A_4-HS7;DOWN;notify-by-epager;CRITICAL - Plugin timed out after 18 seconds
[1034172645] HOST NOTIFICATION: Kevin;Marshall-McLuhan-0561SW2A_4-HS7;DOWN;host-notify-by-email;CRITICAL - Plugin timed out after 18 seconds
[1034172646] HOST NOTIFICATION: Keith-NonCritical;Marshall-McLuhan-0561SW2A_4-HS7;DOWN;notify-by-epager;CRITICAL - Plugin timed out after 18 seconds
[1034172646] HOST NOTIFICATION: Keith;Marshall-McLuhan-0561SW2A_4-HS7;DOWN;host-notify-by-email;CRITICAL - Plugin timed out after 18 seconds
[1034172646] HOST NOTIFICATION: Ben;Marshall-McLuhan-0561SW2A_4-HS7;DOWN;host-notify-by-email;CRITICAL - Plugin timed out after 18 seconds
[1034172647] HOST ALERT: Marshall-McLuhan-0561SW2A_5-HS7;UNREACHABLE;HARD;2;CRITICAL - Plugin timed out after 18 seconds
[1034172647] HOST ALERT: Marshall-McLuhan-0561SW2A_6-HS7;UNREACHABLE;HARD;2;CRITICAL - Plugin timed out after 18 seconds
[1034172647] HOST ALERT: Marshall-McLuhan-0561SW2A_7-HS7;UNREACHABLE;HARD;2;CRITICAL - Plugin timed out after 18 seconds
[1034172647] HOST ALERT: Marshall-McLuhan-0561SW2A_14-HS7;UNREACHABLE;HARD;2;CRITICAL - Plugin timed out after 18 seconds
[1034172647] SERVICE ALERT: Marshall-McLuhan-0561SW2A_14-HS7;Port Check-23;CRITICAL;HARD;1;Socket timeout after 10 seconds


-----Original Message-----
From: Bishop, Dean
Sent: Thursday, October 10, 2002 1:04 PM
To: 'nagios-users at lists.sourceforge.net'
Subject: What the *&#( !!
Importance: High


Can someone explain this to me??


why in the world is the service for testserver01.tcdsb.org being checked
after the host has been determined down?
also why is the host being checked before the service??




[root at NMS var]# tail nagios.log -n 3000 |grep testserver01

[1034266896] HOST ALERT: testserver01.tcdsb.org;UP;HARD;1;(Host assumed to be up)
[1034266896] SERVICE ALERT: testserver01.tcdsb.org;Misc Servers - Port Check 135;OK;HARD;1;TCP OK - 0 second response time on port 135
[1034267924] HOST ALERT: testserver01.tcdsb.org;DOWN;SOFT;1;CRITICAL - Plugin timed out after 8 seconds
[1034267933] HOST ALERT: testserver01.tcdsb.org;DOWN;HARD;2;CRITICAL - Plugin timed out after 8 seconds
[1034267933] HOST NOTIFICATION: nagiosadmin;testserver01.tcdsb.org;DOWN;host-notify-by-email;CRITICAL - Plugin timed out after 8 seconds
[1034267934] HOST NOTIFICATION: Keith;testserver01.tcdsb.org;DOWN;host-notify-by-email;CRITICAL - Plugin timed out after 8 seconds
[1034267934] SERVICE ALERT: testserver01.tcdsb.org;Misc Servers - Port Check 135;CRITICAL;HARD;1;Socket timeout after 2 seconds
[1034268938] HOST ALERT: testserver01.tcdsb.org;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.61 ms
[1034268938] HOST NOTIFICATION: nagiosadmin;testserver01.tcdsb.org;UP;host-notify-by-email;PING OK - Packet loss = 0%, RTA = 0.61 ms
[1034268938] HOST NOTIFICATION: Keith;testserver01.tcdsb.org;UP;host-notify-by-email;PING OK - Packet loss = 0%, RTA = 0.61 ms
[1034268938] SERVICE ALERT: testserver01.tcdsb.org;Misc Servers - Port Check 135;OK;HARD;1;TCP OK - 0 second response time on port 135

[root at NMS var]#
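
For reading output like the above, it can help to turn the epoch timestamps
into dates and split the semicolon-separated fields. The following is a
minimal sketch that assumes each entry sits on one line in the form
"[epoch] TYPE: field;field;..." as in this excerpt. The function name and the
regular expression are illustrative and are not part of Nagios.

```python
import re
from datetime import datetime, timezone

# "[1034267924] HOST ALERT: host;state;..." -> timestamp, entry type, fields
LINE_RE = re.compile(r"^\[(\d+)\] ([A-Z ]+): ?(.*)$")


def parse_log_line(line):
    """Parse one nagios.log entry; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    timestamp, entry_type, rest = match.groups()
    return {
        "time": datetime.fromtimestamp(int(timestamp), tz=timezone.utc),
        "type": entry_type,
        "fields": rest.split(";"),
    }


entry = parse_log_line(
    "[1034267924] HOST ALERT: testserver01.tcdsb.org;DOWN;SOFT;1;"
    "CRITICAL - Plugin timed out after 8 seconds"
)
print(entry["time"].isoformat(), entry["type"], entry["fields"][:4])
```

Times are reported in UTC here. Entries with the same timestamp were written
within the same second, so their order in the file is the only clue to which
check ran first.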


-- 

Russell Scibetti

Quadrix Solutions, Inc.

http://www.quadrix.com

(732) 235-2335, ext. 7038

