logging and report issues

Edgar Shine eshine at gmail.com
Fri Jul 29 22:04:08 CEST 2005


Greetings,

I'm trying to understand how to fix some reporting issues. Maybe
someone can help me.
I manage a WAN radio network and use Nagios to monitor its devices.
The radio links often show RTT problems, mainly on rainy days, and I
want to keep statistics on these problems.

The problem is that when a service is already in an alarm state at the
first scheduled poll of the day, no recovery entry is logged afterwards.
I'm fairly sure the service recovered during the day, because the
monitoring pages showed the host's service as OK.

Here are some lines from the log files, relating to the attached file.
---BEGIN---
root@phoenix:/usr/local/nagios/var/archives# grep castanheiras nagios-07-07-2005-00.log
[1120705200] CURRENT HOST STATE: castanheiras;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 259.77 ms
[1120705200] CURRENT SERVICE STATE: castanheiras;PING;CRITICAL;HARD;3;PING CRITICAL - Packet loss = 0%, RTA = 288.31 ms
root@phoenix:/usr/local/nagios/var/archives# grep castanheiras nagios-07-08-2005-00.log
[1120791600] CURRENT HOST STATE: castanheiras;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 376.73 ms
[1120791600] CURRENT SERVICE STATE: castanheiras;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 134.32 ms
root@phoenix:/usr/local/nagios/var/archives# grep castanheiras nagios-07-14-2005-00.log
[1121310000] CURRENT HOST STATE: castanheiras;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 114.10 ms
[1121310000] CURRENT SERVICE STATE: castanheiras;PING;CRITICAL;HARD;3;PING CRITICAL - Packet loss = 0%, RTA = 2439.42 ms
root@phoenix:/usr/local/nagios/var/archives# grep castanheiras nagios-07-15-2005-00.log
[1121396400] CURRENT HOST STATE: castanheiras;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 199.86 ms
[1121396400] CURRENT SERVICE STATE: castanheiras;PING;CRITICAL;HARD;3;PING CRITICAL - Packet loss = 0%, RTA = 213.32 ms
root@phoenix:/usr/local/nagios/var/archives# grep castanheiras nagios-07-16-2005-00.log
[1121482800] CURRENT HOST STATE: castanheiras;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 200.23 ms
[1121482800] CURRENT SERVICE STATE: castanheiras;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 102.60 ms
---EOT---
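
For anyone who wants to pull these entries apart for statistics, here is a minimal Python sketch. It assumes each entry sits on a single line in the archive file (the mail client wrapped them) and that the fields are host;service;state;type;attempt;output, as they appear in the lines quoted here. The names parse_service_line and SERVICE_LINE are only illustrative and are not part of Nagios.

```python
import re
from datetime import datetime, timezone

# Pattern for "[epoch] LABEL: host;service;state;type;attempt;output" lines.
# The field layout is inferred from the log excerpt, not from Nagios docs.
SERVICE_LINE = re.compile(
    r"^\[(?P<ts>\d+)\] (?P<label>CURRENT SERVICE STATE|SERVICE ALERT): "
    r"(?P<host>[^;]+);(?P<service>[^;]+);(?P<state>[^;]+);"
    r"(?P<type>[^;]+);(?P<attempt>\d+);(?P<output>.*)$"
)


def parse_service_line(line):
    """Return a dict of fields for a service state line, or None if no match."""
    match = SERVICE_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["ts"] = int(fields["ts"])
    fields["attempt"] = int(fields["attempt"])
    # Convert the epoch timestamp to an aware datetime in UTC.
    fields["when"] = datetime.fromtimestamp(fields["ts"], tz=timezone.utc)
    return fields


sample = (
    "[1120705200] CURRENT SERVICE STATE: castanheiras;PING;CRITICAL;HARD;3;"
    "PING CRITICAL - Packet loss = 0%, RTA = 288.31 ms"
)
entry = parse_service_line(sample)
print(entry["when"].isoformat(), entry["host"], entry["state"])
```

Host state lines have a different field count, so this pattern deliberately ignores them and the function returns None for those.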

Logging works for alarms raised during the day **if** the service is
not in a CRITICAL state when Nagios rotates the log file:
---BEGIN---
root@phoenix:/usr/local/nagios/var/archives# grep castanheiras nagios-07-29-2005-00.log | grep HARD
[1122519600] CURRENT HOST STATE: castanheiras;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 344.05 ms
[1122519965] SERVICE ALERT: castanheiras;PING;CRITICAL;HARD;5;PING CRITICAL - Packet loss = 0%, RTA = 451.80 ms
[1122523904] SERVICE ALERT: castanheiras;PING;OK;HARD;5;PING OK - Packet loss = 0%, RTA = 34.14 ms
[1122526189] SERVICE ALERT: castanheiras;PING;CRITICAL;HARD;5;PING CRITICAL - Packet loss = 0%, RTA = 551.29 ms
[1122527084] SERVICE ALERT: castanheiras;PING;OK;HARD;5;PING OK - Packet loss = 0%, RTA = 36.09 ms
---EOT---
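
This is the kind of statistic I am after. As a rough Python sketch, assuming the HARD SERVICE ALERT entries have already been reduced to (timestamp, state) pairs, each CRITICAL alert can be paired with the next OK alert to get the length of the outage. The function name critical_intervals is hypothetical, and the timestamps are the four alerts from the excerpt.

```python
def critical_intervals(events):
    """Pair each CRITICAL alert with the next OK alert.

    events: iterable of (epoch_seconds, state) tuples in log order.
    Returns a list of (start, end) tuples. A CRITICAL alert with no
    later OK alert is dropped, which is exactly the case where the
    recovery entry is missing from the log.
    """
    intervals = []
    started = None
    for ts, state in events:
        if state == "CRITICAL" and started is None:
            started = ts
        elif state == "OK" and started is not None:
            intervals.append((started, ts))
            started = None
    return intervals


# HARD SERVICE ALERT entries for castanheiras/PING from the excerpt.
alerts = [
    (1122519965, "CRITICAL"),
    (1122523904, "OK"),
    (1122526189, "CRITICAL"),
    (1122527084, "OK"),
]
intervals = critical_intervals(alerts)
total_seconds = sum(end - start for start, end in intervals)
print(intervals, total_seconds)
```

On days where the log starts with the service already CRITICAL and no recovery is logged, a calculation like this has no end timestamp to work with, so the outage cannot be measured.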

My host.conf
---BEGIN---
define host{
         name                            host_template
         notifications_enabled           1
         event_handler_enabled           0
         flap_detection_enabled          1
         process_perf_data               1
         retain_status_information       1
         retain_nonstatus_information    1
         register                        0
         }

define host{
         use                     host_template           ; Name of host template to use
         host_name               castanheiras
         alias                   Condominio Castanheiras
         address                 10.140.215.2
         parents                 cisco_nobhill
         check_command           check-host-alive
         max_check_attempts      5
         contact_groups          noc
         notification_interval   0
         notification_period     24x7
         notification_options    d,u,r
         }
---EOT---

My services.cfg
---BEGIN---
define service{
         name                            generic-service
         active_checks_enabled           1
         passive_checks_enabled          1
         parallelize_check               1
         obsess_over_service             1
         check_freshness                 0
         notifications_enabled           1
         event_handler_enabled           0
         flap_detection_enabled          1
         process_perf_data               1
         retain_status_information       1
         retain_nonstatus_information    1
         register                        0
         }

define service{
         use                             generic-service
         host_name                       castanheiras
         service_description             PING
         is_volatile                     0
         check_period                    24x7
         max_check_attempts              5
         normal_check_interval           5
         retry_check_interval            3
         contact_groups                  noc,redes,manutencao
         notification_interval           0
         notification_period             24x7
         notification_options            c,r
         check_command                   check_ping!199.99,79%!200.0,80%
         }
---EOT---
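
To make the thresholds concrete, here is a small Python sketch of how I understand them to be applied. It assumes the check_ping command definition passes the first argument as the warning pair and the second as the critical pair (RTA in ms, packet loss in %), and that a value equal to or above a threshold triggers that state. The function classify_ping is only an illustration, not the plugin's code.

```python
def classify_ping(rta_ms, loss_pct, warn=(199.99, 79.0), crit=(200.0, 80.0)):
    """Classify a ping result against (rta_ms, loss_pct) threshold pairs.

    Assumption: either value reaching its threshold is enough to
    trigger the state, and CRITICAL is checked before WARNING.
    """
    if rta_ms >= crit[0] or loss_pct >= crit[1]:
        return "CRITICAL"
    if rta_ms >= warn[0] or loss_pct >= warn[1]:
        return "WARNING"
    return "OK"


# RTA values taken from the CURRENT SERVICE STATE lines, all with 0% loss.
for rta in (288.31, 134.32, 2439.42, 213.32, 102.60):
    print(rta, classify_ping(rta, 0.0))
```

With the warning and critical values this close together, the WARNING band is almost empty, so in practice the service is either OK or CRITICAL around the 200 ms mark.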

System info:
---BEGIN---
Nagios version: 2.0b3
Linux 2.4.29
Apache 1.3.33
gd 2.0.33
---END---

Any tips? Thanks in advance for your time.

rgds,
Edgar Shine

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ScreenShot004.jpg
Type: image/jpeg
Size: 201801 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/users/attachments/20050729/61947126/attachment.jpg>

