Greetings,<br><br>We are using Nagios version 3.0.6 on Fedora Core 9.<br><br>I am looking for ideas on how you monitor critical servers and services, and what the best practices are.<br><br>On a related note, I just noticed that we have been missing a lot of alerts lately. Today we had to reboot a couple of AIX servers, which usually takes 5+ minutes. The interesting thing is that we did not receive any notification for these servers. Below is the host configuration entry:<br>
<br>define host{<br> name aix-server ; The name of this host template<br> use generic-host ; This template inherits other values from the generic-host template<br>
check_period 24x7 ; By default, AIX hosts are checked round the clock<br> check_interval 2 ; Actively check the host every 2 minutes<br> retry_interval 1 ; Schedule host check retries at 1 minute interval<br>
max_check_attempts 2 ; Check each AIX host 2 times (max)<br> check_command check-host-alive ; Default command to check aix hosts<br> notification_interval 10 ; Resend notifications every 10 minutes<br>
notification_options d,u,r ; Only send notifications for down, unreachable and recovery states<br> contact_groups aix-team ; Notifications get sent to the aix-team contact group<br> register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!<br>
}<br><br>I was wondering what I need to change to get the following behavior:<br><br>a server goes down<br>Nagios checks after 1 minute, as usual, and finds the server is down<br>Nagios checks again after a minute and finds the server is still down<br>
Nagios sends a notification and keeps sending one every 10 minutes until the server comes up again<br><br>I have searched the Nagios archives for check_interval, retry_interval and max_check_attempts, and as a result I got totally confused.<br>
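<br>To make the question concrete, here is how I currently understand the timing, written as a small Python sketch. This is my assumption of how the three options interact, not something I have confirmed: the first failed regular check counts as attempt 1, retries follow at retry_interval until max_check_attempts is reached, and only then is a notification sent. It ignores the time a check itself takes and assumes the default interval_length of 60 seconds, so all values are minutes. The function names are made up for illustration.<br>

```python
def first_notification_window(check_interval, retry_interval, max_check_attempts):
    """Return (earliest, latest) minutes between the outage and the first notification.

    Assumed model: the outage is first seen at the next regular check (attempt 1),
    then re-checked every retry_interval until max_check_attempts is reached.
    """
    retry_delay = (max_check_attempts - 1) * retry_interval
    earliest = retry_delay                   # host died just before a regular check
    latest = check_interval + retry_delay    # host died just after a regular check
    return earliest, latest


def notification_times(first, notification_interval, outage_minutes):
    """Minutes (from outage start) at which notifications would go out."""
    if first > outage_minutes:
        return []            # host recovered before the problem was confirmed
    if notification_interval == 0:
        return [first]       # 0 means: notify once, never re-notify
    return list(range(first, outage_minutes + 1, notification_interval))


if __name__ == "__main__":
    # Values from the aix-server template above.
    earliest, latest = first_notification_window(
        check_interval=2, retry_interval=1, max_check_attempts=2
    )
    print("first notification between", earliest, "and", latest, "minutes")
    print("worst case over a 30 minute outage:", notification_times(latest, 10, 30))
```

If this model is right, the template above should already notify within about 3 minutes of a host going down, so a 5+ minute reboot should not have gone unnoticed.<br>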
<br>Any help is much appreciated.<br><br>P.S. I would ask the Nagios developers to either rename these options to something more meaningful or provide some real-life examples. Judging by the archives, many users have been confused by these options.<br>
<br>Thanks<br>