Design question
Sean McAfee
smcafee at collaborativefusion.com
Thu Jul 31 17:12:17 CEST 2008
Michael Weiner wrote:
> HMMMMMMM now you've piqued my interest. Anything you can share before
> I start building? I like the idea and wouldn't mind implementing a
> similar solution.
>
> Michael
I just spent a while trying to come up with a comprehensive quick
explanation, but it's just not possible. The internal documentation for
the system design is something like 15+ pages, the majority of which
contains data that needs to be thoroughly sanitized. This is the meat
of it though and should give you an idea of what things need to be
considered. Feel free to ask questions about how I solved specific
problems or suggest ways to improve it.
Here is the repo layout:
root -- All non-object configs (nagios.cfg, cgi.cfg, resources.cfg,
|       nsca.cfg, etc.)
|-- config -- All object configs (notify_cmds.cfg, templates, etc.)
|   `-- contacts
|-- htpasswd
|-- scripts -- All shell scripts (event handlers, self-promotion, etc.)
|   `-- checks -- Custom checks not found in the FreeBSD nagios-plugin port
`-- targets -- All hosts and services
    |-- exemptions -- See step 2.1 below - removes some "global" checks from individual facilities
    |   |-- facil0
    |   |-- facil1
    |   `-- facil2
    |-- global -- Checks to be run from ALL facilities
    |-- facil0 -- Checks for the slave instance at facilx
    |-- facil1
    `-- facil2
In order to comply with the automation requirements, a handful of DNS
entries had to be created at each slave facility:
nagios-host.[facil].example.com
    This is a CNAME to the slave instance at each facility. It is used
    as the destination target for rsyncing configs.
nagios-master.[facil].example.com
    This is a CNAME to the master server. Due to the distributed nature
    of our setup and Nagios' use of hostnames as unique identifiers, this
    was required to give each slave server a unique target to monitor for
    self-promotion purposes.
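To illustrate how these names can be derived from a facility name inside a script, here is a minimal sketch. It is not taken from the actual post-commit script: the function names and the DOMAIN variable are hypothetical, and only the hostname patterns and the /usr/local/etc/nagios/ path come from this message.

```shell
#!/bin/sh
# Hypothetical helpers; only the hostname patterns come from the post.
DOMAIN="example.com"

# Name of the slave instance at a facility (the rsync destination host).
nagios_host() {
    printf 'nagios-host.%s.%s\n' "$1" "$DOMAIN"
}

# Per-facility alias for the master that each slave monitors.
nagios_master() {
    printf 'nagios-master.%s.%s\n' "$1" "$DOMAIN"
}

# Full rsync destination for pushing configs to a facility.
rsync_dest() {
    printf '%s:/usr/local/etc/nagios/\n' "$(nagios_host "$1")"
}
```

Because every facility follows the same pattern, adding a facility only needs new DNS entries, not script changes.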
The svn post-commit script does the following from the master instance:
1. Checks out the newest version of the repo to /var/tmp/nagios/staging
2. Creates directories at
/var/tmp/nagios/[master|facil0|facil1|facil2] and rsyncs the repo into
each one
   1. During this step, '.svn' is excluded and the --exclude-from
      option is used to specify exclusions for slaves:
        rsync -avz --delete-before \
          --exclude=.svn \
          --exclude-from=$STAGING_DIR/targets/exemptions/${this_facil} \
          $STAGING_DIR/ ./${this_facil}
3. Moves nagios-hq.cfg or nagios-slave.cfg (as appropriate) to nagios.cfg
4. Uses grep & sed to perform search & replace on "magic" words:
   * FACIL_PLACEHOLDER: maximizes portability and automation of
     configs. Examples: nagios-slave.cfg references
     cfg_dir=targets/FACIL_PLACEHOLDER to eliminate the need for
     hand-manipulation; the "from" address in email is set to
     nagios_FACIL_PLACEHOLDER@; check_snmpagent!FACIL_PLACEHOLDER_[common
     suffix]; etc.
   * FACIL_ROLE: dynamically adjusts service_templates.cfg; necessary to
     get the master instance to schedule active checks ONLY for its
     local checks (Nagios slaves, its own nsca daemon, its GSM modem);
     set to 0 on the master, 1 on slaves
5. For slaves without GSM capabilities only: changes
[host|service]_notification_commands from notify-[host|service]-by-sms
to notify-[host|service]-by-epager.
6. Performs a local Nagios config validation for each facility
(nagios -v /var/tmp/nagios/{facil})
7. Rsyncs /var/tmp/nagios/{facil} to
nagios-host.[facil].example.com:/usr/local/etc/nagios/
   1. $RSYNC -avz --delete-before $STAGING_ROOT/$this_facil/ \
        nagios-host.[facil].example.com:/usr/local/etc/nagios/
8. Performs a remote Nagios config validation on each system
9. Reloads Nagios via the rc script on each server
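Steps 3 to 5 can be sketched in plain sh roughly as follows. This is a minimal illustration, not the actual post-commit script: the function names (rewrite_in_place, render_facility, use_epager) and their arguments are made up here, and it assumes facility names contain no characters that are special to sed (such as "/" or "&").

```shell
#!/bin/sh
# Illustrative sketch only; function names and arguments are hypothetical.

# Apply sed expressions to a file while keeping the file's permissions.
rewrite_in_place() {
    target=$1; shift
    sed "$@" "$target" > "$target.tmp" &&
        cat "$target.tmp" > "$target" &&
        rm -f "$target.tmp"
}

# render_facility DIR FACIL ROLE   (ROLE is "master" or "slave")
render_facility() {
    dir=$1 facil=$2 role=$3

    # Step 3: the role-specific file becomes nagios.cfg.
    if [ "$role" = master ]; then
        mv "$dir/nagios-hq.cfg" "$dir/nagios.cfg" || return 1
        role_value=0
    else
        mv "$dir/nagios-slave.cfg" "$dir/nagios.cfg" || return 1
        role_value=1
    fi

    # Step 4: replace the magic words in every file that mentions them.
    grep -rlE 'FACIL_PLACEHOLDER|FACIL_ROLE' "$dir" |
    while IFS= read -r file; do
        rewrite_in_place "$file" \
            -e "s/FACIL_PLACEHOLDER/$facil/g" \
            -e "s/FACIL_ROLE/$role_value/g"
    done
}

# Step 5: for slaves without a GSM modem, notify by e-pager instead of SMS.
use_epager() {
    grep -rlE 'notify-(host|service)-by-sms' "$1" |
    while IFS= read -r file; do
        rewrite_in_place "$file" \
            -e 's/notify-host-by-sms/notify-host-by-epager/g' \
            -e 's/notify-service-by-sms/notify-service-by-epager/g'
    done
}
```

For example, `render_facility /var/tmp/nagios/facil0 facil0 slave` would prepare the staged copy for facil0 before the `nagios -v` validation in step 6.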
Self-promotion is done via an event handler script that echoes
ENABLE_NOTIFICATIONS, STOP_OBSESSING_OVER_HOST_CHECKS, and
STOP_OBSESSING_OVER_SVC_CHECKS into the external command file if the
slave loses contact with the master instance. Self-demotion is simply
the inverse of that.
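A minimal sketch of what such an event handler could look like follows. It is not the actual script: the function names, the default COMMAND_FILE path, and the choice to act only on HARD states are assumptions made for illustration. The command names are standard Nagios external commands, written in the usual "[timestamp] COMMAND" format.

```shell
#!/bin/sh
# Illustrative sketch only. COMMAND_FILE should match the command_file
# setting in nagios.cfg; the default below is an assumed path.
COMMAND_FILE="${COMMAND_FILE:-/var/spool/nagios/rw/nagios.cmd}"

# Write one external command as "[epoch-seconds] COMMAND".
send_command() {
    printf '[%s] %s\n' "$(date +%s)" "$1" >> "$COMMAND_FILE"
}

# Slave takes over: notify locally and stop forwarding results.
promote() {
    send_command ENABLE_NOTIFICATIONS
    send_command STOP_OBSESSING_OVER_HOST_CHECKS
    send_command STOP_OBSESSING_OVER_SVC_CHECKS
}

# Master is back: the inverse of promote.
demote() {
    send_command DISABLE_NOTIFICATIONS
    send_command START_OBSESSING_OVER_HOST_CHECKS
    send_command START_OBSESSING_OVER_SVC_CHECKS
}

# handle_master_state STATE STATETYPE
# Acts only on HARD states (an assumption) to avoid flapping on one
# missed check of the master.
handle_master_state() {
    case "$1:$2" in
        CRITICAL:HARD|DOWN:HARD) promote ;;
        OK:HARD|UP:HARD)         demote ;;
    esac
}
```

In an event handler command definition, the state arguments would typically be passed in via macros such as $SERVICESTATE$ and $SERVICESTATETYPE$ (or $HOSTSTATE$ and $HOSTSTATETYPE$ for a host check).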
Sean McAfee
System Engineer
Collaborative Fusion, Inc.
smcafee at collaborativefusion.com
412-422-3463 x 4025
5849 Forbes Avenue
Pittsburgh, PA 15217
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users