Issues with 1.0b5

Brian Wilson wilson at ncsu.edu
Thu Aug 22 20:15:31 CEST 2002
Yes, that's what I'm currently doing with netsaint: inserting commands
into and removing them from the external commands file. I just thought
that this might have been on someone's wishlist for a while and was hoping
to see it in nagios.
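
As a rough illustration of the external-commands approach, here is a minimal Python sketch that builds one downtime command per host, for example for every device in a building that will lose power. The field order (host;start;end;fixed;duration;author;comment) is an assumption and differs between Nagios versions (later releases add a trigger ID field), so check the external command documentation for your release. The function names and the author/comment values are hypothetical.

```python
import time


def downtime_commands(hosts, start, end, author, comment, now=None):
    """Build one SCHEDULE_HOST_DOWNTIME line per host.

    ASSUMPTION: the field order host;start;end;fixed;duration;author;comment.
    Some Nagios versions expect an extra trigger_id field; adjust as needed.
    start and end are Unix timestamps.
    """
    stamp = int(now if now is not None else time.time())
    duration = end - start  # seconds; only meaningful for flexible downtime
    return [
        f"[{stamp}] SCHEDULE_HOST_DOWNTIME;{host};{start};{end};1;"
        f"{duration};{author};{comment}"
        for host in hosts
    ]


def write_commands(lines, command_file):
    """Write the command lines to the external command file (a named pipe)."""
    with open(command_file, "w") as fh:
        for line in lines:
            fh.write(line + "\n")
```

Calling downtime_commands() with the list of hosts in a building and passing the result to write_commands() with the path of your command file would schedule them all in one go. The location of the command file depends on how nagios was installed, so it is left as a parameter here.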



On Thu, 22 Aug 2002, Rusch, Daniel wrote:

> Brian,
>
> As to "bulk managing" groups of hosts, I understand your point that it would
> be nice if this were doable out of the box. Have you checked out the
> external commands? You can write a simple script to do what you want.
>
> -----Original Message-----
> From: Brian Wilson [mailto:wilson at unity.ncsu.edu]
> Sent: Thursday, August 22, 2002 9:22 AM
> To: nagios-users at lists.sourceforge.net
> Subject: [Nagios-users] Issues with 1.0b5
>
>
>
> First off, I've been a longtime user of netsaint and my current network
> status system is based off of a combination of netsaint and hp openview.
> After learning about the nagios/netsaint spinoff, I decided to look into
> it because of some critical features that were missing from netsaint:
>
> First is the fact that nagios retains host downtime across program
> restarts.  This is critical, and I got around it in netsaint with a
> series of AT jobs.  I also like the config file changes.  I have scripts
> that do all configuration file generation, and this change was much
> needed.  (There is, however, a limit on how many devices you can
> add to a group: try adding 2,000 hosts to a group and see what happens.
> Group members should really be one per line instead of comma separated:
>   member switch1
>   member switch2
>   member switch3
>   etc.
> )
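
Until something like one-member-per-line exists, a config generation script can keep hosts one per line on its own side and emit the comma-separated form. The Python sketch below also splits a large host list into several smaller groups. Whether that avoids the limit described above is untested, and the chunk size, the "-N" group name suffix, and the function name are all hypothetical. Other directives your version requires (contact groups, for instance) are left out.

```python
def hostgroup_blocks(name, alias, hosts, chunk_size=500):
    """Return hostgroup definitions, each holding at most chunk_size members.

    ASSUMPTION: splitting one big group into name-1, name-2, ... is an
    acceptable workaround; it has not been verified against 1.0b5.
    """
    blocks = []
    for i in range(0, len(hosts), chunk_size):
        part = i // chunk_size + 1
        chunk = hosts[i:i + chunk_size]
        blocks.append(
            "define hostgroup{\n"
            f"        hostgroup_name  {name}-{part}\n"
            f"        alias           {alias} (part {part})\n"
            f"        members         {','.join(chunk)}\n"
            "        }\n"
        )
    return blocks
```

Joining the returned blocks and writing them to a config file gives the generated hostgroup definitions.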
>
> I've been convinced to do a rewrite of our current system and decided to
> try nagios instead of netsaint, mainly because of downtime retention.  To
> test downtime retention, I set up a test device with a dummy host-alive
> check and 2 service checks, one being a ping:
>
> define service{
>         use             generic-service;
>         host_name       cisco-temp;
>         is_volatile     0;
>         check_period    24x7;
>         contact_groups  dummy-group;
>         notification_period     24x7;
>         notification_interval   180;
>         notification_options    c,r;
>         service_description     PING;
>         max_check_attempts      5;
>         normal_check_interval   8;
>         retry_check_interval    2;
>         event_handler   switch-down-event-handler;
>         check_command   check_ping;
>         }
>
> I set up a dummy-group to send email to, because I don't want
> notifications of a host/service going down sent directly to an email
> address (imagine if 100 hosts suddenly went down: 100 emails).
> Instead, I always call an event handler to perform some action (e.g., let
> host notifications queue up and send them in bulk every 15 minutes or
> so).
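
The queue-and-send-in-bulk idea described above could be sketched in Python roughly as follows. This is not the actual handler from the post. It assumes the event handler appends one line per event to a spool file and a separate job (run every 15 minutes, say) drains the file and builds a single digest. The spool format and function names are hypothetical, and this is not a hardened queue.

```python
import os


def enqueue(spool_path, host, service, state):
    """Called from the event handler: append one tab-separated event line."""
    with open(spool_path, "a") as fh:
        fh.write(f"{host}\t{service}\t{state}\n")


def drain(spool_path):
    """Return all queued events and empty the spool.

    The spool is renamed before reading so that events arriving during
    the drain land in a fresh file instead of being deleted unread.
    """
    if not os.path.exists(spool_path):
        return []
    work = spool_path + ".work"
    os.rename(spool_path, work)
    with open(work) as fh:
        events = [tuple(line.rstrip("\n").split("\t")) for line in fh if line.strip()]
    os.remove(work)
    return events


def digest(events):
    """Format one message body covering every event in the batch."""
    lines = [f"{host}/{service}: {state}" for host, service, state in events]
    return f"{len(events)} event(s) in this batch\n" + "\n".join(lines)
```

The periodic job would call drain(), and if the result is non-empty, mail the output of digest() as a single message.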
>
> 1st problem: I then add a host downtime entry for this device, unhook it
> from the network so the ping service check will fail, and I continue to
> get notifications from the event handler for that service.  (So, am I
> correct in assuming that even when you set a host downtime entry for a
> device, its service checks keep sending notifications?  Why is
> this?  If a host is down, then its services will be down.)
>
> 2nd problem: I then add a service downtime entry for this device, unhook
> it from the network again so that the ping service check will fail, and I
> still continue to get notifications from the event handler for that
> service.  (So, am I correct in assuming that downtime for a service only
> affects email notifications and not event handler notifications?)
>
> 3rd problem: One problem with netsaint, which I still see in nagios, is
> the lack of a tool to bulk manage a number of devices.  For example, if I
> know a building is going to lose power from 08:00 to 14:00, then I want to
> schedule downtime for all devices in that building.  With the current
> process of setting downtime, this is rather tedious.  I'll probably get
> around it by writing my own interface to do this (as I did with netsaint),
> but this would be a great feature to add to your wishlist.
>
> Question: assuming downtime scheduling works correctly, would it be
> possible to put downtime data into mysql and point 2 different nagios
> servers at the database?  (The servers would be monitoring the same devices,
> so, essentially, they would have the same downtime schedules.)
>
> Thanks for listening.  I think nagios is a step in the right direction as
> far as network/host monitoring goes.
>
> Brian
>
> --
> Brian Wilson  <wilson at ncsu.edu>      Network Analyst
> Communication Technologies, ATD      W: 919.513.3472
> North Carolina State University      www.ncstate.net


--
Brian Wilson  <wilson at ncsu.edu>      Network Analyst
Communication Technologies, ATD      W: 919.513.3472
North Carolina State University      www.ncstate.net


