Monitoring - 1000+ hosts and latency
David Parrish
david at dparrish.com
Mon Aug 11 00:16:13 CEST 2003
Patch attached...
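
In short, the commands.c part of the patch queues passive results at the head of the list, flushes them at most once every 5 seconds, and reverses the list just before writing it out. Below is a rough standalone sketch of that idea, not code from the patch: pcr_node, push_head, reverse_list and should_flush are made-up names for illustration, and the real passive_check_result struct carries more fields than this.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for Nagios' passive_check_result; only the link matters here. */
typedef struct pcr_node {
    int id;
    struct pcr_node *next;
} pcr_node;

/* O(1) insert at the head, instead of walking to the tail for every new result. */
pcr_node *push_head(pcr_node *head, int id) {
    pcr_node *n = malloc(sizeof *n);
    if (n == NULL)
        return head;            /* out of memory: drop this result, keep the list */
    n->id = id;
    n->next = head;
    return n;
}

/* Reverse the list in place so results come out in arrival order again. */
pcr_node *reverse_list(pcr_node *head) {
    pcr_node *out = NULL;
    while (head != NULL) {
        pcr_node *next = head->next;
        head->next = out;
        out = head;
        head = next;
    }
    return out;
}

/* Return 1 (and record the time) only if more than min_gap seconds have
   passed since the last flush; otherwise return 0 and leave *last alone. */
int should_flush(time_t now, time_t *last, time_t min_gap) {
    if (now - *last > min_gap) {
        *last = now;
        return 1;
    }
    return 0;
}
```

A caller would push_head() each incoming result, and on each pass through the main loop call should_flush(time(NULL), &last, 5); only when that returns 1 does it reverse the list and hand it off in one go. The point of the design is that each result costs constant time to queue, and the expensive step (the fork in the real code) happens once per batch rather than once per pass.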
On Fri, Aug 08, 2003 at 06:13:13PM +0000, solo molo wrote:
> From: "solo molo" <solomolo90 at hotmail.com>
> To: david at dparrish.com, Fred.Albrecht at za.tiscali.com
> Cc: nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] Monitoring - 1000+ hosts and latency
> Date: Fri, 08 Aug 2003 18:13:13 +0000
>
> I'm having the same problem. Where is this patch located?
>
>
> >From: David Parrish <david at dparrish.com>
> >To: Fred Albrecht <Fred.Albrecht at za.tiscali.com>
> >CC: nagios-users at lists.sourceforge.net
> >Subject: Re: [Nagios-users] Monitoring - 1000+ hosts and latency
> >Date: Thu, 7 Aug 2003 07:54:54 +1000
> >
> >I'm not going to guarantee anything. From what I saw, the external check
> >code that this patch applies to looks very similar in 1.0 & 1.1.
> >
> >Try it and see :)
> >
> >On Wed, Aug 06, 2003 at 07:33:31AM +0200, Fred Albrecht wrote:
> >
> > > Hi David
> > >
> > > Will this patch work on version 1.0 as well?
> > >
> > > :)
> > > fred
> > > > -----Original Message-----
> > > > From: David Parrish [mailto:david at dparrish.com]
> > > > Sent: 06 August 2003 01:32 AM
> > > > To: ftang at cgg.com
> > > > Cc: nagios-users at lists.sourceforge.net
> > > > Subject: Re: [Nagios-users] Monitoring - 1000+ hosts and latency
> > > >
> > > >
> > > > I'm running a nagios install monitoring nearly 7000 services and still
> > > > growing. It was unbearable until it was patched to not fork to process
> > > > external check results as often.
> > > >
> > > > > If a large portion of your checks are passive, then this patch will help a
> > > > > lot. Probably not much for active checks though.
> > > >
> > > > My average check latency is 0.006 sec.
> > > >
> > > > On Tue, Aug 05, 2003 at 12:23:25PM +0100, ftang at cgg.com wrote:
> > > > > From: ftang at cgg.com
> > > > > To: nagios-users at lists.sourceforge.net
> > > > > Subject: [Nagios-users] Monitoring - 1000+ hosts and latency
> > > > > Date: Tue, 5 Aug 2003 12:23:25 +0100
> > > > >
> > > > > All,
> > > > >
> > > > > > Is there anyone currently using nagios (any version) on over 1000 hosts
> > > > > > (3000+ services)? I know that there have been emails about users
> > > > > > experiencing extreme latencies, and you can imagine that I too am in the
> > > > > > same crowd. I like nagios (used netsaint in my last company - only 15
> > > > > > hosts), but I don't think that it is suitable for my current setup. I hope
> > > > > > I am wrong.
> > > > >
> > > > > Thanks for your time.
> >
> >--
> >Regards,
> >David Parrish
> >0410 586 121
>
--
Regards,
David Parrish
0410 586 121
-------------- next part --------------
diff -ur nagios-1.1-virgin/base/nagios.c nagios-1.1/base/nagios.c
--- nagios-1.1-virgin/base/nagios.c Thu Jul 10 08:30:10 2003
+++ nagios-1.1/base/nagios.c Thu Jul 10 08:32:20 2003
@@ -51,6 +51,8 @@
#include <getopt.h>
#endif
+#include <time.h>
+
/******** BEGIN EMBEDDED PERL INTERPRETER DECLARATIONS ********/
#ifdef EMBEDDEDPERL
@@ -1497,6 +1499,8 @@
printf("Current/Max Outstanding Checks: %d/%d\n",currently_running_service_checks,max_parallel_service_checks);
#endif
+ check_for_external_commands();
+
/* handle high priority events */
if(event_list_high!=NULL && (current_time>=event_list_high->run_time)){
@@ -1594,8 +1598,13 @@
}
/* wait a second so we don't hog the CPU... */
- else
- sleep((unsigned int)sleep_time);
+ else {
+ struct timespec t;
+ t.tv_sec = 0;
+ t.tv_nsec = 1000 * 1000 * 50;
+ nanosleep(&t, NULL);
+ //check_for_external_commands();
+ }
}
/* we don't have anything to do at this moment in time */
@@ -1606,7 +1615,12 @@
check_for_external_commands();
/* wait a second so we don't hog the CPU... */
- sleep((unsigned int)sleep_time);
+
+ { struct timespec t;
+ t.tv_sec = 0;
+ t.tv_nsec = 1000 * 1000 * 50;
+ nanosleep(&t, NULL); }
+
}
}
--- commands.c.old Sat Aug 2 14:13:23 2003
+++ nagios-1.1/base/commands.c Sat Aug 2 14:13:53 2003
@@ -347,8 +347,19 @@
/**** PROCESS ALL PASSIVE CHECK RESULTS AT ONE TIME ****/
- if(passive_check_result_list!=NULL)
- process_passive_service_checks();
+ {
+ static time_t last_checked = 0;
+ time_t t;
+
+ /* Don't process more frequently than once every 5 seconds. */
+ /* This does a fork!!! */
+ time(&t);
+ if(passive_check_result_list!=NULL && ((t - last_checked) > 5) ) {
+ process_passive_service_checks();
+ last_checked = t;
+ }
+ }
+
#ifdef DEBUG0
--- commands.c.old Sat Aug 2 15:37:31 2003
+++ nagios-1.1/base/commands.c Sat Aug 2 15:44:36 2003
@@ -63,7 +63,7 @@
extern FILE *command_file_fp;
-passive_check_result *passive_check_result_list;
+passive_check_result *passive_check_result_list = NULL;
int flush_pending_commands=FALSE;
@@ -96,9 +96,6 @@
/* update the status log with new program information */
update_program_status(FALSE);
- /* reset passive check result list pointer */
- passive_check_result_list=NULL;
-
/* reset flush flag */
flush_pending_commands=FALSE;
@@ -1279,13 +1276,9 @@
new_pcr->next=NULL;
- /* add the passive check result to the end of the list in memory */
- if(passive_check_result_list==NULL)
- passive_check_result_list=new_pcr;
- else{
- for(temp_pcr=passive_check_result_list;temp_pcr->next!=NULL;temp_pcr=temp_pcr->next);
- temp_pcr->next=new_pcr;
- }
+ /* add the passive check result to the head of the list in memory */
+ new_pcr->next = passive_check_result_list;
+ passive_check_result_list = new_pcr;
#ifdef DEBUG0
printf("cmd_process_service_check_result() end\n");
@@ -2845,6 +2838,17 @@
/* the grandchild process should submit the service check result... */
if(pid==0){
+ passive_check_result *t, *t1;
+ /* Reverse passive list to correct the reverse that happened in collection. */
+
+ t = passive_check_result_list;
+ passive_check_result_list = NULL;
+ while (t) {
+ t1 = t->next;
+ t->next = passive_check_result_list;
+ passive_check_result_list = t;
+ t = t1;
+ }
/* write all service checks to the IPC pipe for later processing by the grandparent */
for(temp_pcr=passive_check_result_list;temp_pcr!=NULL;temp_pcr=temp_pcr->next){
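
The nagios.c hunks above swap the sleep(sleep_time) call for a 50 ms nanosleep(), so the main loop polls for external commands far more often. A minimal standalone sketch of such a short sleep follows. The sleep_ms name is made up for illustration, and the retry on EINTR is an addition of this sketch: the patch itself passes NULL for the remaining time and does not retry.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* ask for the nanosleep() declaration */
#endif
#include <errno.h>
#include <time.h>

/* Sleep for ms milliseconds. If a signal interrupts the sleep, resume with
   the time that was left. Returns 0 on success, -1 on error. */
int sleep_ms(long ms) {
    struct timespec req, rem;

    if (ms < 0) {
        errno = EINVAL;
        return -1;
    }
    req.tv_sec = ms / 1000;
    req.tv_nsec = (ms % 1000) * 1000000L;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;              /* interrupted: sleep for the remainder */
    }
    return 0;
}
```

Calling sleep_ms(50) gives the same 50 ms pause as the t.tv_nsec = 1000 * 1000 * 50 setting in the patch.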