Problems with extensive passive monitoring
Mike Becher
Mike.Becher at lrz-muenchen.de
Mon Oct 9 14:46:08 CEST 2006
Hi all,
In our environment we have run into a problem with heavy use of the
passive monitoring feature of Nagios.
Description in short:
---------------------
In our environment we have more than 250 clients, each of which runs
its own Nagios server to monitor itself. Each client runs up to 8 service
checks and posts the results as external commands via send_nsca/nsca
to a master Nagios server, which I call the cluster master Nagios
server, or CMNS for short.
This CMNS is in turn a client of a site master Nagios server (SMNS for
short). The CMNS must forward its messages to the SMNS as external
commands, just as the clients do to the CMNS.
With the built-in features of Nagios (we use version 2.5) you can use
send_nsca/nsca to forward messages from the CMNS to the SMNS as well, but
this results in:
* heavy load on the CMNS, because at least one external send_nsca
  process is forked to forward each message to the SMNS (in our
  environment up to 1000 forks per minute).
* up to 1000 nsca processes per minute to deliver external command
  messages from the clients to the CMNS.
* loss of incoming messages from clients on the CMNS, because it reads
  from the external command pipe for only 30 seconds and then pauses.
* child processes of the CMNS become children of `init', and all of them
  keep writing into the pipe that connects them to the Nagios master
  process.
* as a result they eat a lot of memory, so a machine with 512MB RAM and
  2GB swap must be rebooted after 2 days, otherwise it hangs.
The whole description can be read at:
http://www.mountcup.de/tiki/tiki-index.php?page=mibe-nagios-passive-monitoring
My solution
-----------
Instead of calling an external program (ocsp_command or ochp_command) for
each external command message to forward it from the CMNS to the SMNS, let
the Nagios process write these messages into a named pipe. The attached
patch provides this functionality for Nagios version 2.5.
Then let a helper program on the CMNS side read from this named pipe and
forward the messages through what I call a channel to wherever you want,
in this case to the SMNS. I have written a Perl program that does this;
it is attached as well.
What do you think about an option to use named pipes in addition to
running ocsp_command and/or ochp_command as an external process?
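
In the attached patch the choice is made by a prefix on the command name:
if it starts with "namedpipe_ocsp_command:" (or "namedpipe_ochp_command:"),
the rest of the name is taken as the path of the pipe. A small standalone
sketch of that check follows; the function name npipe_path_from_command
and the example path are invented for illustration, and unlike the patch
it rejects an empty path.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if command starts with prefix, return a pointer to
 * the pipe path that follows it; otherwise return NULL. An empty path
 * after the prefix is treated as "no pipe configured". */
const char *npipe_path_from_command(const char *command, const char *prefix)
{
    size_t prefix_len;

    if (command == NULL || prefix == NULL)
        return NULL;

    prefix_len = strlen(prefix);
    if (strncmp(command, prefix, prefix_len) != 0)
        return NULL;

    return command[prefix_len] != '\0' ? command + prefix_len : NULL;
}
```

For example, a command named "namedpipe_ocsp_command:/tmp/ocsp.fifo" would
yield "/tmp/ocsp.fifo", while an ordinary command name yields NULL and
falls through to the usual external process.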
The NDO interface can't be used in this case because there aren't any
connectors inside the code for external commands.
best regards
Mike
-----------------------------------------------------------------------------
Mike Becher Mike.Becher at lrz-muenchen.de
Leibniz-Rechenzentrum der http://www.lrz.de
Bayerischen Akademie der Wissenschaften phone: +49-89-35831-8721
Gruppe Hochleistungssysteme fax: +49-89-35831-9700
Boltzmannstrasse 1
D-85748 Garching bei Muenchen
Germany
-----------------------------------------------------------------------------
-------------- next part --------------
diff -u -r -N nagios-2.5/base/config.c nagios-mibe-2.5/base/config.c
--- nagios-2.5/base/config.c 2005-12-27 00:18:14.000000000 +0100
+++ nagios-mibe-2.5/base/config.c 2006-09-26 07:39:56.000000000 +0200
@@ -2770,6 +2770,14 @@
write_to_logs_and_console(temp_buffer,NSLOG_VERIFICATION_ERROR,TRUE);
errors++;
}
+ else {
+ if(verify_config==TRUE){
+ char raw_command_line[MAX_COMMAND_BUFFER];
+ printf(" ocsp_command is set to \"%s\"\n", temp_command->name);
+ get_raw_command_line(ocsp_command,raw_command_line,sizeof(raw_command_line),0);
+ printf(" and uses macro \"%s\"\n", raw_command_line);
+ }
+ }
}
if(ochp_command!=NULL){
@@ -2786,6 +2794,14 @@
write_to_logs_and_console(temp_buffer,NSLOG_VERIFICATION_ERROR,TRUE);
errors++;
}
+ else {
+ if(verify_config==TRUE){
+ char raw_command_line[MAX_COMMAND_BUFFER];
+ printf(" ochp_command is set to \"%s\"\n", temp_command->name);
+ get_raw_command_line(ochp_command,raw_command_line,sizeof(raw_command_line),0);
+ printf(" and uses macro \"%s\"\n", raw_command_line);
+ }
+ }
}
#ifdef DEBUG1
diff -u -r -N nagios-2.5/base/sehandlers.c nagios-mibe-2.5/base/sehandlers.c
--- nagios-2.5/base/sehandlers.c 2005-12-23 20:31:36.000000000 +0100
+++ nagios-mibe-2.5/base/sehandlers.c 2006-09-26 08:15:44.000000000 +0200
@@ -53,6 +53,45 @@
extern time_t program_start;
+static int my_npipe_fprintf(const char *pipe_name, const char *string_wo_newline){
+ struct stat st;
+ int nfd=-1;
+ FILE *npipe=NULL;
+
+ if(pipe_name == NULL)
+ return ERROR;
+ if(string_wo_newline == NULL)
+ return ERROR;
+
+
+ /* use existing FIFO if possible */
+ if((stat(pipe_name, &st) < 0) ||
+ ((st.st_mode & S_IFIFO) != S_IFIFO)){
+ /* create the external command file as a named pipe (FIFO) */
+ if(mkfifo(pipe_name, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP)!=0){
+ return ERROR;
+ }
+ }
+
+ /* open the command file for writing (non-blocked) - O_TRUNC flag cannot be
+ * used due to errors on some systems */
+ nfd = open(pipe_name, O_WRONLY|O_NONBLOCK, S_IWUSR|S_IWGRP);
+ if(nfd < 0){
+ return ERROR;
+ }
+ npipe = fdopen(nfd, "w");
+ if (npipe == NULL) {
+ close(nfd);
+ return ERROR;
+ }
+
+ /* write our data */
+ fprintf(npipe, "%s\n", string_wo_newline);
+
+ /* and close command pipe */
+ fclose(npipe);
+ return OK;
+}
/******************************************************************/
/************* OBSESSIVE COMPULSIVE HANDLER FUNCTIONS *************/
@@ -74,14 +113,17 @@
#endif
/* bail out if we shouldn't be obsessing */
- if(obsess_over_services==FALSE)
+ if(obsess_over_services==FALSE) {
return OK;
- if(svc->obsess_over_service==FALSE)
+ }
+ if(svc->obsess_over_service==FALSE) {
return OK;
+ }
/* if there is no valid command, exit */
- if(ocsp_command==NULL)
+ if(ocsp_command==NULL) {
return ERROR;
+ }
/* find the associated host */
temp_host=find_host(svc->host_name);
@@ -107,8 +149,40 @@
printf("\tProcessed obsessive compulsive service processor command line: %s\n",processed_command_line);
#endif
- /* run the command */
- my_system(processed_command_line,ocsp_timeout,&early_timeout,&exectime,NULL,0);
+ if (strncmp(ocsp_command,"namedpipe_ocsp_command:",strlen("namedpipe_ocsp_command:")) == 0){
+ /* put it into pipe */
+ char *npipe_path = strchr(ocsp_command, ':');
+ npipe_path++;
+ if (my_npipe_fprintf(npipe_path, processed_command_line) == ERROR) {
+#ifdef MIBE_DEBUG
+ snprintf(temp_buffer,sizeof(temp_buffer),
+ "npipe: sending of ocsp data skipped for ->%s<- because an error occurred\n",
+ svc->host_name
+ );
+ temp_buffer[sizeof(temp_buffer)-1]='\x0';
+ write_to_logs_and_console(temp_buffer,NSLOG_RUNTIME_WARNING,TRUE);
+ } else {
+ snprintf(temp_buffer,sizeof(temp_buffer),
+ "npipe: sending of ocsp data done for ->%s<-\n",
+ svc->host_name
+ );
+ temp_buffer[sizeof(temp_buffer)-1]='\x0';
+ write_to_logs_and_console(temp_buffer,NSLOG_RUNTIME_WARNING,TRUE);
+#endif
+ }
+ } else {
+ /* run the command */
+#ifdef MIBE_DEBUG
+ snprintf(temp_buffer,sizeof(temp_buffer),
+ "npipe: running ocsp_command ->%s<- for ->%s<-\n",
+ processed_command_line,
+ svc->host_name
+ );
+ temp_buffer[sizeof(temp_buffer)-1]='\x0';
+ write_to_logs_and_console(temp_buffer,NSLOG_RUNTIME_WARNING,TRUE);
+#endif
+ my_system(processed_command_line,ocsp_timeout,&early_timeout,&exectime,NULL,0);
+ }
/* check to see if the command timed out */
if(early_timeout==TRUE){
@@ -140,14 +214,17 @@
#endif
/* bail out if we shouldn't be obsessing */
- if(obsess_over_hosts==FALSE)
+ if(obsess_over_hosts==FALSE){
return OK;
- if(hst->obsess_over_host==FALSE)
+ }
+ if(hst->obsess_over_host==FALSE){
return OK;
+ }
/* if there is no valid command, exit */
- if(ochp_command==NULL)
+ if(ochp_command==NULL){
return ERROR;
+ }
/* update macros */
clear_volatile_macros();
@@ -169,8 +246,39 @@
printf("\tProcessed obsessive compulsive host processor command line: %s\n",processed_command_line);
#endif
- /* run the command */
- my_system(processed_command_line,ochp_timeout,&early_timeout,&exectime,NULL,0);
+ if (strncmp(ochp_command,"namedpipe_ochp_command:",strlen("namedpipe_ochp_command:")) == 0){
+ /* put it into pipe */
+ char *npipe_path = strchr(ochp_command, ':') + 1;
+ if (my_npipe_fprintf(npipe_path, processed_command_line) == ERROR) {
+#ifdef MIBE_DEBUG
+ snprintf(temp_buffer,sizeof(temp_buffer),
+ "npipe: sending of ochp data skipped for ->%s<- because an error occurred\n",
+ hst->name
+ );
+ temp_buffer[sizeof(temp_buffer)-1]='\x0';
+ write_to_logs_and_console(temp_buffer,NSLOG_RUNTIME_WARNING,TRUE);
+ } else {
+ snprintf(temp_buffer,sizeof(temp_buffer),
+ "npipe: sending of ochp data done for ->%s<-\n",
+ hst->name
+ );
+ temp_buffer[sizeof(temp_buffer)-1]='\x0';
+ write_to_logs_and_console(temp_buffer,NSLOG_RUNTIME_WARNING,TRUE);
+#endif
+ }
+ } else {
+ /* run the command */
+#ifdef MIBE_DEBUG
+ snprintf(temp_buffer,sizeof(temp_buffer),
+ "npipe: running ochp_command ->%s<- for ->%s<-\n",
+ processed_command_line,
+ hst->name
+ );
+ temp_buffer[sizeof(temp_buffer)-1]='\x0';
+ write_to_logs_and_console(temp_buffer,NSLOG_RUNTIME_WARNING,TRUE);
+#endif
+ my_system(processed_command_line,ochp_timeout,&early_timeout,&exectime,NULL,0);
+ }
/* check to see if the command timed out */
if(early_timeout==TRUE){
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fwd_nagios_results.pl.gz
Type: application/octet-stream
Size: 6576 bytes
Desc: fwd_nagios_results.pl.gz
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20061009/e12444ce/attachment.obj>
-------------- next part --------------
_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel