Bug in process_macros() (most likely) (was: Antwort: Strange empty entries with utime 0 in showlog.cgi)

Sascha.Runschke at gfkl.com Sascha.Runschke at gfkl.com
Fri Jul 11 14:58:19 CEST 2008


nagios-users-bounces at lists.sourceforge.net schrieb am 09.07.2008 12:04:01:

> after debugging some retry problems, I stumbled over strange entries
> in the showlog.cgi (see attached 
> screenshot). This is the daily log. At 00:00 it rotates and logs all
> initial states of the machines/services - 
> even though I disabled it in the nagios.cfg. Though I do not really 
> mind that. The strange thing are the 
> empty entries of 01.01.1970 _after_ the initial states, then some 
> current entries, then again 01.01.1970, then 
> again current states, and then nagios logs ALL the initial states 
_again_. 
> 
> Not that it's causing any obvious problems, but it looks kinda 
> strange in my opinion. 
> Does anyone have an idea what may cause this behaviour? 

Self-replies rock. ;)
I'll CC this to nagios-devel since it's a bug. I'm not yet sure
where it actually comes from, but I guess it's inside process_macros().
Still trying to pin it down exactly...

Back to my initial problem (see above).
After some debugging I found the culprit. It's my service definitions
containing quoted $ signs. Nagios stumbles over them and logs nonsense
in the current state logging after log rotating and everytime else when
it processes macros for this server (see below).

This is an excerpt from nagios.log (notice the lines with single $ on 
them):

[1215727200] CURRENT SERVICE STATE: esskhk01;MSSQLSERVER;OK;HARD;1;OK: 
MSSQLSERVER: started
[1215727200] CURRENT SERVICE STATE: esskhk01;MSSQL_PERBIT;OK;HARD;1;OK: 
MSSQL$PERBIT: started
$
[1215727200] CURRENT SERVICE STATE: esskhk01;MSSQL_PERSONAL;OK;HARD;1;OK: 
MSSQL$PERSONAL: started
$
[1215727200] CURRENT SERVICE STATE: esskhk01;NSClient_Version;OK;HARD;1;I 
(0.3.0.4 2007-12-04) seem to be doing fine...
[1215727200] CURRENT SERVICE STATE: esskhk01;SQLAgent_PERBIT;OK;HARD;1;OK: 
SQLAgent$PERBIT: started
$
[1215727200] CURRENT SERVICE STATE: 
esskhk01;SQLAgent_PERSONAL;OK;HARD;1;OK: SQLAgent$PERSONAL: started
$
[1215727200] CURRENT SERVICE STATE: esskhk01;SQLSERVERAGENT;OK;HARD;1;OK: 
SQLSERVERAGENT: started

Those "$" lines show up as "[01-01-1970 01:00:00] " with empty info on 
them in the showlog.cgi (see below).

My service definition for example:

define service {
        service_description             MSSQL_PERBIT
        use                             generic-service-noperf
        host_name                       esskhk01
        check_command check_nrpe_services!MSSQL"$$"PERBIT
}

Debugging shows (debug_level=-1, debug_verbosity=2):

Check gets scheduled and nagios parses the returnvalue. Obviously there is 
a \n that does not belong there.
Not yet sure why it shows up there...

[1215777431.127292] [001.0] [pid=3539] handle_async_service_check_result()
[1215777431.127299] [016.0] [pid=3539] ** Handling check result for 
service 'MSSQL_PERBIT' on host 'esskhk01'...
[1215777431.127305] [016.1] [pid=3539] HOST: esskhk01, SERVICE: 
MSSQL_PERBIT, CHECK TYPE: Active, OPTIONS: 0, SCHEDULED: Yes, RESCHE
DULE: Yes, EXITED OK: Yes, RETURN CODE: 3, OUTPUT: Service: MSSQL$PERBI 
caused: GetServiceKeyName: Could not translate service name(
997)\n
[1215777431.127320] [016.1] [pid=3539] Service is in a non-OK state!

- Shortly afterwards the macros for the alert get processed.
- The fprintf parses the \n and the log shows a newline now.
- Afterwards process_macros() parses the string and tries to resolve 
everything after the $ in the service
  definition.
- It fails to do so for obvious reasons, but then it seems to add an extra 
$ at the end of the buffer,
  maybe while trying to fix up a "broken macro"?
- Because the input buffer does not seem to be stripped yet, it adds an 
extra $ _after_ the \n
- strip(buffer) doesn't work anymore, as it only strips \n and " " when 
trailing and heading.

That's why I see single "$" entries in my logfile - which the showlog.cgi 
wrong parses as
"[01-01-1970 01:00:00] "
Notice the buffer before processing (no $ at the end) and after processing 
(with $ at the end - after the \n).

[1215777431.129970] [2048.1] [pid=3539] **** BEGIN MACRO PROCESSING 
***********
[1215777431.129978] [2048.1] [pid=3539] Processing: 'SERVICE ALERT: 
esskhk01;MSSQL_PERBIT;$SERVICESTATE$;$SERVICESTATETYPE$;$SERVICE
ATTEMPT$;Service: MSSQL$PERBI caused: GetServiceKeyName: Could not 
translate service name(997)
'
[1215777431.130001] [2048.0] [pid=3539]  WARNING: Could not find a macro 
matching 'PERBI caused'!
[1215777431.130010] [2048.0] [pid=3539]  WARNING: An error occurred 
processing macro 'PERBI caused: GetServiceKeyName: Could not tra
nslate service name(997)
'!
[1215777431.130018] [2048.1] [pid=3539]   Done.  Final output: 'SERVICE 
ALERT: esskhk01;MSSQL_PERBIT;UNKNOWN;SOFT;1;Service: MSSQL$P
ERBI caused: GetServiceKeyName: Could not translate service name(997)
$'
[1215777431.130025] [2048.1] [pid=3539] **** END MACRO PROCESSING 
*************


Sadly process_macros() is really ... uhm ... ugly ;) I haven't been able 
to find the problem yet.
Maybe someone with more insight in common/macros.c can have a look? I'm 
more then happy to assist though.

Regards
        Sascha

PS: Nagios 3.0.3 of course.

-- 
Sascha Runschke
Netzwerk-  und  Systemmanagement
Telefon : +49 (201) 102-1879 Mobil : +49 (173) 5419665 Fax : +49 (201) 
102-1102105



GFKL Financial Services AG
Vorstand: Dr. Peter Jänsch (Vors.), Jürgen Baltes, Dr. Till Ergenzinger, Dr. Tom Haverkamp
Vorsitzender des Aufsichtsrats: Dr. Georg F. Thoma
Sitz: Limbecker Platz 1, 45127 Essen, Amtsgericht Essen, HRB 13522
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20080711/7b96f318/attachment.html>
-------------- next part --------------
-------------------------------------------------------------------------
Sponsored by: SourceForge.net Community Choice Awards: VOTE NOW!
Studies have shown that voting for your favorite open source project,
along with a healthy diet, reduces your potential for chronic lameness
and boredom. Vote Now at http://www.sourceforge.net/community/cca08
-------------- next part --------------
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null


More information about the Developers mailing list