Possible Bugs in Nagios configfile parsing
Andreas Ericsson
ae at op5.se
Wed Oct 20 16:12:25 CEST 2004
Sascha Runschke wrote:
> Greetings,
>
> I just recently started migrating to SMS notifications and therefore
> testing escalations.
> I discovered several possible bugs in the configfiles parsing.
>
> I am using a CVS snapshot of nagios 2.0 dated from 08.10.2004.
> config.c hasn't been modified since then according to the CVS.
>
Would that be Aug 10 2004 or Oct 8 2004?
> Under certain circumstances nagios ends up in an endless loop in
> pre_flight_check().
>
> Everything is running fine, unless I put any kind of escalation into the
> config files.
>
This is an "extension" (for lack of a better word) to a known bug. I
discovered it myself yesterday with serviceescalations. It differs from
the 'original' infinite loop bug in that it doesn't matter how many
objects you have specified (previously it would only hang if only one
object of any type was specified, and it hung on that object type).
That bug was however located in common/object.c, which handles
populating all the object tables initially. Perhaps you will have better
luck if you look for it there.
> example:
>
> define hostescalation {
> host_name PDC01
> first_notification 1
> last_notification 1
> notification_interval 60
> contact_groups SMS-Alarm
> }
>
> Now running nagios -v nagios.cfg:
>
> [root at SRV00032 etc]# ../bin/nagioscheck
>
> Nagios 2.0a1
> Copyright (c) 1999-2004 Ethan Galstad (nagios at nagios.org)
> Last Modified: 11-18-2003
> License: GPL
>
> Reading configuration data...
>
> Running pre-flight check on configuration data...
>
> Checking services...
> Checked 253 services.
> Checking hosts...
> Warning: Host 'ABIT-DMZ_switch' has no services associated with it!
> Warning: Host 'RECHT.NET-DMZ_switch' has no services associated with it!
> Checked 106 hosts.
> Checking host groups...
> Checked 35 host groups.
> Checking service groups...
> Checked 0 service groups.
> Checking contacts...
> Checked 9 contacts.
> Checking contact groups...
> Checked 7 contact groups.
> Checking service escalations...
> Checked 0 service escalations.
> Checking service dependencies...
> Checked 0 service dependencies.
> Checking host escalations...
>
> Then nagios hangs with 99.9% cpu load.
>
> There is another interesting anomaly when I tried using escalations.
>
> I had a contactgroup that was unused called SMS-Test. When that
> contactgroup
> was activated nagios -v nagios.cfg outputs:
>
> [root at SRV00032 etc]# ../bin/nagioscheck
>
> Nagios 2.0a1
> Copyright (c) 1999-2004 Ethan Galstad (nagios at nagios.org)
> Last Modified: 11-18-2003
> License: GPL
>
> Reading configuration data...
>
> Running pre-flight check on configuration data...
>
> Checking services...
> Checked 253 services.
> Checking hosts...
> Warning: Host 'ABIT-DMZ_switch' has no services associated with it!
> Warning: Host 'RECHT.NET-DMZ_switch' has no services associated with it!
> Checked 106 hosts.
> Checking host groups...
> Checked 35 host groups.
> Checking service groups...
> Checked 0 service groups.
> Checking contacts...
> Checked 9 contacts.
> Checking contact groups...
>
> And nagios hangs again with 99.9% cpu load.
> It didn't even get to checking the host escalations, for some reason it
> already
> hangs in the contactgroups.cfg.
>
> This leads me to the conclusion that checking the references relating to
> contacts does
> have an error and can lead to a possible endless loop in
> pre_flight_check(). I took a
> quick look into config.c, but the problem didn't strike me yet.
> The code is quite... strange ;-)
>
Ethan has his peculiarities about indentation style. I suggest making
heavy use of the indent program to make it readable and fix the bug.
Then run it once more with whatever options needed to restore it,
followed by:
sed -i 's/\([\t ]*\)}/\1\t}/' some_file.c
(requires sed version 4.0.9 or higher) to push the closing brackets to
where Ethan wants them.
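
If you want to see what that sed expression does before letting it loose on real source, something along these lines might help. It is only a sketch: the sample file and its contents are made up for the demonstration, and it assumes GNU sed (which understands \t). It leaves out -i so nothing is modified in place.

```shell
#!/bin/sh
# Sketch: try the brace-pushing expression on a throwaway file.
# Assumes GNU sed (for \t); sample.c and its content are invented for the demo.
tmp=$(mktemp -d)
printf 'if(x){\n\tfoo();\n}\n' > "$tmp/sample.c"

# Same expression as above, but writing to a copy instead of using -i.
sed 's/\([\t ]*\)}/\1\t}/' "$tmp/sample.c" > "$tmp/sample.out"

# Print the result with tabs made visible (sed's "l" command shows them as \t).
sed -n 'l' "$tmp/sample.out"

# Keep the last line around so the effect on the closing brace can be checked.
last_line=$(tail -n 1 "$tmp/sample.out")
rm -rf "$tmp"
```

The closing brace on the last line should come out one tab further in than it went in; lines without a closing brace are left alone.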
> The problem is that I don't believe it doesn't work for anyone, because I
> have never seen
> anyone mention it. Therefore some kind of circumstance I have must be
> provoking this
> problem.
>
> I'll try to put some more debugging output into config.c so I can see
> where exactly it hangs,
> I'm not in the mood for exhaustive gdb sessions...
>
./configure --enable-DEBUGALL
Run Nagios in the foreground and you'll get all the info you need.
I still think it's an error in common/object.c though.
> Since I am using cfg_dir directives in nagios.cfg for single cfg-files for
> each host, it's kinda
> complicated to post those to the list. Especially because publishing those
> to the public
> exposes all critical information for those systems according to internal
> IPs, services and
> purposes. And I don't feel like editing hundreds of files...
>
Let the computer do it for you.
#!/bin/sh
# Write a sanitized copy (*.pub) of every config file under each cfg_dir.
for dir in `grep '^cfg_dir=' nagios.cfg | sed 's/cfg_dir=//'`; do
	echo "Working in $dir"
	for f in "$dir"/*.cfg; do
		echo " -- $f"
		sed -e 's/.*address.*/ address localhost/' \
			-e 's/secret_stuff/XXXXXXXXXXX/g' \
			"$f" > "$f.pub"
	done
done
# Do the same for every file listed with cfg_file=.
for f in `grep '^cfg_file=' nagios.cfg | sed 's/cfg_file=//'`; do
	echo " -- ${f##*/}"
	sed -e 's/.*address.*/ address localhost/' \
		-e 's/secret_stuff/XXXXXXXXXXX/g' "$f" > "$f.pub"
done
# Bundle all the sanitized copies into one tarball.
tar cvzf bug-reproduction-config.tar.gz \
	`find /usr/local/nagios -type f -name "*.pub"`
Post the bug reproduction config on a server of your choice and put a
link somewhere where developers can find it.
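
Before putting the tarball anywhere public, it is probably worth checking that no real address slipped through. A minimal sketch of such a check follows; the temporary directory, the pdc01.cfg file and the 10.1.2.3 address are invented for the demonstration, so point the grep at your own *.pub files instead.

```shell
#!/bin/sh
# Sketch: count "address" lines in *.pub files that are not localhost.
# The directory, file name and IP below are made up for the demo.
work=$(mktemp -d)
printf 'define host{\n host_name PDC01\n address 10.1.2.3\n}\n' > "$work/pdc01.cfg"

# Sanitize the sample the same way the script above does.
sed -e 's/.*address.*/ address localhost/' "$work/pdc01.cfg" > "$work/pdc01.cfg.pub"

# Any address line that does not read "address localhost" counts as a leak.
leaks=$(grep -h 'address' "$work"/*.pub | grep -vc 'address localhost')
echo "non-localhost address lines left: $leaks"

# Also make sure the original IP is gone from the sanitized copy.
ip_hits=$(grep -c '10\.1\.2\.3' "$work/pdc01.cfg.pub")
rm -rf "$work"
```

If the count is anything other than zero, look at the offending lines before you publish.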
> Thanks for reading that far ;)
>
> sash
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Lead Developer