State of the Practice in Algorithmic Debugging
Abstract
 
Algorithmic debugging is a semi-automatic debugging technique based on the answers of an oracle (usually the programmer) to a series of questions generated automatically by the algorithmic debugger. The technique typically traverses a record of the execution, the so-called execution tree (ET), which only captures the declarative aspects of the execution and hides operational details. In this work we perform two empirical studies: we overview and compare the most important algorithmic debuggers of different programming paradigms, and we evaluate all the algorithmic debugging strategies that have appeared in the literature. In the study we analyze the most important features incorporated by current algorithmic debuggers, and we identify some features not yet supported by any debugger. We then compare all the debuggers, giving rise to a map of the state of the practice in algorithmic debugging.

 
Algorithmic Debugging

In this work, we first identify a set of desirable features that an algorithmic debugger should implement, and then we compare the most important algorithmic debuggers in order to study the state of the practice in algorithmic debugging. The comparison is done from two perspectives: functionality and efficiency.
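To make the technique concrete, the following minimal Haskell sketch (our own illustration, not code from any of the debuggers discussed below) shows the core of a debugging session: the ET is traversed top-down, the oracle answers each question, and a node answered NO whose children are all answered YES is reported as buggy. The example ET for a faulty insertion sort is invented for the illustration.

    import System.IO (hFlush, stdout)

    -- One node of the execution tree (ET): a recorded call and its sub-calls.
    data ETNode = ETNode
      { question :: String    -- e.g. "insert 3 [1,2] = [1,3,2]?"
      , children :: [ETNode]
      }

    -- Ask the oracle (here, the user at the console) about one node.
    ask :: ETNode -> IO Bool
    ask node = do
      putStr (question node ++ " (y/n) ")
      hFlush stdout
      reply <- getLine
      return (reply == "y")

    -- Top-down search: the current node is known to be wrong; if some child
    -- is also wrong we descend into it, otherwise the current node is buggy.
    debug :: ETNode -> IO ()
    debug node = go (children node)
      where
        go []     = putStrLn ("Bug located in: " ++ question node)
        go (c:cs) = do
          correct <- ask c
          if correct then go cs else debug c

    -- A tiny hand-built ET for a faulty insertion sort (made-up data).
    exampleET :: ETNode
    exampleET =
      ETNode "insort [3,1,2] = [1,3,2]?"
        [ ETNode "insort [1,2] = [1,2]?"
            [ ETNode "insort [2] = [2]?" []
            , ETNode "insert 1 [2] = [1,2]?" [] ]
        , ETNode "insert 3 [1,2] = [1,3,2]?"
            [ ETNode "insert 3 [2] = [3,2]?" [] ]
        ]

    main :: IO ()
    main = debug exampleET

Real algorithmic debuggers differ mainly in how they build the ET and in the order in which questions are selected, which is precisely what the features and strategies compared below capture.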

Some useful links to algorithmic debugging works:

Original Definition
E. Shapiro. Algorithmic Program Debugging. MIT Press, 1982.

A Comparative Study of Algorithmic Debugging Strategies
J. Silva. Algorithmic Debugging Strategies. In Proc. of International Symposium on Logic-based Program Synthesis and Transformation (LOPSTR 2006), pages 134–140, 2006. (Download)

The debuggers we consider in our study are:

B.i.O.
B.i.O. (Believe in Oracles) is a debugger integrated into the Curry compiler KiCS. This algorithmic debugger has the peculiarity that it does not need to store the ET.
B. Brassel. KiCS – The Kiel Curry System, 2008.

Buddha
Buddha is an algorithmic debugger for Haskell 98 programs.
B. Pope. Buddha: A Declarative Debugger for Haskell, 1998.

DDT
DDT is part of the distribution of the multi-paradigm language TOY. It uses a graphical user interface (GUI) implemented in Java (i.e., it needs the Java Runtime Environment to work).
R. Caballero. A Declarative Debugger of Incorrect Answers for Constraint Functional-Logic Programs. In Proc. of the 2005 ACM SIGPLAN Workshop on Curry and Functional Logic Programming (WCFLP'05), pages 8–13, New York, USA, 2005. ACM Press.

Freja
Freja is a debugger implemented in Haskell, the result of Henrik Nilsson's PhD thesis. It is able to debug a subset of Haskell; unfortunately, it is no longer maintained.
H. Nilsson. Declarative Debugging for Lazy Functional Languages. PhD thesis, Linköping University, Sweden, May 1998.

Hat Delta
Hat Delta belongs to Hat, a set of debugging tools for Haskell. In particular, it is the successor of the algorithmic debugger Hat Detect.
T. Davie and O. Chitil. Hat-delta: One Right Does Make a Wrong. In Seventh Symposium on Trends in Functional Programming (TFP 06), April 2006.

Mercury
Mercury is a purely declarative logic and functional programming language. Its compiler includes a debugger that integrates both a procedural debugger and an algorithmic debugger.
I. MacLarty. Practical Declarative Debugging of Mercury Programs. PhD thesis, Department of Computer Science and Software Engineering, The University of Melbourne, 2005.

Münster Curry Debugger
One of the most important compilers for Curry is the Münster Curry compiler, which includes an algorithmic debugger.
W. Lux. Münster Curry User's Guide (release 0.9.10 of May 10, 2006). Available at: http://danae.uni-muenster.de/~lux/curry/user.pdf, 2006.

Nude
The NU-Prolog Debugging Environment (Nude) integrates a collection of debugging tools for NU-Prolog programs.
L. Naish, P. W. Dart, and J. Zobel. The NU-Prolog debugging environment. In A. Porto, editor, Proceedings of the Sixth International Conference on Logic Programming, pages 521–536, Lisboa, Portugal, June 1989.


 
Functionality Comparison
 

Table 1 presents a summary of the functionalities available in the studied algorithmic debuggers. We refer the reader to this paper for a deeper explanation of the fields. Every column gathers the information of one algorithmic debugger. The meaning of the rows is the following:

  • Implementation Language: Language used to implement the debugger.
  • Target Language: Debugged language.
  • Strategies: Algorithmic debugging strategies supported by the debugger: Single Stepping (SS), Top-Down (TD), Heaviest First (HF), Divide & Query (DQ), Hirunkitti (HI), Hat Delta's Heuristics (HD), Mercury's Divide & Query (MD), and Subterm Dependency Tracking (SD).
  • Database/Memoization: Is a database used to store answers for future debugging sessions (inter-session memory)? Are questions remembered during the same session (intra-session memory)?
  • Front-End: Is it integrated into the compiler or is it independent? Is it an interpreter/compiler or a program transformation?
  • Interface: Interface used between the front-end and the back-end. If the front-end is a program transformation, it specifies the data structure generated by the transformed program. Whenever the data structure is stored in the file system, the format used is given in parentheses. For instance, DDT exports the ET in two formats, XML or TXT; Hat Delta uses an ART (Augmented Redex Trail) with a native format; and B.i.O. generates a list of step counts (see Scalability in Section 2.2 of this paper) which is stored in a plain text file.
  • Execution Tree: When the back-end is executed, is the ET stored in the file system or in main memory? This row also specifies which debuggers produce the ET on demand.
  • Accepted Answers: Yes (YE), No (NO), Don't Know (DK), Inadmissible (IN), Maybe Yes (MY), Maybe Not (MN), and Trusted (TR).
  • Tracing Subexpressions: Is it possible to specify that a (sub)expression is wrong?
  • ET Exploration: Is it possible to explore the ET freely?
  • Tree Compression: Does the debugger implement the tree compression technique? (A sketch of this technique follows the list.)
  • Undo: Is it possible to undo an answer?
  • Trusting: Is it possible to trust modules (Mod), functions (Fun) and/or arguments (Arg)?
  • GUI: Does the debugger have a graphical user interface?
  • Version: Evaluated version of the debugger.
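Since tree compression is perhaps the least self-explanatory of these features, the following small Haskell sketch illustrates the idea behind it, after Davie and Chitil's description (our own reconstruction, not Hat Delta's actual code): a child that is a reduction of the same function as its parent is spliced into the parent, so chains of recursive calls produce fewer questions.

    -- Sketch of tree compression: splice a child into its parent when both
    -- are reductions of the same function, so recursive calls collapse.
    data ETNode = ETNode { fun :: String, question :: String, children :: [ETNode] }

    compress :: ETNode -> ETNode
    compress (ETNode f q cs) = ETNode f q (concatMap (lift . compress) cs)
      where
        lift c | fun c == f = children c   -- same function: lift its children
               | otherwise  = [c]

    -- Three chained calls to "len" compress into a single question.
    main :: IO ()
    main = print (count (compress lenET))   -- prints 1 instead of 3
      where
        count t = 1 + sum (map count (children t))
        lenET = ETNode "len" "len [1,2] = 2?"
                  [ ETNode "len" "len [2] = 1?"
                      [ ETNode "len" "len [] = 0?" [] ] ]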
Table 1. Summary of debugger functionality (debuggers: B.i.O., Buddha, DDT, Freja, JDD, Hat Delta, Mercury Debugger, Münster Curry Debugger, Nude).

  • Implementation Language. B.i.O.: Curry, Haskell; Buddha: Haskell; DDT: Toy (front-end), Java (back-end); Freja: Haskell; JDD: Java; Hat Delta: Haskell; Mercury Debugger: Mercury; Münster Curry Debugger: Haskell (front-end), Curry (back-end); Nude: Prolog.
  • Target Language. B.i.O.: Curry; Buddha: Haskell; DDT: Toy; Freja: Haskell subset; JDD: Java; Hat Delta: Haskell; Mercury Debugger: Mercury; Münster Curry Debugger: Curry; Nude: Prolog.
  • Strategies. B.i.O.: TD; Buddha: TD; DDT: TD, DQ; Freja: TD, SS; JDD: TD, HF, DQ, HI, HD; Hat Delta: HD; Mercury Debugger: TD, DQ, SD, MD; Münster Curry Debugger: TD; Nude: TD.
  • Database/Memoization. B.i.O.: NO/NO; Buddha: NO/YES; DDT: NO/YES; Freja: NO/NO; JDD: YES/NO; Hat Delta: NO/YES; Mercury Debugger: NO/YES; Münster Curry Debugger: NO/NO; Nude: YES/YES.
  • Front-End. B.i.O.: independent, program transformation; Buddha: independent, program transformation; DDT: integrated, program transformation; Freja: integrated, compiler; JDD: independent, compiler; Hat Delta: independent, program transformation; Mercury Debugger: independent, compiler; Münster Curry Debugger: integrated, compiler; Nude: independent, compiler.
  • Interface. B.i.O.: step counts (plain text); Buddha: ET; DDT: ET (XML/TXT); Freja: ET; JDD: ET; Hat Delta: ART (native); Mercury Debugger: ET on demand; Münster Curry Debugger: ET; Nude: ET on demand.
  • Execution Tree. B.i.O.: main memory, on demand; Buddha: main memory; DDT: main memory; Freja: main memory, on demand; JDD: main memory; Hat Delta: file system; Mercury Debugger: main memory, on demand; Münster Curry Debugger: main memory; Nude: main memory, on demand.
  • Accepted Answers. B.i.O.: YE, NO, DK; Buddha: YE, NO, DK, IN, TR; DDT: YE, NO, DK, TR; Freja: YE, NO, MY, MN; JDD: YE, NO, DK, TR; Hat Delta: YE, NO; Mercury Debugger: YE, NO, DK, IN, TR; Münster Curry Debugger: YE, NO; Nude: YE, NO.
  • Tracing Subexpressions. Mercury Debugger: YES; all others: NO.
  • ET Exploration. B.i.O., Buddha, DDT, Freja, JDD, Hat Delta, Mercury Debugger: YES; Münster Curry Debugger, Nude: NO.
  • Tree Compression. JDD, Hat Delta: YES; all others: NO.
  • Undo. B.i.O., Freja, JDD, Mercury Debugger: YES; Buddha, DDT, Hat Delta, Münster Curry Debugger, Nude: NO.
  • Trusting. B.i.O.: Mod/Fun/Arg; Buddha: Mod/Fun; DDT: Fun; Freja: Mod/Fun; JDD: Fun; Hat Delta: Mod; Mercury Debugger: Mod/Fun; Münster Curry Debugger: Mod/Fun; Nude: Fun.
  • GUI. DDT, JDD: YES; all others: NO.
  • Version. B.i.O.: KiCS 0.81893 (13.05.2008); Buddha: 1.2.1 (01.12.2006); DDT: 1.1 (2003); Freja: 1999–2000; JDD: 2.1 (10.12.2008); Hat Delta: 2.05 (22.10.2006); Mercury Debugger: Mercury 0.13.1 (01.12.2006); Münster Curry Debugger: 0.9.10 (10.5.2006); Nude: NU-Prolog 1.7.2 (13.07.2004).
 
Efficiency Comparison
 

We studied the growth rate of the internal data structure stored by the debuggers. This information is useful for assessing the scalability of each debugger and, in particular, its limitations w.r.t. the ET's size. This study proved that several debuggers are not applicable to real programs because they run out of memory even with medium-size programs.

In order to compare the growth rate of the ET's size in each debugger, we selected some benchmarks from the nofib-buggy repository and designed other benchmarks particularly well suited for algorithmic debugging: Fibonacci, Factorial, Euler, Tak, Length, InSort, QuickSort, Mergesort, Primes and Queens. Then, we reprogrammed the benchmarks in all the target languages of the debuggers. Although we selected the benchmarks so that the reprogrammed versions were as similar as possible, some small differences appeared. However, the results are still valid, because we ensured that the size of the data structures used in the questions is the same in all the programs, and because our experiment takes into account not only the size of the ET but also the ratio between its size and its number of nodes. Finally, we tested the debuggers with a set of increasing input data in order to be able to graphically represent the growth rate of the size of the ET.

Since each debugger handles the ET in a different way, we had to use a different method in each case to measure its size:

  1. Hat Delta stores the ART in a file and traverses it during the debugging process. Therefore, we considered the size of the whole ART rather than the size of the implicit ET. Since the ART is used for purposes other than algorithmic debugging (e.g., tracing), it contains more information than needed.
  2. DDT saves ETs in two formats: TXT and XML. For our purpose, we used the size of the XML file, because this is the file loaded by the tool during the debugging process.
  3. The Münster Curry Debugger generates the ET in main memory, but we applied a patch (provided by Wolfgang Lux) that allows the ET to be stored in a text file.
  4. Buddha also generates the whole ET in main memory. In this case, we used a shell script to measure the physical memory used by the data structures handled by the debugger. This might give Buddha a slightly unfair advantage, since an in-memory representation of the ET is likely to be more compact than any representation stored on disk.
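As an illustration of the kind of measurement involved, the following Haskell sketch (our own; the naive Fibonacci benchmark and the one-question-per-line serialization are assumptions, not any debugger's actual format) builds ETs for increasing inputs and relates the number of nodes to the size of a plain-text serialization:

    -- Sketch: relate the number of ET nodes to the size of a serialized ET.
    data ETNode = ETNode { question :: String, children :: [ETNode] }

    -- ET of a naive Fibonacci: each call becomes one node.
    fibET :: Int -> ETNode
    fibET n
      | n < 2     = ETNode (q n 1) []
      | otherwise = ETNode (q n (fib n)) [fibET (n - 1), fibET (n - 2)]
      where
        fib m = if m < 2 then 1 else fib (m - 1) + fib (m - 2)
        q m r = "fib " ++ show m ++ " = " ++ show r ++ "?"

    countNodes :: ETNode -> Int
    countNodes t = 1 + sum (map countNodes (children t))

    -- Size in bytes of a plain-text serialization, one question per line.
    serializedSize :: ETNode -> Int
    serializedSize t =
      length (question t) + 1 + sum (map serializedSize (children t))

    main :: IO ()
    main = mapM_ row [5, 10, 15, 20]
      where
        row n = let t = fibET n
                in putStrLn (show n ++ "\t" ++ show (countNodes t)
                                    ++ "\t" ++ show (serializedSize t))

In the actual experiments the ETs are produced by the debuggers themselves; what the sketch shows is the size/number-of-nodes relation on which the comparison relies.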

The experiments show the size of the ET produced by four debuggers during debugging: the X-axis represents the number of nodes in the ETs, the Y-axis represents the ET's size in Kb, and the Z-axis shows the debuggers.

It is important to note that we purposely selected some benchmarks that process big data structures. The reason is that the size of the nodes strongly depends on the size of their data structures, and we wanted to study the growth rate with both small and big input data.

Benchmarks: Haskell (Download), Curry (Download), Toy (Download)
Spreadsheet: HTML (View), Excel (Download), PDF (Download)
Comparative Graphics: HTML (View), Excel (Download)

 

A Comparison of Algorithmic Debugging Strategies
 

We conducted several experiments in order to empirically determine how algorithmic debugging strategies behave in practice. For instance, the theoretical study (Download) concluded that Hirunkitti's version of Divide & Query is better than Shapiro's version. But how much better? Similarly, after the theoretical study we know that Heaviest First is better than Top-Down; but, again, how much better? Is the difference significant in practice?
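To clarify what the compared versions compute, here is a hedged Haskell sketch of the node selection performed by Divide & Query (our own illustration based on the usual description of the strategy): every node is weighted with the size of its subtree, and Hirunkitti's version asks about the node whose weight is closest to half the weight of the suspect tree, whether from below or from above.

    import Data.List (minimumBy)
    import Data.Ord (comparing)

    data ETNode = ETNode { question :: String, children :: [ETNode] }

    -- The weight of a node is the number of nodes in its subtree.
    weight :: ETNode -> Int
    weight t = 1 + sum (map weight (children t))

    allNodes :: ETNode -> [ETNode]
    allNodes t = t : concatMap allNodes (children t)

    -- Hirunkitti's Divide & Query: ask about the node whose weight is
    -- closest to half the weight of the suspect tree (the root is excluded,
    -- since it is already known to be wrong).
    selectQuery :: ETNode -> ETNode
    selectQuery root =
      minimumBy (comparing dist) (concatMap allNodes (children root))
      where
        w = weight root
        dist n = abs (2 * weight n - w)   -- |weight n - w/2|, scaled by 2

    main :: IO ()
    main = putStrLn ("Next question: " ++ question (selectQuery et))
      where
        leaf s = ETNode s []
        et = ETNode "main = 8?"
               [ ETNode "f 2 = 4?" [leaf "g 2 = 3?"]
               , ETNode "f 3 = 4?" [leaf "g 3 = 4?", leaf "h 4 = 4?"] ]

Shapiro's original version, roughly, only considers nodes whose weight is at most half of the total; this is the small difference whose practical impact the experiments quantify.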

Basically, the experiments consisted in debugging several programs with all the strategies in order to know which strategies work better in practice (on average). The main problem was that no debugger implemented all the strategies (in fact, the most advanced debugger in this respect was Mercury's, which only implements four of them), and that there was no maintained implementation of some strategies, such as Divide by YES & Query.

Therefore, we had to implement all the strategies ourselves in order to ensure that all of them could be applied to the same target language, and hence that the results would be comparable. We implemented the strategies in the JDD debugger, an algorithmic debugger for Java, and we selected 25 Java benchmarks to perform the experiments.

The results are shown in the following table, where rows represent benchmarks and columns represent algorithmic debugging strategies. The strategies are the following: Top-Down (TD), Divide & Query by Shapiro (D&Q Sha), Divide & Query by Hirunkitti (D&Q Hir), Divide by YES & Query (DbyY&Q), Heaviest First (HF), Less YES First (LYF), Hat Delta YES (HD - Yes), Hat Delta NO (HD - No), Hat Delta Average (HD - Avg), and Single Stepping (SS).

[Comparative table]

The results of the comparison provide interesting information that confirms the theoretical study, but they also offer additional information that complements it. Firstly, the experiments ratify that Single Stepping is not usable in practice: it needs many more questions than any other strategy. Top-Down, the most widely used strategy, is not a good option either; for instance, in the experiments it needs 50% more questions than Divide & Query. The experiments confirm that Hirunkitti's Divide & Query is better than Shapiro's Divide & Query, but the difference is very small. The idea of using YES answers to prune the tree proves to be very useful: indeed, Divide by YES & Query, which had not been implemented by any debugger before, turns out to be the best strategy in practice.
The other strategies obtained similar results. In the case of the Hat Delta heuristics, we see that it is better to prune the tree by counting YES answers. This confirms the result obtained by Davie and Chitil when they compared the three Hat Delta heuristics.
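As a sketch of the idea behind this heuristic (our own reconstruction; the Map-based bookkeeping is an assumption, not Hat Delta's actual implementation), we can count, for every function, how many of its reductions have already received a YES answer, and ask next about the pending node whose function has the fewest YES answers, i.e., the least trusted one:

    import qualified Data.Map as Map
    import Data.List (minimumBy)
    import Data.Ord (comparing)

    data ETNode = ETNode { fun :: String, question :: String }

    type YesCounts = Map.Map String Int

    -- After each YES answer, credit the function of the validated node.
    recordYes :: ETNode -> YesCounts -> YesCounts
    recordYes n = Map.insertWith (+) (fun n) 1

    -- Next question: the pending node whose function has the fewest YES
    -- answers so far (functions often seen correct are more trusted).
    selectByYes :: YesCounts -> [ETNode] -> ETNode
    selectByYes counts = minimumBy (comparing score)
      where
        score n = Map.findWithDefault 0 (fun n) counts

    main :: IO ()
    main = do
      let counts  = recordYes (ETNode "insort" "insort [] = []?") Map.empty
          pending = [ ETNode "insort" "insort [2] = [2]?"
                    , ETNode "insert" "insert 3 [2] = [3,2]?" ]
      putStrLn ("Next question: " ++ question (selectByYes counts pending))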

The following figures graphically represent the data in the table:

[Figure 1]

[Figure 2]

[Figure 3]

 

 
Conclusions
 

The main conclusion of the study is that the state of the theory in algorithmic debugging is one step ahead of the state of the practice. In general, algorithmic debuggers are maintained by researchers as part of a thesis or of their research at a university. We are not aware of any commercial algorithmic debugger.

As a result, researchers implement in their debuggers the techniques developed at their university in order to prove their usefulness, but the general objective of the debuggers is not the debugging of real programs, as evidenced by the fact that most of them do not have any mechanism to control the growing size of the ET (e.g., they store the whole ET in main memory).

The functionality comparison has produced a precise description of which features are implemented by each debugger (and hence, for each language). From this description, it is easy to see that many features which should be implemented by every algorithmic debugger are only included in one of them. For instance, only the Mercury debugger is able to trace subexpressions, only Hat Delta implements the tree compression technique, and only DDT has a GUI and allows the user to graphically explore the ET.

Another important conclusion is that none of the debuggers implements all the ET exploration strategies that have appeared in the literature. For instance, Hirunkitti's Divide & Query, which is supposed to be the most efficient strategy, had not been implemented by any of them.

Regarding the efficiency comparison, the main conclusion is that the main problem of current algorithmic debuggers is not time but space. Their scalability is very low, because they run out of memory with long-running computations due to the huge growth rate of the ET. In this respect, the most scalable debuggers of the four we compared are Hat Delta (i.e., it supported more debugging sessions than the others) and Buddha; in fact, Hat Delta was able to debug all the programs in our experiment. However, the Münster Curry Debugger presents the lowest growth rate of the ET. It is surprising that, although several debuggers use a file to store the ET, no debugger uses a database; the use of a database would probably solve the problem of the ET's size. Another technology that is not currently used and needs to be further investigated is a clustering mechanism that would allow the debugger to explore (and load) only the part of the ET (the cluster of nodes) needed at each stage of the debugging session. Currently, no algorithmic debugger uses this technology, which could both speed up the debugging session and increase the scalability of the debuggers.

Despite the large number of problems we have identified, we can also conclude that there is a continuous progression in the development of algorithmic debuggers. This can easily be checked by comparing the dates of the releases studied and their implemented features, and also by comparing different versions of the same debugger (for instance, from Hat Detect to Hat Delta many new functionalities were added, such as tree compression and new search strategies).