Algorithmic Debugging
Abstract
Algorithmic debugging is a semi-automatic debugging technique based on the answers of an oracle (usually the programmer) to a series of questions generated automatically by the algorithmic debugger. The technique typically traverses a record of the execution, the so-called execution tree (ET), which only captures the declarative aspects of the execution and hides operational details. In this work we perform two empirical studies: we survey and compare the most important algorithmic debuggers of different programming paradigms, and we evaluate all the algorithmic debugging strategies that have appeared in the literature. In the study we analyze the most important features incorporated by current algorithmic debuggers, and we identify some features not yet supported by any debugger. We then compare all the debuggers, giving rise to a map of the state of the practice in algorithmic debugging.
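To make the technique concrete, the following minimal Java sketch (all class and method names are our own invention, not taken from any of the debuggers discussed below) hand-builds a tiny ET for a buggy Fibonacci function and traverses it top-down, asking the oracle one question per node; a node whose result is wrong but whose children are all correct is reported as buggy.

```java
import java.util.List;
import java.util.Scanner;

// Minimal sketch of an execution tree (ET) node: each node records one
// call together with its result, i.e., the declarative view of a step.
class ETNode {
    final String question;      // e.g., "fib(3) = 3"
    final List<ETNode> children;
    ETNode(String question, List<ETNode> children) {
        this.question = question;
        this.children = children;
    }
}

public class TopDownDebugger {
    static final Scanner oracle = new Scanner(System.in);

    // Ask the oracle (the programmer) whether a recorded result is correct.
    static boolean askOracle(ETNode node) {
        System.out.print("Is this correct? " + node.question + " [y/n] ");
        return oracle.nextLine().trim().startsWith("y");
    }

    // Top-down search: descend into the first wrong child; if a node is
    // wrong but all of its children are right, that node is the buggy one.
    static void debug(ETNode node) {
        for (ETNode child : node.children) {
            if (!askOracle(child)) {
                debug(child);
                return;
            }
        }
        System.out.println("Bug located in: " + node.question);
    }

    public static void main(String[] args) {
        // A tiny hand-built ET for a buggy fib: fib(3) should be 2, not 3.
        ETNode fib1 = new ETNode("fib(1) = 1", List.of());
        ETNode fib2 = new ETNode("fib(2) = 2", List.of(fib1));  // wrong node
        ETNode root = new ETNode("fib(3) = 3", List.of(fib2, fib1));
        if (!askOracle(root)) debug(root);
    }
}
```

A real debugger builds the ET automatically by instrumenting the execution; the tree is written by hand here only to show the question-and-answer loop.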
Algorithmic Debugging
In this work, we first identify a set of desirable features that an algorithmic debugger should implement, and then we compare the most important algorithmic debuggers to study the state of the practice in algorithmic debugging. The comparison is done from two perspectives: functionality and efficiency. Some useful links to algorithmic debugging works: Original Definition, and A Comparative Study of Algorithmic Debugging Strategies. The debuggers we consider in our study are: B.i.O., Buddha, DDT, Freja, Hat Delta, Mercury, Münster Curry Debugger, and Nude.
Functionality Comparison
Table 1 presents a summary of the available functionalities of the studied algorithmic debuggers. We refer the reader to this paper for a deeper explanation of the fields. Every column gathers the information of one algorithmic debugger, and every row corresponds to one of the studied features.
Efficiency Comparison
We studied the growth rate of the internal data structure stored by the debuggers. This information is useful for assessing the scalability of each debugger and, in particular, its limitations with respect to the ET's size. This study proved that several debuggers are not applicable to real programs because they run out of memory with medium-size programs.

In order to compare the growth rate of each debugger's ET, we selected some benchmarks from the nofib-buggy repository and designed other benchmarks particularly well suited for algorithmic debugging: Fibonacci, Factorial, Euler, Tak, Length, InSort, QuickSort, Mergesort, Primes and Queens. Then, we reprogrammed the benchmarks for all the target languages of the debuggers. Although we selected the benchmarks so that the reprogrammed versions were as similar as possible, some small differences appeared. However, the results are still valid because we ensured that the size of the data structures used in the questions is the same in all the programs, and because our experiment takes into account not only the size of the ET but also the ratio between its size and its number of nodes. Finally, we tested the debuggers with a set of increasing input data in order to graphically represent the growth rate of the ET's size. Since each debugger handles the ET in a different way, we had to use a different method in each case to measure its size.
The experiments show the size of the ET produced by four debuggers when debugging. The X-axis represents the number of nodes in the ETs, the Y-axis represents the ET's size in KB, and the Z-axis shows the debuggers. It is important to note that we deliberately selected some benchmarks that process big data structures. The reason is that the size of a node strongly depends on the size of its data structures, and we wanted to study the growth rate with both small and big input data.
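As a rough illustration of this kind of measurement (the node layout and the per-node byte estimate below are assumptions of ours, not the internal representation of any of the four debuggers), one can build the ET of a benchmark for increasing inputs and print how its size grows with the number of nodes:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: grow the ET of a Fibonacci benchmark for increasing inputs and
// report nodes vs. estimated size, mimicking the X/Y axes of the plots.
public class ETGrowth {
    static class ETNode {
        String question;
        final List<ETNode> children = new ArrayList<>();
        ETNode(String q) { question = q; }
    }

    // Build the ET while computing fib: one node per recursive call.
    static long fib(int n, ETNode parent) {
        ETNode node = new ETNode("fib(" + n + ")");
        parent.children.add(node);
        long r = n < 2 ? n : fib(n - 1, node) + fib(n - 2, node);
        node.question += " = " + r;  // record the computed result
        return r;
    }

    static long countNodes(ETNode n) {
        long c = 1;
        for (ETNode child : n.children) c += countNodes(child);
        return c;
    }

    // Crude per-node estimate: assumed fixed overhead plus the question text.
    static long estimateBytes(ETNode n) {
        long b = 48 + 2L * n.question.length();
        for (ETNode child : n.children) b += estimateBytes(child);
        return b;
    }

    public static void main(String[] args) {
        for (int n = 5; n <= 25; n += 5) {
            ETNode root = new ETNode("main");
            fib(n, root);
            System.out.printf("n=%d  nodes=%d  ~size=%d KB%n",
                    n, countNodes(root), estimateBytes(root) / 1024);
        }
    }
}
```

Even this toy run makes the growth visible; with real data structures inside the questions each node is far heavier, which is exactly why several debuggers run out of memory.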
A Comparison of Algorithmic Debugging Strategies
We conducted several experiments in order to empirically determine how algorithmic debugging strategies work in practice. For instance, the theoretical study (Download) concluded that Hirunkitti's version of D&Q is better than Shapiro's version; but the question is: how much better? Similarly, after the theoretical study, we now know that Heaviest First is better than Top-Down; but, again, how much better? Is the difference significant in practice? Basically, the experiments were done by debugging several programs with all the strategies in order to know which strategies work better in practice (on average). The main problem was that no debugger implemented all the strategies (in fact, the most advanced debugger was Mercury's debugger, which only implements four strategies), and that there was no maintained implementation of some strategies such as Divide by YES & Query. Therefore, we had to implement all the strategies ourselves in order to ensure that all of them can be applied to the same target language, and hence that the results are comparable. We implemented the strategies in the JDD debugger, which is a Java debugger, and we selected 25 Java benchmarks to perform the experiments. The results are shown in the following table, where rows represent benchmarks and columns represent algorithmic debugging strategies. The strategies are the following: Top-Down (TD), Divide & Query by Shapiro (D&Q Sha), Divide & Query by Hirunkitti (D&Q Hir), Divide by YES & Query (DbyY&Q), Heaviest First (HF), Less YES First (LYF), Hat Delta YES (HD - Yes), Hat Delta NO (HD - No), Hat Delta Average (HD - Avg), and Single Stepping (SS).
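To make the difference between the two Divide & Query variants concrete, the following sketch (our own formulation of the selection rules as described in the literature, not JDD's actual code) implements Hirunkitti's node selection over a weighted ET:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of D&Q node selection (our own formulation of the strategies as
// described in the literature; this is not JDD's implementation).
public class DivideAndQuery {
    static class ETNode {
        final List<ETNode> children = new ArrayList<>();
        int weight;  // number of nodes in this node's subtree
    }

    // Annotate every node with the weight of its subtree.
    static int computeWeights(ETNode n) {
        n.weight = 1;
        for (ETNode c : n.children) n.weight += computeWeights(c);
        return n.weight;
    }

    // Hirunkitti's selection: among all proper descendants of the root,
    // ask about the node whose subtree weight is closest to half the
    // suspect tree's weight, approaching W/2 from either side. Shapiro's
    // original rule instead takes the heaviest node with weight <= W/2,
    // which can land farther from the optimal halving point.
    static ETNode selectHirunkitti(ETNode root) {
        int total = root.weight;
        ETNode best = null;
        int bestDist = Integer.MAX_VALUE;
        List<ETNode> stack = new ArrayList<>(root.children);
        while (!stack.isEmpty()) {
            ETNode n = stack.remove(stack.size() - 1);
            int dist = Math.abs(2 * n.weight - total);  // 2*|weight - W/2|
            if (dist < bestDist) { bestDist = dist; best = n; }
            stack.addAll(n.children);
        }
        return best;
    }
}
```

The complete strategy repeats this selection after every answer: a YES prunes the selected subtree, a NO makes it the new suspect tree, roughly halving the search space either way.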
The results of the comparison provide interesting information about the strategies which confirms the theoretical study, but they also offer additional information that complements it. Firstly, the experiments ratify that Single Stepping is not usable in practice: it needs many more questions than any other strategy. Top-Down, which is the most widely used strategy, is not a good option either; for instance, in the experiments it needs 50% more questions than Divide & Query. The experiments confirm that Hirunkitti's Divide & Query is better than Shapiro's Divide & Query, but the difference is very small. The idea of using rules to prune the tree turns out to be very useful: indeed, Divide by Rules and Query, which had not been implemented by any debugger, proves to be the best strategy in practice. The following figures graphically represent the data in the table.
Conclusions
The main conclusion of the study is that the state of the theory in algorithmic debugging is one step ahead of the state of the practice. In general, algorithmic debuggers are maintained by researchers doing a thesis or as part of their research at a university; we are not aware of any commercial algorithmic debugger. As a result, researchers implement in their debuggers techniques developed at their university in order to prove their usefulness, but the general objective of the debuggers is not the debugging of real programs, as proven by the fact that most of them do not have any mechanism to control the growing size of the ET (e.g., they store the whole ET in main memory).

The functionality comparison has produced a precise description of which features are implemented by each debugger (hence, for each language). From this description, it is easy to see that many features which should be implemented by every algorithmic debugger are only included in one of them. For instance, only the Mercury debugger is able to trace subexpressions, only Hat Delta implements the tree compression technique, and only DDT has a GUI and allows the user to graphically explore the ET. Another important conclusion is that none of the debuggers implements all the ET exploration strategies that appear in the literature. For instance, Hirunkitti's Divide & Query, which is supposed to be the most efficient strategy, had not been implemented yet.

Regarding the efficiency comparison, the main conclusion is that the main problem of current algorithmic debuggers is not time but space. Their scalability is very low because they run out of memory with long-running computations due to the huge growth rate of the ET. In this respect, the most scalable debuggers of the four we compared are Hat Delta (i.e., it supported more debugging sessions than the others) and Buddha; in fact, Hat Delta was able to debug all the programs in our experiment. However, the Münster Curry Debugger presents the lowest ET growth rate. It is surprising that several of the debuggers use a file to store the ET, yet no debugger uses a database; the use of a database would probably solve the problem of the ET's size. Another technology that is not currently used and needs to be further investigated is a clustering mechanism that allows the debugger to explore (and load) only the part of the ET (the cluster of nodes) needed at each stage of the debugging session. Currently, no algorithmic debugger uses this technology, which could both speed up the debugging session and increase the scalability of the debuggers.

Despite the many problems we have identified, we can also conclude that there is a continuous progression in the development of algorithmic debuggers. This can be easily checked by comparing the dates of the releases studied and their implemented features, and also by comparing different versions of the same debuggers (for instance, from Hat Detect to Hat Delta many new functionalities have been added, such as tree compression and new search strategies).
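As a hint of what the clustering idea could look like in practice (everything below is a hypothetical sketch: the node layout, the store, and the loading policy are our assumptions, with a HashMap standing in for the file or database backend), a debugger could fetch only the cluster of nodes it is about to inspect:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of cluster-based ET loading: nodes live in an
// external store and are fetched only when a strategy inspects them.
// A HashMap stands in here for the file or database backend.
public class ClusteredET {
    record StoredNode(String question, List<Integer> childIds) {}

    // Stand-in for a persistent table: nodeId -> node.
    static final Map<Integer, StoredNode> store = new HashMap<>();

    // The only method that touches the store; at any moment main memory
    // holds just the cluster of nodes along the current search path.
    static StoredNode load(int nodeId) {
        return store.get(nodeId);  // would be a query in a real backend
    }

    public static void main(String[] args) {
        store.put(0, new StoredNode("fib(3) = 3", List.of(1, 3)));
        store.put(1, new StoredNode("fib(2) = 2", List.of(2)));
        store.put(2, new StoredNode("fib(1) = 1", List.of()));
        store.put(3, new StoredNode("fib(1) = 1", List.of()));
        // A strategy fetches children only when it descends into a node.
        for (int childId : load(0).childIds())
            System.out.println("Next question: " + load(childId).question());
    }
}
```

With such a store, only the current search path needs to be resident in memory, so the ET's total size no longer bounds the computations that can be debugged.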