DIDUCE

Overview
DIDUCE is an anomaly detection system that helps in automated debug of Java programs. It instruments a program's bytecode and monitors values and types of objects in one or more executions of the program. It attempts to detect anomalies such as a new value for a variable at a program point. Such anomalies are ranked statistically on the basis of how large the deviation is from the past values seen at that point, and how often it was executed. Even with simple invariants, we found DIDUCE was surprisingly effective in debugging fairly complex bugs in 4 large systems we had no prior knowledge about.
Papers
Tracking down software bugs using automatic anomaly detection. [PDF]
Sudheendra Hangal and Monica S. Lam
ICSE '02: Proceedings of the 24th International Conference on Software Engineering (ICSE), 2002.
Software
Source code (released under GPL, not updated since March 2005).
Collaborators
Monica S. Lam
History
DIDUCE was initiated as a Stanford CS343 (Compiler Topics) course project circa 2000, and was inspired by Michael Ernst's dynamic invariants papers. Around this time, I was working at Sun on porting the HotSpot Java virtual machine to the MAJC architecture. HotSpot had several hundred thousand lines of code, and it was very hard to debug errors when it crashed, especially in parts of the system I did not understand. I was wishing for some form of automatic monitoring that would give me a clue when a piece of code had started seeing values different from what was "normal" for it. At the same time, I had built a system for another CS343 project to study compression of heap objects in the Java VM, and this system already had a nice bytecode instrumentation framework. I adapted this project to detect invariants and then anomalies. I was lucky to find a program at Sun that could use DIDUCE -- the simulator for the MAJC processor. Microprocessor simulators are notoriously hard to get right -- they run for a long time and its always hard to be certain that the results are correct. I started using DIDUCE on this simulator to initially debug a crash that a friend was struggling with. The problem was difficult to debug because it manifested itself only after several hours of execution. DIDUCE could not only find the root cause of the problem in this case, but also threw up a bunch of interesting anomalies that were undetected errors in the program (e.g., a cache line was being set to dirty but not valid.) After this experience, the value of the DIDUCE approach was pretty clear to me and I applied it on other projects and got a series of interesting results.
Learnings
Overall, we were surprised that our simple anomaly detection system worked so well. Though we initially thought that our simple notion of associating an invariant with a program point (as opposed to an object, or thread, etc) would quickly turn out to be inadequate, this was not the case. This may be because it fits well with programmers' natural tendency to associate invariants with specific locations in the program.

We found that we when things go wrong, they cause a series of anomalies, and catching a small number of these is sufficient for debugging. We also found that the approach of automatically and blindly detecting anomalies was extremely useful, because very often, the hard bugs are due to incorrect behaviour in places that are not obvious for the programmer to check.

Runtime overhead is obviously a concern with any system based on instrumentation, but turned out to be less of an issue than we had expected. DIDUCE typically slows down a program roughly by a factor of 2 to 10, depending on how compute-bound the program is. People often tolerate this overhead because: (a) machines are pretty fast anyway (b) they are using this tool to augment a manual process which is pretty slow anyway (c) Most programs are not purely compute bound, they spend their time in libraries, I/O, network, etc. and instrumentation has no impact on those operations. (d) we can parallelize the overhead by running the program on different machines with different modules instrumented. We tried to keep the overhead low by keeping our invariants space simple, so that the cost per heap access was a few simple arithmetic operations (though users can write code to track arbitrarily complex invariants). Instrumentation overhead can also be reduced by performing static analysis.

We found that our "difference" invariants (tracking the difference between the previous value of a memory location and its new value) were a novel feature that yielded interesting anomalies or just strong invariants. For example, in the MAJC simulator, a difference invariant pointed to a bug where a variable representing simulation time actually decreased occasionally, meaning that time was going backwards!

We tried to get real programmers to actually use DIDUCE in the course of their work, and learnt the obvious lessons in making a research tool usable for real - we had to have a GUI cross-linked to source code, so we built that. It had to be integrated with the IDE, so I tried to build a Netbeans plugin for it. It had to have zero startup cost for the user, so we introduced dynamic instrumentation at class load. Users asked for documentation (so I wrote a 25 page user manual which, of course, they did not read). Our little language for specifying and controlling instrumentation was unintuitive and did not get used much.