TSOtool is a program to find complex and subtle bugs in the design of shared-memory multiprocessors. TSOtool generates pseudo-random multi-threaded programs with aggressive data races and synchronization operations, runs them on a multiprocessor, and checks whether the program's results are correct according to the machine's memory model. The power of TSOtool lies in being an end-to-end checker for the computer's memory system: over the years, it has been used on many different microprocessor and server designs at Sun Microsystems and has exposed bugs in every part of the system: pipelines, caches, load-store units, coherency protocols, atomicity and transactional memory microarchitectures, DRAM controllers, and even software emulation routines. TSOtool can be deployed on real hardware running unmodified operating systems like Solaris, as well as in pre-silicon simulation environments. See the presentation above for a five-minute overview, our ISCA-04 paper for a description of the overall approach, and Sit Manovit's PhD dissertation for an excellent summary of the entire project.
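To make the idea concrete, here is a minimal sketch (not TSOtool itself) of generating such a pseudo-random test. One key property, which the later analysis relies on, is that every store writes a unique value, so each observed load value identifies exactly one source store. The weights dictionary and the operation encoding here are assumptions for illustration only.

```python
import random

# Illustrative sketch (not TSOtool itself): generate a pseudo-random
# multi-threaded test program in which every store writes a unique value,
# so each observed load value identifies exactly one source store during
# analysis. The weights dict and op encoding are assumptions of this sketch.
def generate_test(num_threads=4, ops_per_thread=8, num_addrs=3,
                  weights=None, seed=0):
    weights = weights or {"LOAD": 1, "STORE": 1}
    rng = random.Random(seed)
    kinds = list(weights)
    next_value = 1                       # one global counter => unique store values
    program = []
    for _ in range(num_threads):
        ops = []
        for _ in range(ops_per_thread):
            kind = rng.choices(kinds, weights=[weights[k] for k in kinds])[0]
            addr = rng.randrange(num_addrs)
            if kind == "STORE":
                ops.append(("STORE", addr, next_value))
                next_value += 1
            else:
                ops.append(("LOAD", addr))
        program.append(ops)
    return program
```

The generated per-thread operation lists would then be compiled to real loads and stores, run on the machine under test, and the observed load values fed to the analysis.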

Our SPAA-05 paper describes algorithmic optimizations to verify memory consistency using the idea of offline vector clocks, which led to a dramatic improvement in analysis speed.

In our HPCA-06 paper, we presented a study of complete algorithms for verifying memory consistency. We showed that, although the problem is known to be NP-complete for popular memory models such as SC and TSO, a simple backtracking-based heuristic works well on typical executions, as long as it employs constraint inference during the search. Our complete algorithm runs only marginally slower than an incomplete algorithm that may miss errors.
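A toy sketch of the search component: when inference leaves some same-address store pairs unordered, try both orders for each pair and test whether any choice yields an acyclic (hence legal) ordering graph. Real TSOtool interleaves constraint inference with backtracking rather than enumerating naively as done here; the node and edge encodings are assumptions of this sketch.

```python
from graphlib import TopologicalSorter, CycleError
from itertools import product

# Check whether a set of ordering edges admits a legal total order.
def acyclic(nodes, edges):
    ts = TopologicalSorter({n: set() for n in nodes})
    for a, b in edges:
        ts.add(b, a)                     # edge a -> b: a must precede b
    try:
        list(ts.static_order())
        return True
    except CycleError:
        return False

# Naive complete search over the unresolved store-order choices. Real
# TSOtool prunes this search with constraint inference at every step.
def exists_legal_order(nodes, base_edges, unordered_pairs):
    for flips in product((False, True), repeat=len(unordered_pairs)):
        extra = [(b, a) if flip else (a, b)
                 for (a, b), flip in zip(unordered_pairs, flips)]
        if acyclic(nodes, base_edges + extra):
            return True
    return False
```

If no assignment of orders makes the graph acyclic, the observed execution has no legal explanation, i.e., a bug.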

At PACT-06, we presented an extension of TSOtool to the important problem of verifying transactional memory (TM) systems. While there is much research on TM, there has been little work so far on how to practically build and verify these systems. We formulated an axiomatic framework to specify TM systems, similar to the ones for memory consistency, and showed how a straightforward extension of our verification algorithm can be applied to TM. We used this approach to successfully find some bugs in TCC, the Stanford "all transactions, all the time" architecture. (PACT-06 paper)

See the Publications page. We probably should have written a journal article summarizing the entire project, but Sit's dissertation serves as a replacement.

Sit Manovit, Durgam Vahia, Sridhar Narayanan, Alex Gert, Gopal Reddy and many others.

In 1999-2000, at the height of the dot-com boom, Sun had many problems shipping the new UltraSPARC-III (Cheetah)-based systems. While we booted uniprocessor Solaris within a week of getting silicon, it took 18 months after tapeout to ship the first production workstation. The delay was primarily due to deep corner-case bugs we kept finding in multiprocessing functionality (even the most basic workstation had to support a 2-CPU configuration). Each bug was very hard to debug, as it would manifest as an unpredictable application or OS crash after the system had been running for several hours.

After hearing about a series of these bug reports, I started looking at how we could aggressively test multiprocessing functionality and expose bugs faster. An obvious idea was to extend the common practice of pseudo-random testing to the MP domain. The problem was figuring out how to reason about the results of programs with races, when multiple outcomes may be correct. I wrote a little pseudo-random MP test generator as a skunkworks project for a while, and then Durgam Vahia and I started working in mid-2001 on a graph-based reasoning approach to check the correctness of executions. We were incorporating various rules arbitrarily into the tool when we chanced upon the Xerox PARC paper by Pradeep Sindhu et al., which formally laid out TSO semantics as axioms. We immediately saw that we could elegantly formulate all our rules within this axiomatic framework.
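The graph formulation can be sketched as follows: each ordering axiom contributes directed edges over the operations of a run (for example, program order within a thread, or a "value" edge from a store to the load that returned its unique value). If the resulting graph has a cycle, no total order satisfies all the axioms, and the execution violates the model. The edge set here is illustrative, not the full TSO axiom set.

```python
from collections import defaultdict

# Sketch: a run violates the model iff the axiom-edge graph has a cycle,
# detected here with a standard DFS. Operation numbering is illustrative.
def violates(num_ops, axiom_edges):
    adj = defaultdict(list)
    for a, b in axiom_edges:
        adj[a].append(b)
    state = [0] * num_ops                # 0 = unvisited, 1 = on stack, 2 = done
    def dfs(u):
        state[u] = 1
        for v in adj[u]:
            if state[v] == 1 or (state[v] == 0 and dfs(v)):
                return True              # back edge: cycle => no legal total order
        state[u] = 2
        return False
    return any(state[u] == 0 and dfs(u) for u in range(num_ops))
```

The interesting engineering is in which edges can be soundly inferred from the observed load values, which is exactly what the axioms pin down.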

Chaiyasit (Sit) Manovit joined our team in mid-2002, initially as an intern from Stanford and later as an employee. In 2003, we discovered the work by Phil Gibbons et al. on the complexity of verifying sequential consistency; adapting it to TSO was straightforward, and it also helped us construct an example proving that our analysis algorithm was incomplete (which we had been unsure about until that point). Since our test throughput on real systems was limited by analysis speed, Sit and I put in several optimizations for pruning graph updates. We initially used the concept of frontiers to prune our graph search when it would no longer infer new relations; Sit reformulated this in terms of (offline) vector clocks. Once we had grown accustomed to performing these kinds of optimizations, it was relatively easy to create a search algorithm for write order that was complete and yet efficient in the common case. The final phase of the TSOtool work, extending it to transactional memory systems, was motivated by Sit's interactions with the Stanford TCC group, led by his advisor Kunle Olukotun.

Our TSOtool team put in a lot of work to get it adopted by the ten or so processor/system design groups at Sun, in environments ranging from RTL simulation to hardware accelerators to real hardware. Educating and convincing the various design groups was tirelessly driven by Sridhar Narayanan and Shrenik Mehta, and by internal champions in every group.

The TSOtool team, consisting of myself, Durgam Vahia and Sit Manovit, was awarded Scott McNealy's Chairman's Award for Innovation (though of course many other people contributed to its success).

Challenges and Future work (Possible PhD topics for interested students)
One of the ideas that I pursued for a while but did not complete was "TSOtool Online": generate stylized multiprocessor test programs whose results could be checked online, as the program was running. This would dramatically increase test throughput on real hardware by doing away with the need to store results, load them into the analysis, build heavyweight graph-based inference structures, and so on. In return, we would lose the ability to catch a small number of real errors. From our experience, we knew that a reasonably large fraction of errors could be caught with a small number of checks. This is an interesting tradeoff to study.
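One possible stylized test of this kind (an assumption of mine, not the actual TSOtool Online design) is sketched below: one thread writes strictly increasing values to a shared location, so under TSO or SC every reader must observe a non-decreasing sequence of values, which it can assert on the fly with no result log at all.

```python
import threading

# Sketch of an "online" checkable test: a single writer stores strictly
# increasing values; each reader asserts monotonicity as it runs. This is
# an illustration of the tradeoff, not TSOtool's actual design.
def online_monotonic_test(num_writes=10000, num_readers=2):
    shared = [0]
    ok = [True] * num_readers
    done = threading.Event()

    def writer():
        for v in range(1, num_writes + 1):
            shared[0] = v                # strictly increasing values
        done.set()

    def reader(rid):
        last = 0
        while not done.is_set():
            v = shared[0]
            if v < last:                 # a decrease would violate the model
                ok[rid] = False
            last = v

    threads = [threading.Thread(target=writer)] + \
              [threading.Thread(target=reader, args=(i,)) for i in range(num_readers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(ok)
```

Checks like this catch many ordering bugs cheaply, but clearly cannot express everything the full graph-based analysis can, which is exactly the tradeoff mentioned above.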

Another idea we had begun to explore was multiprocessor reproducibility, motivated by the nightmarish experiences we had at Sun in reproducing failures on MP hardware (those who debugged the infamous E-to-M state bug for six months on the Jalapeno/US-3i processor will never forget it!). Unlike other research proposals, we could not require any hardware modification, since that was impractical for existing or in-design processors. Our idea was to introduce lightweight instrumentation at the instruction level into a TSOtool-generated test to observe the relative positions of different threads (at programmable intervals) as the test was executing. If the test triggered a failure, we would use those observations to insert sync points in each thread so as to gently guide a subsequent execution into mimicking the original one. I named this project POTTER (Program Observation Technology To Enhance Reproducibility) because the idea is to gently nudge the program execution at its corners towards the desired outcome; as with a real potter, no two outcomes are guaranteed to be identical, though the potter makes a best effort.

The underlying thesis of POTTER was that, just as inserting a few characters into a Word document usually does not ripple through and reformat the entire document, slight perturbations of the program would be absorbed in white space such as cache misses, giving us a fair shot at reproducing MP executions. I had an intern, Hemant Gupta, work in summer 2004 on implementing POTTER, and we studied some of its reproducibility and perturbation characteristics. However, we never got a chance to deploy it on real bugs at Sun.
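The sync-point mechanism can be sketched as follows (an illustration of the idea, not the POTTER implementation): each thread pauses at recorded step indices until every other thread reaches its matching point, nudging the replay toward the observed interleaving without any hardware support. The schedule format is an assumption of this sketch.

```python
import threading

# Illustrative sketch of sync-point-guided replay (not POTTER itself):
# threads rendezvous at recorded step indices, coarsely recreating the
# relative thread positions observed in the failing run.
def guided_replay(thread_steps, sync_points):
    """thread_steps: one list of zero-argument callables per thread.
    sync_points: step indices at which all threads rendezvous.
    Assumes every thread's step list is long enough to reach each sync point."""
    barrier = threading.Barrier(len(thread_steps))
    def run(steps):
        for i, step in enumerate(steps):
            if i in sync_points:
                barrier.wait()           # hold until every thread catches up
            step()
    threads = [threading.Thread(target=run, args=(s,)) for s in thread_steps]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Between sync points the threads run freely, which is what lets the small remaining perturbations be absorbed rather than steering the execution instruction by instruction.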

Another area for potential study is analysis algorithms for the case where the values written by stores are not unique. While this is not an issue for tests generated by TSOtool, removing this requirement from the test program allows the analysis to check the correctness of arbitrary programs. There is, of course, a tradeoff between the amount of reuse of store values and the ability to derive constraints and thereby find bugs.

While most of our published work focuses on analysis algorithms, a practical challenge in deploying TSOtool-like approaches is effective test-case generation. We had to try many combinations of input weights to the pseudo-random generator to test a system aggressively and to reduce the time to failure when we did have a bug. Choosing and tuning these weights, for most pseudo-random generators (not just TSOtool), remains an art, and more work is needed to tackle this problem systematically.

Beyond multiprocessors, approaches like TSOtool can be applied to other platforms that provide consistency or atomicity guarantees, such as the Java virtual machine, the C++ memory model and transactional database engines.

Related projects
Intel has developed a program for memory consistency verification based on our approach (CAV-2006 paper).

TSOtool is a proprietary program of Sun Microsystems. The analysis engine can be used independently of the SPARC code generator, and is relatively easy to modify for other memory models, particularly SC. For licensing opportunities, please contact Durgam.Vahia (at) oracle.com.