Symbolic execution techniques identify vulnerabilities in safety-critical code
Story – March 07, 2012
Multicore processors are becoming increasingly popular in safety-critical applications because they offer significant price and performance improvements. However, writing multithreaded applications for multicore hardware is notoriously difficult, and mistakes can result in catastrophic failures. The following describes symbolic execution techniques for identifying concurrency issues, including data races – one of the most common concurrency defects – and how static analysis can help developers find and eliminate them.
Maximizing performance is especially important for military embedded systems because of the growing need to keep costs low while satisfying the requirements of connectivity in an increasingly digital battlefield. As manufacturers reach the limits of what can be wrung from increased miniaturization and integration, the best approach to increased performance is the use of multicore processors.
The downside is that to take full advantage of many cores executing in parallel, the software must be written to be intrinsically multithreaded. Software written to be single-threaded for a single-core processor will realize little or no performance benefit when executed on a multicore processor: It must be rewritten or adapted to use multithreading. The key challenge is to keep the cores as busy as possible while ensuring that they coordinate access to shared resources properly. Unfortunately, writing such code is much harder than writing single-threaded code. When there are defects such as deadlocks or race conditions, they can manifest in ways that are difficult to diagnose. Traditional techniques for finding and eliminating concurrency bugs may be ineffective.
Concurrency bugs are so difficult for two main reasons. First, there is an enormous number of ways in which the events in the threads can be interleaved when those threads execute. As the number of threads or instructions increases, the number of interleavings grows exponentially. If thread A executes M instructions and thread B executes N instructions, there are (M+N)!/(M!·N!) possible interleavings of the two threads – the binomial coefficient C(M+N, N). For example, given two trivial threads with 10 instructions each, there are 184,756 possible interleavings of those instructions. Even with very small programs, it is clear that it is next to impossible to test all possible combinations. Second, even if it is possible to identify a single interleaving that leads to a failure, it can be very difficult to set up a repeatable test case that uses that particular interleaving because scheduling of threads is effectively nondeterministic. Consequently, debugging concurrent programs can be very expensive and time consuming. A race condition is a class of concurrency defect that is easy to introduce accidentally and difficult to eliminate with conventional testing. However, there are techniques programmers can use to find and remove them.
Potential catastrophic failures
Compared to single-threaded code, entirely new classes of defect can occur in concurrent programs, including deadlock, starvation, and race conditions. Such defects mostly cause mysterious failures during development that are very difficult to diagnose and eliminate. One avionics manufacturer we have worked with spent two person-years applying traditional debugging techniques in an effort to find the root cause of an intermittent software failure that turned out to be a race condition. Sometimes the consequences can be dire – two of the most infamous software failures ever were caused by race conditions. The Therac-25 radiation therapy machine featured a race condition that was responsible for the deaths of several patients[2]. Similarly, the 2003 Northeast blackout was exacerbated by a race condition that resulted in misleading information being communicated to the technicians[3].
There are several different kinds of race conditions. One of the most common and insidious forms – data races – is the class of race conditions involving access to memory locations.
A data race occurs when there are two or more threads of execution that access a shared memory location, at least one thread is changing the data at that location, and there is no explicit mechanism for coordinating access. If a data race occurs it can leave the program in an inconsistent state.
Consider avionics code that controls the position of a flap. In normal circumstances the flap is in a position dictated by the flight control software, but the pilot can override that position by pressing a button on his control panel, in which case a manually set position is used. To keep things simple, let’s say that there are two threads in the program: one that controls the flap and one that monitors the position of the elements on the control panel. There is also a shared Boolean variable, named is_manual, that encodes whether the manual override is set or not. The flap position thread checks the value of is_manual, and if true, it sets the position accordingly. The control panel thread listens for button press events, and if the override button is pressed, it sets is_manual to true. Figure 1 shows the code that one might write to implement this specification. This code is likely to work most of the time; however, because the is_manual variable encodes a state that is shared by both threads, it is vulnerable to a data race because access to it is not protected by a lock. If the flap positioning code is being executed at the exact time that the pilot hits the override button, then the program may enter an inconsistent state and the wrong flap position will be used. Figure 2 shows how this might happen.
Figure 1: Code in two threads that access a shared variable
Figure 2: An interleaving of instructions that causes a data race
This example neatly illustrates one of the properties of data races that makes them hard to diagnose: The symptom of corruption may only be observable long after the data race has occurred. In this case, the fact that the wrong flap position is being used may only be noticed when the pilot notices the aircraft is not responding as expected.
A widely held belief is that some instances of data races are benign and can be tolerated. However, it is now clear beyond doubt that this is only rarely true. The C standard[4] states unambiguously that compilers are allowed to assume that there are no data races, so optimizers can and do make transformations that are valid for improving the performance of single-threaded code but which introduce bugs when there are apparently benign race conditions. These are subtle effects – even experienced programmers are regularly surprised by them. (See reference [1] for a full explanation and several compelling examples.) Because of this, to achieve high levels of assurance and avoid disastrous failures, it is very important to find and remove all data races.
Eliminating concurrency defects
Given that concurrency defects, and data races in particular, are so risky, it is important to use multiple techniques to eliminate them. Traditional dynamic testing is not well suited for finding many concurrency defects because of non-determinism. A program that passes a test hundreds of times may later fail in the same environment with exactly the same inputs because the bug can be exquisitely sensitive to timing. Engineers looking for high assurance must turn to other techniques if they are to eliminate concurrency defects.
Static analysis tools offer a means for finding such bugs. The key difference between testing and static analysis is that testing exercises a particular execution of a program for a given set of inputs, whereas static analysis establishes properties that hold for all possible executions and all inputs. (In practice, static analysis tools make approximations to achieve acceptable performance and precision, so they fall short of this ideal model. Nevertheless, they cover many more cases than would ever be possible with traditional testing.)
Roughly speaking, static analysis tools work by creating a model of the program and by doing a symbolic execution of that model, looking for error conditions along the way. For example, GrammaTech’s CodeSonar static analysis tool finds data races by creating a map of which locks are held by which threads and by reasoning about the possible interleavings that could result in unsynchronized access to shared variables. Deadlock and other concurrency defects (including lock mismanagement) are found using similar techniques.
Custom concurrency constructs: A case study
Standard defect detection techniques are most useful when programs use standard ways of managing concurrency. Most tools recognize and can reason about the special properties of standard libraries such as the POSIX threads library or proprietary interfaces such as VxWorks. However, many systems use custom techniques for managing concurrency.
For example, another manufacturer we worked with built a safety-critical device on a platform that used a custom pre-emptive multithreaded software interface. In this design, a key constraint was that all data instances that could be accessed from multiple priority levels of threads had to be protected with proper guard constructs. Prior to using static analysis, validating that this constraint was respected required a person-month of manual analysis. To reduce the cost, they sought a solution by turning to static analysis. An important property of modern advanced static analysis tools is that they are extensible: They provide an API with abstractions that make it convenient to implement custom static-analysis algorithms. Using CodeSonar’s API, they were able to program a solution that piggybacked on the algorithms used at the core of the existing analyses to find locations in the code where the design constraint was being violated. The resulting tool, implemented as a plug-in, is able to find violations of the key constraint automatically, all at a fraction of the cost and in much less time than was previously possible.
Multicore trade-off
There are compelling reasons to move to multicore processor designs, but the risk is that doing so introduces the possibility of concurrency defects in the software. These are easy to introduce – even apparently innocent code can harbor nasty multithreading bugs – and notoriously difficult to diagnose and eliminate when they occur. Traditional testing techniques alone are inadequate to ensure high-quality software, mainly because of the high degree of nondeterminism. The use of advanced static analysis tools that use symbolic execution is one approach that can help because such tools can reason about all possible ways in which the code can execute. These tools can find defects such as data races and deadlocks in code that uses standard multithreading libraries, and can even be adapted to designs that use nonstandard concurrency constructs.
References
[1] Boehm, H.-J., How to miscompile programs with “benign” data races. In HotPar’11 Proceedings of the 3rd USENIX conference on Hot topics in parallelism.
[2] Leveson, N.G., An investigation of the Therac-25 accidents. IEEE Computer, 1993. 26: pp. 18-41.
[3] Poulsen, K., Tracking the blackout bug, www.securityfocus.com/news/8412.
[4] C Standards Committee (WG14). Committee Draft: www.open-std.org/jtc1/sc22/wg14/www/docs/n1539.pdf
Paul Anderson is VP of Engineering at GrammaTech. He received his B.Sc. from Kings College, University of London and his Ph.D. in Computer Science from City University London. Paul manages GrammaTech’s engineering team and is the architect of the company’s static analysis tools. Paul has worked in the software industry for 20 years, with most of his experience focused on developing static analysis, automated testing, and program transformation tools. Contact him at [email protected].
GrammaTech, Inc. 607-273-7340 www.grammatech.com