Our paper on sanitizing execution traces from effects of benign execution non-determinism has been accepted at DSN’20.
The paper addresses a problem we frequently came across in Error Propagation Analyses (EPA) using fault injections. In EPA execution traces are commonly used as an auxiliary oracle: Execution traces under fault injection are compared to execution traces from fault-free runs and if they deviate that’s an indicator for error propagation (for a more detailed discussion of this usage, please see our ASE’17 paper on TrEKer).
This trace comparison does not work reliably under benign excution non-determinism, i.e., when operating systems (OSs), run-times, and libraries have the freedom to alter program execution in order to achieve better performance, as long as it does not affect the outcome of the execution. A prominent example for this is thread scheduling. Assuming the program is race free, it does not matter which thread is scheduled for execution when. The outcome is the same and the OS can prevent, for instance, threads that are waiting for I/O from blocking the CPU.
The problem this causes for EPA is that the execution traces can deviate, even if there is no effect from a fault. Even in consecutive fault-free runs, there will be deviations, because the execution order of instructions from different threads can differ. In our paper we solve this problem for an important class of programs that we term pseudo-deterministic and for which conflicting accesses to shared data (operations from different threads, at least one of which is a write operation) must always occur in the same order. An earlier approach to solve the same problem, which we presented at ICST’17, is applicable to a wider range of programs, but (contrary to TraceSanitizer) may lead to false positives in EPA.
To decide whether a program is pseudo-deterministic and TraceSanitizer can be applied, we introduce an automated check based on SMT solver supported maximal causal reasoning on fault-free traces. If the check passes, we sanitize execution traces (from fault-free runs and fault injections) by eliminating effects from both non-deterministic thread scheduling and dynamic memory allocations.
The paper can be found here. The TraceSanitizer prototype implementation, which has been developed for LLFI, is available on github.