Paper on Artifact Evaluations accepted at FSE 2020

To foster replicable research, many conferences encourage the submission of ‘research artifacts’ along with papers. An artifact is essentially anything that has contributed to generating the results presented in a research article. Submitted artifacts are evaluated by dedicated artifact evaluation committees (AECs). I’ve had the pleasure of serving on such AECs myself (ISSTA 2016 and ISSTA 2018), and we also received a positive evaluation for our artifact at ISSTA 2019.

One thing I’ve been struggling with a bit is what makes a good artifact, and I’ve had long and interesting discussions on this topic with Ben Hermann, who co-chaired the ISSTA 2018 AEC. We realized that our views on artifact quality differed in various nuances, which led us to the question of whether perceptions of artifact quality generally differ across AEC members, and what the causes and possible impact of such differing perceptions are.

With the help of Janet Siegmund, who is an expert in qualitative surveys and a fantastic discussion partner when it comes to research methodology in general, we designed a questionnaire and invited all past members of AECs at software engineering and programming languages conferences to tell us about their perceptions of artifact purposes and quality.

The paper that discusses the results from this survey has just been accepted at FSE 2020, which is the best venue for the paper that I can imagine. FSE has pioneered artifact evaluations in the software engineering community and has been conducting these evaluations for almost a decade.

We thank all the anonymous participants of our study, the anonymous reviewers of our paper, and the AEC members who are currently evaluating our research artifact. We hope our paper contributes to the continuous improvement of artifact evaluations and replicable research in general. A preprint of our paper and its research artifact are available here.

TraceSanitizer Paper at DSN 2020

Our paper on sanitizing execution traces from the effects of benign execution non-determinism has been accepted at DSN’20.

The paper addresses a problem we frequently came across in Error Propagation Analyses (EPA) using fault injections. In EPA, execution traces are commonly used as an auxiliary oracle: execution traces recorded under fault injection are compared to execution traces from fault-free runs, and if they deviate, that is an indicator of error propagation (for a more detailed discussion of this usage, please see our ASE’17 paper on TrEKer).
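
To make this oracle idea concrete, here is a minimal sketch, assuming a trace is simply a list of event strings; the function name and trace format are made up for illustration and are not TrEKer’s actual interface:

```python
# Minimal sketch of trace comparison as an EPA oracle.
# A trace is modeled as a list of event strings (e.g., executed
# instructions); real trace formats are more involved.

def deviates(golden_trace: list[str], fi_trace: list[str]) -> bool:
    """Return True if the fault injection trace deviates from the
    fault-free reference trace, indicating possible error propagation."""
    if len(golden_trace) != len(fi_trace):
        return True
    return any(g != f for g, f in zip(golden_trace, fi_trace))

golden = ["load a", "store b", "call f"]
faulty = ["load a", "store b", "call g"]  # control flow diverged
print(deviates(golden, faulty))  # True -> flag potential propagation
```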

This trace comparison does not work reliably under benign execution non-determinism, i.e., when operating systems (OSs), run-times, and libraries have the freedom to alter program execution in order to achieve better performance, as long as this does not affect the outcome of the execution. A prominent example of this is thread scheduling: assuming the program is race-free, it does not matter which thread is scheduled for execution at what time. The outcome is the same either way, and the OS can, for instance, prevent threads that are waiting for I/O from blocking the CPU.
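
As a toy illustration of this effect (my own example, not one from the paper): two synchronized, race-free threads that append events to a shared trace compute the same result in every run, yet the recorded event order may differ between runs.

```python
# Toy illustration: identical outcomes, differing traces across runs.
import threading

def run_once() -> list[str]:
    trace, lock = [], threading.Lock()

    def worker(name: str) -> None:
        for i in range(3):
            with lock:  # accesses are synchronized, so there is no race
                trace.append(f"{name}:{i}")

    threads = [threading.Thread(target=worker, args=(n,)) for n in "AB"]
    for t in threads: t.start()
    for t in threads: t.join()
    return trace

# Two fault-free runs may interleave A and B differently, so a naive
# trace comparison would report a (spurious) deviation.
print(run_once())
print(run_once())
```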

The problem this causes for EPA is that execution traces can deviate even if there is no effect from a fault. Even in consecutive fault-free runs, there will be deviations, because the execution order of instructions from different threads can differ. In our paper, we solve this problem for an important class of programs that we term pseudo-deterministic, in which conflicting accesses to shared data (accesses from different threads, at least one of which is a write) must always occur in the same order. An earlier approach to the same problem, which we presented at ICST’17, is applicable to a wider range of programs, but (unlike TraceSanitizer) may lead to false positives in EPA.
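
As a rough intuition for the pseudo-determinism property, here is a much-simplified sketch, under the assumption that traces are lists of (thread, op, addr) events; the paper’s actual check (described next) is considerably more powerful. The sketch projects each trace onto the conflicting accesses per shared location and requires that projection to be identical across runs:

```python
# Simplified sketch of the pseudo-determinism property.
# An event is (thread_id, op, addr) with op in {"read", "write"};
# this trace model is an assumption for illustration.
from collections import defaultdict

Event = tuple[str, str, int]

def conflict_projection(trace: list[Event]) -> dict[int, list[Event]]:
    """Keep, per address, the access sequence of locations that see a
    conflict: accesses from more than one thread, at least one a write."""
    by_addr: dict[int, list[Event]] = defaultdict(list)
    for ev in trace:
        by_addr[ev[2]].append(ev)
    return {
        addr: evs
        for addr, evs in by_addr.items()
        if len({t for t, _, _ in evs}) > 1
        and any(op == "write" for _, op, _ in evs)
    }

def pseudo_deterministic(traces: list[list[Event]]) -> bool:
    """Conflicting accesses must occur in the same order in every trace."""
    if not traces:
        return True
    reference = conflict_projection(traces[0])
    return all(conflict_projection(t) == reference for t in traces[1:])
```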

To decide whether a program is pseudo-deterministic and TraceSanitizer can be applied, we introduce an automated check based on SMT-solver-supported maximal causal reasoning over fault-free traces. If the check passes, we sanitize execution traces (from both fault-free and fault injection runs) by eliminating the effects of non-deterministic thread scheduling as well as dynamic memory allocation.
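
To give an idea of the memory allocation part, here is a hedged sketch (not TraceSanitizer’s actual algorithm, and assuming thread-scheduling effects have already been normalized): raw heap addresses are replaced by canonical IDs assigned in order of first use, so that traces no longer differ merely because the allocator returned different addresses across runs.

```python
# Sketch: canonicalizing dynamic memory addresses in a trace.
# Assumes scheduling effects were removed first, so the order of
# first use is the same across runs.
def canonicalize_addresses(
    trace: list[tuple[str, str, int]],
) -> list[tuple[str, str, int]]:
    """Replace raw heap addresses with stable IDs in order of first use."""
    ids: dict[int, int] = {}
    sanitized = []
    for thread, op, addr in trace:
        if addr not in ids:
            ids[addr] = len(ids)
        sanitized.append((thread, op, ids[addr]))
    return sanitized

# Two runs whose only difference is the returned heap address
# sanitize to identical traces.
t1 = [("A", "alloc", 0x7F00), ("A", "write", 0x7F00)]
t2 = [("A", "alloc", 0x9A10), ("A", "write", 0x9A10)]
assert canonicalize_addresses(t1) == canonicalize_addresses(t2)
```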

The paper can be found here. The TraceSanitizer prototype implementation, which was developed for LLFI, is available on GitHub.