Efficient Detection of Test Interference in C Projects

Florian Eder

LMU Munich

Stefan Winter

LMU Munich (& Ulm University)

Materials Available

https://www.stefan-winter.net/presentations/test_interference_c.html

Speaker Background

  • Dr.-Ing: TU Darmstadt
  • Postdoc: TU Darmstadt, UIUC, LMU Munich
  • Lecturer/Interim prof.: HAW Landshut, LMU Munich, Ulm University

Research focus: Software Dependability, Software Tests, Research Software

Regression Testing

Test interference: a test's outcome depends on which other tests run before it or concurrently with it, typically through shared resources.

Test Interference

Example: libkdumpfile

#! /bin/sh

name=xlatmap
resultfile="out/${name}.result"
expectfile="$srcdir/$name.expect"

echo -n "Checking... "
./xlatmap >"$resultfile"

Excerpt from:
https://github.com/ptesarik/libkdumpfile/blob/c54a90c2756e0ca7f9b45662ad3c987403ee7360/tests/xlatmap-check

The script redirects its output into out/ but never creates that directory, so the test only passes if an earlier test has already created out/.

Test Interference

Example: libkdumpfile

#! /bin/sh

name=xlatmap
resultfile="out/${name}.result"
expectfile="$srcdir/$name.expect"

mkdir -p out

echo -n "Checking... "
./xlatmap >"$resultfile"

Excerpt from accepted fix:
https://github.com/ptesarik/libkdumpfile/blob/e6c5fde6ac7201185292539bef7203c9618ac773/tests/xlatmap-check

The added mkdir -p out makes the test pass regardless of which tests ran before it.

Study Goal and Subjects

Goal: How common is test interference in C projects?

  • Order dependency detection
  • Concurrency dependency detection

Subjects:

  • Initial sample from GitHub (566 projects)
  • Inspect, build, run tests → 134 projects

Order Dependencies

  • Complete detection: Run all test suite permutations (\(n!\)), worked out below
    • libkdumpfile has 184 tests
    • \(2.2 \times 10^{338}\) test suite permutations
    • 22s per test suite run → \(4.9 \times 10^{339}\)s
      (estimated age of universe: \(4.3 \times 10^{17}\)s)
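
Worked out for libkdumpfile (184 tests, 22 s per full suite run):

\[
184! \approx 2.2 \times 10^{338} \text{ orderings}, \qquad
22\,\mathrm{s} \cdot 184! \approx 4.9 \times 10^{339}\,\mathrm{s} \gg 4.3 \times 10^{17}\,\mathrm{s}.
\]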

Order Dependencies

  • Empirical results show that pairwise permutations mostly suffice (\(n\cdot(n-1)\) ordered pairs) (Zhang et al. 2014; Shi et al. 2019); a pairwise runner is sketched below
    • Factorial down to quadratic complexity
    • 33,672 test pair runs and > 2 h for libkdumpfile
    • Still too long to run in CI
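
A minimal sketch of such a pairwise runner, assuming every test is an executable under tests/ and baseline/ holds each test's pass/fail outcome from an isolated run; all paths and names are illustrative:

#!/bin/sh
# Pairwise order-dependency detection (sketch): run every ordered pair
# (a, b) and flag b if its outcome differs from its isolated baseline.
# A real setup would reset the environment (e.g., a fresh container)
# between pairs so that earlier pairs cannot pollute later ones.
for a in tests/*; do
  for b in tests/*; do
    [ "$a" = "$b" ] && continue
    "$a" > /dev/null 2>&1                        # potential state polluter
    if "$b" > /dev/null 2>&1; then result=pass; else result=fail; fi
    baseline=$(cat "baseline/$(basename "$b")")  # outcome of isolated run
    if [ "$result" != "$baseline" ]; then
      echo "order dependency: $(basename "$b") after $(basename "$a")"
    fi
  done
done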

Order Dependencies

Insight: No shared resource access → no interference

Idea: Run every test once and record reads/writes on possibly shared resources

Order Dependencies

File Descriptors Filtered (FDF)
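
One way to obtain the per-test access sets is to trace file-related system calls; the sketch below uses strace with a deliberately crude read/write classification. The actual FDF filtering is more involved; file names, options, and the overlap check here are illustrative only:

#!/bin/sh
# Record which files each test opens (sketch). Only pairs whose access
# sets overlap (one test writes a file another one touches) need to be
# re-run as ordered pairs.
mkdir -p access
for t in tests/*; do
  name=$(basename "$t")
  # -f: follow child processes; trace only file-open system calls
  strace -f -e trace=open,openat,creat -o "access/$name.trace" "$t" > /dev/null 2>&1
  # crude classification: opens with O_WRONLY/O_RDWR/O_CREAT count as writes
  grep -E 'O_WRONLY|O_RDWR|O_CREAT' "access/$name.trace" | grep -o '"[^"]*"' | sort -u > "access/$name.writes"
  grep -o '"[^"]*"' "access/$name.trace" | sort -u > "access/$name.all"
done

# A pair (a, b) only needs a pairwise run if a writes something b touches.
for a in access/*.writes; do
  for b in access/*.all; do
    [ "${a%.writes}" = "${b%.all}" ] && continue
    if [ -n "$(comm -12 "$a" "$b")" ]; then
      echo "candidate pair: $(basename "${a%.writes}") before $(basename "${b%.all}")"
    fi
  done
done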

Order Dependencies

Overlay FS + FDF (OFSFDF)

  • Not every write access leads to a change
  • Idea: Snapshot filesystem before/after test run (see the sketch below)
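
A minimal sketch of the snapshot idea using an overlay mount (requires root or a user namespace; paths are illustrative, with libkdumpfile's tests/xlatmap-check from the earlier slide as the example test):

#!/bin/sh
# Snapshot by overlay (sketch): run a test on an overlay of the project
# tree. Every file the test wrote is copied up into the upper layer;
# comparing it to the original keeps only writes that changed anything.
lower=$PWD                                  # pristine project tree (read-only)
upper=$(mktemp -d); work=$(mktemp -d); merged=$(mktemp -d)

mount -t overlay overlay \
      -o "lowerdir=$lower,upperdir=$upper,workdir=$work" "$merged"

( cd "$merged/tests" && ./xlatmap-check )   # run one test inside the overlay

find "$upper" -type f | while read -r f; do
  rel=${f#$upper/}
  cmp -s "$f" "$lower/$rel" || echo "changed: $rel"   # new or modified content
done

umount "$merged"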

Concurrency Dependencies

Approach

  • Provoking diverse schedules → “Noise Injection”
  • Limit CPU time for test processes via Linux control groups
  • Rerun 50 times with coverage instrumentation
  • If coverage fluctuates → rerun 50 more times (see the sketch below)
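
A minimal sketch of the noise-injection loop, assuming cgroups v2, a gcov-instrumented build, and libkdumpfile's ./xlatmap as the example test; limit values, paths, and the fluctuation check are illustrative:

#!/bin/sh
# Throttle the test's CPU time via a cgroup to provoke unusual thread
# schedules, then compare coverage across repeated runs (sketch).
CG=/sys/fs/cgroup/test-noise
mkdir -p "$CG"
echo "10000 100000" > "$CG/cpu.max"        # 10 ms of CPU per 100 ms period

for i in $(seq 1 50); do
  rm -f *.gcda                             # isolate this run's coverage
  sh -c "echo \$\$ > $CG/cgroup.procs; exec ./xlatmap" > /dev/null 2>&1
  gcov -b xlatmap.c > "cov-$i.txt" 2>&1    # record branch coverage summary
done

# Differing summaries mean the test takes schedule-dependent paths:
# in that case, rerun another 50 times before reporting it.
if [ "$(md5sum cov-*.txt | cut -d' ' -f1 | sort -u | wc -l)" -gt 1 ]; then
  echo "coverage fluctuates: rerun 50 more times"
fi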

Results

Identified Interference

  • 2 Order Dependencies
    • 1 fixed, 1 ACKed
  • 11 Concurrency Dependencies
    • 1 fixed, 1 ACKed
  • 13 Execution Time Dependencies
    (collateral detection)
    • 2 ACKed

Results

Language Comparison


Interference root cause:

Language                                        Order   Concurrency
Java (Luo et al. 2014)                           ~10%       ~15%
Python (Gruber et al. 2021)                       59%         3%
JavaScript (Hashemi, Tahir, and Rasheed 2022)     <1%       >20%
C (this study)                                    ~7%       ~42%

Results

Test Order Reductions


libkdumpfile:

  • Pairwise detection: 33,672 test pair runs, > 2 h
  • With access filtering (OFSFDF): 4 test pair runs, < 1 s

Summary

Findings:

  • Order dependencies less frequent in C than in other languages
  • Concurrency dependencies more frequent in C than in other languages
  • Implicit execution time assumptions

Contributions:

  • Language-agnostic diagnosis reduction approach with open implementation
  • 134 containerized C projects with developer tests

Paper:

Artifact:

References

Gruber, Martin, Stephan Lukasczyk, Florian Kroiß, and Gordon Fraser. 2021. “An Empirical Study of Flaky Tests in Python.” In 2021 14th IEEE Conference on Software Testing, Verification and Validation (ICST), 148–58. https://doi.org/10.1109/ICST49551.2021.00026.
Hashemi, Negar, Amjed Tahir, and Shawn Rasheed. 2022. “An Empirical Study of Flaky Tests in JavaScript.” In 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME), 24–34. https://doi.org/10.1109/ICSME55016.2022.00011.
Lam, Wing, Reed Oei, August Shi, Darko Marinov, and Tao Xie. 2019. “iDFlakies: A Framework for Detecting and Partially Classifying Flaky Tests.” In 2019 12th IEEE Conference on Software Testing, Validation and Verification (ICST), 312–22. https://doi.org/10.1109/ICST.2019.00038.
Luo, Qingzhou, Farah Hariri, Lamyaa Eloussi, and Darko Marinov. 2014. “An Empirical Analysis of Flaky Tests.” In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, 643–53. FSE 2014. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/2635868.2635920.
Schwahn, Oliver, Nicolas Coppik, Stefan Winter, and Neeraj Suri. 2019. “Assessing the State and Improving the Art of Parallel Testing for C.” In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, 123–33. ISSTA 2019. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3293882.3330573.
Shi, August, Wing Lam, Reed Oei, Tao Xie, and Darko Marinov. 2019. “iFixFlakies: A Framework for Automatically Fixing Order-Dependent Flaky Tests.” In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 545–55. ESEC/FSE 2019. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3338906.3338925.
Winter, S., O. Schwahn, R. Natella, N. Suri, and D. Cotroneo. 2015. “No PAIN, No Gain? The Utility of PArallel Fault INjections.” In 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering (ICSE), 1:494–505. https://doi.org/10.1109/ICSE.2015.67.
Zhang, Sai, Darioush Jalali, Jochen Wuttke, Kıvanç Muşlu, Wing Lam, Michael D. Ernst, and David Notkin. 2014. “Empirically Revisiting the Test Independence Assumption.” In Proceedings of the 2014 International Symposium on Software Testing and Analysis, 385–96. ISSTA 2014. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/2610384.2610404.