Efficient Detection of Test Interference in C Projects

Florian Eder

LMU Munich

Stefan Winter

LMU Munich (& Ulm University)

Materials Available

https://www.stefan-winter.net/presentations/test_interference_c.html

Speaker Background

  • Dr.-Ing: TU Darmstadt
  • Postdoc: TU Darmstadt, UIUC, LMU Munich
  • Lecturer/Interim prof.: HAW Landshut, LMU Munich, Ulm University

Research focus: Software Dependability, Software Tests, Research Software

Regression Testing

Test interference: a test's outcome depends on which other tests run before it or concurrently with it, typically through shared resources.

Test Interference

Example: libkdumpfile

#! /bin/sh

name=xlatmap
resultfile="out/${name}.result"
expectfile="$srcdir/$name.expect"

echo -n "Checking... "
./xlatmap >"$resultfile"

Excerpt from:
https://github.com/ptesarik/libkdumpfile/blob/c54a90c2756e0ca7f9b45662ad3c987403ee7360/tests/xlatmap-check

The script redirects its output into out/ but never creates that directory, so the test only passes if an earlier test has already created out/.

Test Interference

Example: libkdumpfile

#! /bin/sh

name=xlatmap
resultfile="out/${name}.result"
expectfile="$srcdir/$name.expect"

mkdir -p out

echo -n "Checking... "
./xlatmap >"$resultfile"

Excerpt from accepted fix:
https://github.com/ptesarik/libkdumpfile/blob/e6c5fde6ac7201185292539bef7203c9618ac773/tests/xlatmap-check

The added mkdir -p out makes the test pass regardless of which tests ran before it.

Study Goal and Subjects

Goal: How common is test interference in C projects?

  • Order dependency detection
  • Concurrency dependency detection

Subjects:

  • Initial sample from GitHub (566 projects)
  • Inspect, build, run tests → 134 projects

Order Dependencies

  • Complete detection: Run all test suite permutations (\(n!\)), worked out below
    • libkdumpfile has 184 tests
    • \(2.2 \times 10^{338}\) test suite permutations
    • 22s per test suite run → \(4.9 \times 10^{339}\)s
      (estimated age of universe: \(4.3 \times 10^{17}\)s)
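
Worked out for libkdumpfile (184 tests, 22 s per full suite run):

\[
184! \approx 2.2 \times 10^{338} \text{ orderings}, \qquad
22\,\mathrm{s} \cdot 184! \approx 4.9 \times 10^{339}\,\mathrm{s} \gg 4.3 \times 10^{17}\,\mathrm{s}.
\]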

Order Dependencies

  • Empirical results show that pairwise permutations mostly suffice (\(n\cdot(n-1)\) ordered pairs) (Zhang et al. 2014; Shi et al. 2019); a pairwise runner is sketched below
    • Factorial down to quadratic complexity
    • 33,672 test pair runs and > 2 h for libkdumpfile
    • Still too long to run in CI
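
A minimal sketch of such a pairwise runner, assuming every test is an executable under tests/ and baseline/ holds each test's pass/fail outcome from an isolated run; all paths and names are illustrative:

#!/bin/sh
# Pairwise order-dependency detection (sketch): run every ordered pair
# (a, b) and flag b if its outcome differs from its isolated baseline.
# A real setup would reset the environment (e.g., a fresh container)
# between pairs so that earlier pairs cannot pollute later ones.
for a in tests/*; do
  for b in tests/*; do
    [ "$a" = "$b" ] && continue
    "$a" > /dev/null 2>&1                        # potential state polluter
    if "$b" > /dev/null 2>&1; then result=pass; else result=fail; fi
    baseline=$(cat "baseline/$(basename "$b")")  # outcome of isolated run
    if [ "$result" != "$baseline" ]; then
      echo "order dependency: $(basename "$b") after $(basename "$a")"
    fi
  done
done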

Order Dependencies

Insight: No shared resource access → no interference

Idea: Run every test once and record reads/writes on possibly shared resources

Order Dependencies

File Descriptors Filtered (FDF)
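
One way to obtain the per-test access sets is to trace file-related system calls; the sketch below uses strace with a deliberately crude read/write classification. The actual FDF filtering is more involved; file names, options, and the overlap check here are illustrative only:

#!/bin/sh
# Record which files each test opens (sketch). Only pairs whose access
# sets overlap (one test writes a file another one touches) need to be
# re-run as ordered pairs.
mkdir -p access
for t in tests/*; do
  name=$(basename "$t")
  # -f: follow child processes; trace only file-open system calls
  strace -f -e trace=open,openat,creat -o "access/$name.trace" "$t" > /dev/null 2>&1
  # crude classification: opens with O_WRONLY/O_RDWR/O_CREAT count as writes
  grep -E 'O_WRONLY|O_RDWR|O_CREAT' "access/$name.trace" | grep -o '"[^"]*"' | sort -u > "access/$name.writes"
  grep -o '"[^"]*"' "access/$name.trace" | sort -u > "access/$name.all"
done

# A pair (a, b) only needs a pairwise run if a writes something b touches.
for a in access/*.writes; do
  for b in access/*.all; do
    [ "${a%.writes}" = "${b%.all}" ] && continue
    if [ -n "$(comm -12 "$a" "$b")" ]; then
      echo "candidate pair: $(basename "${a%.writes}") before $(basename "${b%.all}")"
    fi
  done
done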

Order Dependencies

Overlay FS + FDF (OFSFDF)

  • Not every write access leads to a change
  • Idea: Snapshot filesystem before/after test run (see the sketch below)
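
A minimal sketch of the snapshot idea using an overlay mount (requires root or a user namespace; paths are illustrative, with libkdumpfile's tests/xlatmap-check from the earlier slide as the example test):

#!/bin/sh
# Snapshot by overlay (sketch): run a test on an overlay of the project
# tree. Every file the test wrote is copied up into the upper layer;
# comparing it to the original keeps only writes that changed anything.
lower=$PWD                                  # pristine project tree (read-only)
upper=$(mktemp -d); work=$(mktemp -d); merged=$(mktemp -d)

mount -t overlay overlay \
      -o "lowerdir=$lower,upperdir=$upper,workdir=$work" "$merged"

( cd "$merged/tests" && ./xlatmap-check )   # run one test inside the overlay

find "$upper" -type f | while read -r f; do
  rel=${f#$upper/}
  cmp -s "$f" "$lower/$rel" || echo "changed: $rel"   # new or modified content
done

umount "$merged"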

Concurrency Dependencies

Approach

  • Provoking diverse schedules → “Noise Injection”
  • Limit CPU time for test processes via Linux control groups
  • Rerun 50 times with coverage instrumentation
  • If coverage fluctuates → rerun 50 more times (see the sketch below)
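
A minimal sketch of the noise-injection loop, assuming cgroups v2, a gcov-instrumented build, and libkdumpfile's ./xlatmap as the example test; limit values, paths, and the fluctuation check are illustrative:

#!/bin/sh
# Throttle the test's CPU time via a cgroup to provoke unusual thread
# schedules, then compare coverage across repeated runs (sketch).
CG=/sys/fs/cgroup/test-noise
mkdir -p "$CG"
echo "10000 100000" > "$CG/cpu.max"        # 10 ms of CPU per 100 ms period

for i in $(seq 1 50); do
  rm -f *.gcda                             # isolate this run's coverage
  sh -c "echo \$\$ > $CG/cgroup.procs; exec ./xlatmap" > /dev/null 2>&1
  gcov -b xlatmap.c > "cov-$i.txt" 2>&1    # record branch coverage summary
done

# Differing summaries mean the test takes schedule-dependent paths:
# in that case, rerun another 50 times before reporting it.
if [ "$(md5sum cov-*.txt | cut -d' ' -f1 | sort -u | wc -l)" -gt 1 ]; then
  echo "coverage fluctuates: rerun 50 more times"
fi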

Results

Identified Interference

  • 2 Order Dependencies
    • 1 fixed, 1 ACKed
  • 11 Concurrency Dependencies
    • 1 fixed, 1 ACKed
  • 13 Execution Time Dependencies
    (collateral detection)
    • 2 ACKed

Results

Language Comparison


Interference root cause:

Language                                        Order   Concurrency
Java (Luo et al. 2014)                           ~10%       ~15%
Python (Gruber et al. 2021)                       59%         3%
JavaScript (Hashemi, Tahir, and Rasheed 2022)     <1%       >20%
C (this study)                                    ~7%       ~42%

Results

Test Order Reductions


libkdumpfile:

  • Pairwise detection: 33,672 test pair runs, > 2 h
  • With access filtering (OFSFDF): 4 test pair runs, < 1 s

Summary

Findings:

  • Order dependencies less frequent in C than in other languages
  • Concurrency dependencies more frequent in C than in other languages
  • Implicit execution time assumptions

Contributions:

  • Language-agnostic diagnosis reduction approach with open implementation
  • 134 containerized C projects with developer tests

Paper:

Artifact:

References

Gruber, Martin, Stephan Lukasczyk, Florian Kroiß, and Gordon Fraser. 2021. “An Empirical Study of Flaky Tests in Python.” In 2021 14th IEEE Conference on Software Testing, Verification and Validation (ICST), 148–58. https://doi.org/10.1109/ICST49551.2021.00026.
Hashemi, Negar, Amjed Tahir, and Shawn Rasheed. 2022. “An Empirical Study of Flaky Tests in JavaScript.” In 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME), 24–34. https://doi.org/10.1109/ICSME55016.2022.00011.
Lam, Wing, Reed Oei, August Shi, Darko Marinov, and Tao Xie. 2019. “iDFlakies: A Framework for Detecting and Partially Classifying Flaky Tests.” In 2019 12th IEEE Conference on Software Testing, Validation and Verification (ICST), 312–22. https://doi.org/10.1109/ICST.2019.00038.
Luo, Qingzhou, Farah Hariri, Lamyaa Eloussi, and Darko Marinov. 2014. “An Empirical Analysis of Flaky Tests.” In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, 643–53. FSE 2014. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/2635868.2635920.
Schwahn, Oliver, Nicolas Coppik, Stefan Winter, and Neeraj Suri. 2019. “Assessing the State and Improving the Art of Parallel Testing for C.” In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, 123–33. ISSTA 2019. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3293882.3330573.
Shi, August, Wing Lam, Reed Oei, Tao Xie, and Darko Marinov. 2019. “iFixFlakies: A Framework for Automatically Fixing Order-Dependent Flaky Tests.” In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 545–55. ESEC/FSE 2019. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3338906.3338925.
Winter, S., O. Schwahn, R. Natella, N. Suri, and D. Cotroneo. 2015. “No PAIN, No Gain? The Utility of PArallel Fault INjections.” In 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering (ICSE), 1:494–505. https://doi.org/10.1109/ICSE.2015.67.
Zhang, Sai, Darioush Jalali, Jochen Wuttke, Kıvanç Muşlu, Wing Lam, Michael D. Ernst, and David Notkin. 2014. “Empirically Revisiting the Test Independence Assumption.” In Proceedings of the 2014 International Symposium on Software Testing and Analysis, 385–96. ISSTA 2014. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/2610384.2610404.