ACM Artifacts Evaluated Reusable

Classifying Edits to Variability in Source Code

This is the replication package for our paper, Classifying Edits to Variability in Source Code, accepted at the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2022).

This replication package consists of four parts:

  1. DiffDetective: For our validation, we built DiffDetective, a Java library and command-line tool to classify edits to variability in Git histories of preprocessor-based software product lines.
  2. Appendix: The appendix of our paper is given in PDF format in the file appendix.pdf.
  3. Haskell Formalization: We provide an extended formalization in the Haskell programming language as described in our appendix. Its implementation can be found in the Haskell project in the proofs directory.
  4. Dataset Overview: We provide an overview of the 44 inspected datasets with updated links to their repositories in the file docs/datasets/all.md.

1. DiffDetective

DiffDetective is a Java library and command-line tool to parse and classify edits to variability in Git histories of preprocessor-based software product lines by creating variation diffs and operating on them.

We offer a Docker setup to easily replicate the validation performed in our paper. In the following, we provide a quickstart guide for running the replication. The INSTALL file explains how to install Docker and build the container, describes each step in detail, and gives troubleshooting advice.

Prerequisite

All following commands assume that the working directory of your terminal is the esecfse22 directory. Please switch directories if this is not the case:

cd DiffDetective/replication/esecfse22

1.1 Build the Docker container

Start the Docker daemon and clone this repository. Then open a terminal and navigate to the esecfse22 directory as described under Prerequisite. To build the Docker container, run the build script corresponding to your operating system.

Windows:

.\build.bat

Linux/Mac (bash):

./build.sh

1.2 Start the replication

To execute the replication, run the execute script corresponding to your operating system with replication as the first argument.

Windows:

.\execute.bat replication

Linux/Mac (bash):

./execute.sh replication

WARNING! The replication takes at least an hour and may take up to a day, depending on your system. We therefore offer a short verification (5-10 minutes) that runs DiffDetective on only four of the datasets. To run it, pass "verification" instead of "replication" as the argument (i.e., .\execute.bat verification or ./execute.sh verification).

To stop the execution, call the provided script for stopping the container in a separate terminal. When restarted, the execution resumes at the last unfinished repository.

Windows:

.\stop-execution.bat

Linux/Mac (bash):

./stop-execution.sh

You might see warnings or errors reported by SLF4J, such as Failed to load class "org.slf4j.impl.StaticLoggerBinder", which you can safely ignore. Further troubleshooting advice can be found at the bottom of the INSTALL file.

1.3 View the results in the results directory

All raw results are stored in the results directory. The aggregated results can be found in the following files. (Note that the links below only have a target after running the replication or verification.)

  • speed statistics: contains information about the total runtime, median runtime, mean runtime, and more.
  • classification results: contains information about how often each class was found, and more.

Moreover, the results comprise the (LaTeX) tables that are part of our paper and appendix.

Documentation

DiffDetective is documented with Javadoc. The documentation can be accessed on this website. Notable classes of our library are:

  • DiffTree and DiffNode implement variation diffs from our paper. A variation diff is represented by an instance of the DiffTree class. It stores the root node of the diff and offers various methods to parse, traverse, and analyze variation diffs. DiffNodes represent individual nodes within a variation diff.
  • EditClassValidation contains the main method for our validation.
  • ProposedEditClasses holds the catalog of the nine edit classes we proposed in our paper. It implements the interface EditClassCatalogue, which allows you to define custom edit classifications.
  • BooleanAbstraction contains data and methods for boolean abstraction of higher-order logic formulas. We use this for macro parsing.
  • GitDiffer can parse the history of a Git repository to variation diffs.
  • The datasets package contains various classes for describing and loading datasets.
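
To give a rough intuition for what a boolean abstraction of a macro condition can look like, here is a minimal sketch. It is not the BooleanAbstraction class of DiffDetective. It assumes that abstraction means rewriting the non-boolean operators of an arithmetic term such as VERSION >= 3 into identifier-safe placeholders, so that the whole term can be treated as a single boolean variable. The class name, method name, and placeholder strings are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; this is NOT DiffDetective's BooleanAbstraction class.
class BooleanAbstractionSketch {
    // Insertion order matters: two-character operators must be replaced
    // before the one-character operators they contain.
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put(">=", "__GEQ__");
        REPLACEMENTS.put("<=", "__LEQ__");
        REPLACEMENTS.put("==", "__EQ__");
        REPLACEMENTS.put(">", "__GT__");
        REPLACEMENTS.put("<", "__LT__");
        REPLACEMENTS.put("+", "__ADD__");
        REPLACEMENTS.put("-", "__SUB__");
    }

    // Turns one arithmetic term into a single identifier-like name.
    static String abstractTerm(String term) {
        String result = term.replace(" ", "");
        for (Map.Entry<String, String> entry : REPLACEMENTS.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(abstractTerm("VERSION >= 3")); // prints VERSION__GEQ__3
    }
}
```

After this rewriting, a condition that mixes arithmetic with boolean connectives could be handled by a purely propositional parser, because each abstracted term behaves like an ordinary variable name.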

2. Appendix

Our appendix consists of:

  1. An extended formalization of our concepts in the Haskell programming language. The corresponding source code is also part of this replication package (see below).
  2. The proofs for (a) the completeness of variation diffs to represent edits to variation trees, and (b) the completeness and unambiguity of our edit classes.
  3. An inspection of edit patterns from related work to show that existing patterns are either composite patterns built from our edit classes or similar to one of our edit classes. The diffs used for these patterns can also be found in docs/compositepatterns.
  4. The complete results of our validation for all 44 datasets.

3. Haskell Formalization

The extended formalization is a Haskell library in the proofs subdirectory. Since the proofs library is its own software project, we provide separate documentation of requirements and installation instructions within the project's subdirectory. Requirements and instructions for setting up the build environment (Stack) are given in proofs/REQUIREMENTS.md. How to build our library and run the example is described in proofs/INSTALL.md.

4. Dataset Overview

4.1 Open-Source Repositories

The docs/datasets/all.md file provides an overview of the 44 open-source preprocessor-based software product lines we used. As described in our paper in Section 5.1, this list contains all systems that were studied by Liebig et al., extended by four new subject systems (Busybox, Marlin, LibSSH, Godot). We provide updated links for each system's repository.

4.2 Forked Repositories for Replication

To guarantee the exact replication of our validation, we created forks of all 44 open-source repositories at the state in which we performed the validation for our paper. The forked repositories are listed in the replication datasets and are located at the GitHub user profile DiffDetective. These repositories are used when running the replication as described under 1.2 and in the INSTALL file.

5. Running DiffDetective on Custom Datasets

You can also run DiffDetective on other datasets by providing the path to the dataset file as the first argument to the execution script:

Windows:

.\execute.bat path\to\custom\dataset.md

Linux/Mac (bash):

./execute.sh path/to/custom/dataset.md

The input file must have the same format as the other dataset files (i.e., repositories are listed in a Markdown table). You can find dataset files in the docs/datasets folder.
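
Before running DiffDetective on a custom dataset file, it can help to check that the file really contains a well-formed Markdown table. The following is a minimal sketch of such a check and is not part of DiffDetective. The class name and the column names in the example ("Project name", "Clone URL") are assumptions for illustration only; consult the files in docs/datasets for the actual columns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the column names used in main are assumptions,
// not the real schema of DiffDetective's dataset files.
class DatasetTableSketch {
    // Splits a table row such as "| a | b |" into the cells ["a", "b"].
    static List<String> cells(String row) {
        String trimmed = row.trim();
        if (trimmed.startsWith("|")) trimmed = trimmed.substring(1);
        if (trimmed.endsWith("|")) trimmed = trimmed.substring(0, trimmed.length() - 1);
        List<String> result = new ArrayList<>();
        for (String cell : trimmed.split("\\|", -1)) {
            result.add(cell.trim());
        }
        return result;
    }

    // Reads the first table row as the header and maps every later row to it.
    static List<Map<String, String>> parse(List<String> lines) {
        List<Map<String, String>> rows = new ArrayList<>();
        List<String> header = null;
        for (String line : lines) {
            if (!line.trim().startsWith("|")) continue; // skip text outside the table
            List<String> current = cells(line);
            if (header == null) {
                header = current;
                continue;
            }
            // Skip the separator row, e.g. "|---|:---:|".
            if (current.stream().allMatch(c -> c.matches(":?-+:?"))) continue;
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < header.size() && i < current.size(); i++) {
                row.put(header.get(i), current.get(i));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "| Project name | Clone URL |",
            "|--------------|-----------|",
            "| example | https://example.org/example.git |");
        for (Map<String, String> row : parse(lines)) {
            System.out.println(row);
        }
    }
}
```

The sketch only understands tables whose rows start with a pipe character and does not handle escaped pipes inside cells, which is enough for a quick sanity check but not a full Markdown parser.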