The goal of this repository is to reproduce the numerical experiments from [1].
If you use this software in your work, please cite it using the following metadata.
@misc{gnecco2024boostedcontrolfunctionsdistribution,
title={Boosted Control Functions: Distribution generalization and invariance in confounded models},
author={Nicola Gnecco and Jonas Peters and Sebastian Engelke and Niklas Pfister},
year={2024},
eprint={2310.05805},
archivePrefix={arXiv},
primaryClass={stat.ML},
url={https://arxiv.org/abs/2310.05805},
}
.
├── R-code
│   ├── R               # R functions
│   └── main            # R scripts
│
├── data
│   ├── processed       # Processed data
│   └── raw
│       └── prism_temp  # Temperature data
│
├── python
│   ├── main            # Python scripts
│   └── src
│       ├── bcf         # Classes and functions for BCF
│       ├── scenarios   # Functions to generate data
│       ├── simulations # Functions to run simulations
│       └── utils       # Helper functions
│
└── results
    ├── figures         # Saved figures
    └── output_data     # Saved results
- Python (version 3.9.1)
- R (version 4.0.2)
- Ensure Python and pip are installed: Before installing the requirements, make sure you have Python installed on your system. You can check by running in the terminal:
python --version
Also, ensure you have pip (the Python package manager) installed by running in the terminal:
pip --version
If you don't have Python or pip installed, please download Python and install it. Pip is included in Python versions 3.4+.
- Navigate to the python directory: Use the terminal to navigate to the directory containing the requirements.txt file.
cd python
- Install the requirements: Once inside the directory, run the following command to install the required packages:
pip install -r requirements.txt
pip install -e .
The first line installs all the Python packages listed in requirements.txt.
The second line installs the Python code located in ./python/src/ as a package.
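The version and pip checks above can also be done from inside Python. The following is a minimal sketch and not part of the repository: the helper name `check_environment` is hypothetical, and treating 3.9 as a minimum is an assumption (this README only states that version 3.9.1 was used).

```python
import importlib.util
import sys

# Assumption: treat the version listed in this README (3.9.x) as a minimum.
MIN_PYTHON = (3, 9)


def check_environment(min_python=MIN_PYTHON):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    if sys.version_info[:2] < tuple(min_python):
        found = ".".join(map(str, sys.version_info[:3]))
        wanted = ".".join(map(str, min_python))
        problems.append(f"Python {wanted}+ expected, found {found}")
    # find_spec returns None when the module cannot be located.
    if importlib.util.find_spec("pip") is None:
        problems.append("pip is not available for this interpreter")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment looks fine." if not issues else "\n".join(issues))
```

Running the file with the interpreter you intend to use for the experiments reports any mismatch before the requirements are installed.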
- Ensure R is installed: Make sure you have R installed on your system by running in the terminal:
R --version
If you don't have R installed, please download R and install it.
- Install the requirements: Move to the R-code directory and launch the R script dependencies.R by running in the terminal:
cd R-code
Rscript --vanilla main/dependencies.R
1a) How to run on a PBS/TORQUE cluster:
- Open ./python/main/run_job.sh and modify lines 29–31 depending on your cluster.
- From the root of the project, in the terminal type the following:
cd python
qsub main/run_job.sh "python main/experiment_1.py"
1b) How to run locally
- Open ./python/main/experiment_1.py and on line 13 set the number of cores according to your machine, e.g., N_WORKERS = 4.
- From the root of the project, in the terminal type the following:
cd python
python main/experiment_1.py
💡 To run only a few iterations of the experiment, change the number of repetitions on line 29 of ./python/main/experiment_1.py, e.g., "n_reps": range(2).
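To illustrate what these two settings control, here is a minimal, self-contained sketch. It is not the code in experiment_1.py: the function names `run_one_rep` and `run_all`, the `params` dictionary, and the use of a thread pool are assumptions made for illustration. Only the names N_WORKERS and "n_reps" come from the instructions above.

```python
import random
from concurrent.futures import ThreadPoolExecutor

N_WORKERS = 4  # number of parallel workers (set according to your machine)
params = {"n_reps": range(2)}  # two repetitions for a quick test run


def run_one_rep(rep):
    """Hypothetical stand-in for one repetition of a simulation."""
    rng = random.Random(rep)  # seed by repetition index for reproducibility
    draws = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    return {"rep": rep, "mean": sum(draws) / len(draws)}


def run_all(params, n_workers=N_WORKERS):
    """Run every repetition, spreading the work over n_workers workers."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map returns results in the same order as params["n_reps"].
        return list(pool.map(run_one_rep, params["n_reps"]))


if __name__ == "__main__":
    for row in run_all(params):
        print(row)
```

A thread pool keeps the sketch portable. CPU-bound simulations would more typically use a process pool, which is presumably why the script asks for a number of cores.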
2) Plot results
- From the root of the project, in the terminal type the following:
cd R-code
Rscript --vanilla main/plot-interv_sim.R
1a) How to run on a PBS/TORQUE cluster:
- Open ./python/main/run_job.sh and modify lines 29–31 depending on your cluster.
- From the root of the project, in the terminal type the following:
cd python
qsub main/run_job.sh "python main/experiment_2.py"
1b) How to run locally
- Open ./python/main/experiment_2.py and on line 13 set the number of cores according to your machine, e.g., N_WORKERS = 4.
- From the root of the project, in the terminal type the following:
cd python
python main/experiment_2.py
💡 To run only a few iterations of the experiment, change the number of repetitions on line 31 of ./python/main/experiment_2.py, e.g., "n_reps": range(2).
2) Plot results
- From the root of the project, in the terminal type the following:
cd R-code
Rscript --vanilla main/plot-nullspace_sim.R
To run the experiment and produce the figures for the California housing dataset, follow these steps.
cd python
python main/housing_data-import.py
python main/asc2csv.py ../data/raw/prism_temp/ "../data/processed/temp-data-california.csv" "Mean_Temp"
cd ../R-code
Rscript --vanilla main/prepare-temp_housing_data.R
cd ../python
python main/housing_data_analysis_regions.py
cd ../R-code
Rscript --vanilla main/plot-temp_housing_data.R
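The steps above can also be chained in a small driver script. This is a sketch under assumptions, not a script shipped with the repository: the names `STEPS` and `run_pipeline` are hypothetical, and it assumes that `python` and `Rscript` are on the PATH and that `root` points to the project root. Instead of changing directories between steps, each command is given its working directory.

```python
import subprocess
from pathlib import Path

# (working directory relative to the project root, command), in README order.
STEPS = [
    ("python", ["python", "main/housing_data-import.py"]),
    ("python", ["python", "main/asc2csv.py", "../data/raw/prism_temp/",
                "../data/processed/temp-data-california.csv", "Mean_Temp"]),
    ("R-code", ["Rscript", "--vanilla", "main/prepare-temp_housing_data.R"]),
    ("python", ["python", "main/housing_data_analysis_regions.py"]),
    ("R-code", ["Rscript", "--vanilla", "main/plot-temp_housing_data.R"]),
]


def run_pipeline(root=".", dry_run=True):
    """Run the steps in order; with dry_run=True only report what would run."""
    root = Path(root)
    planned = []
    for workdir, cmd in STEPS:
        planned.append((str(root / workdir), cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, cwd=root / workdir, check=True)
    return planned


if __name__ == "__main__":
    for cwd, cmd in run_pipeline(dry_run=True):
        print(f"[{cwd}] {' '.join(cmd)}")
```

With the default `dry_run=True` the script only prints the planned commands; calling `run_pipeline(dry_run=False)` executes them and raises an error at the first step that fails.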
This project is licensed under the MIT License - see the LICENSE file for details.
[1]: Nicola Gnecco, Jonas Peters, Sebastian Engelke, and Niklas Pfister. 2024. "Boosted Control Functions: Distribution generalization and invariance in confounded models." arXiv Preprint [https://arxiv.org/abs/2310.05805].