Amin Parchami-Araghi* · Moritz Böhle* · Sukrut Rao* · Bernt Schiele
This codebase was adapted from the B-cos-v2 codebase. It is based on PyTorch Lightning and comes with several useful features such as Distributed Data Parallel, Slurm integration (via submitit), and WandB logging. I adapted the codebase to work for our distillation setting, so it is not fully optimized for KD; if you are looking for a general-purpose distillation codebase, I strongly suggest established codebases such as Torchdistill. Please also refer to the B-cos-v2 codebase for the most up-to-date implementation of B-cos Networks.
After cloning the repo, please download the teacher checkpoints using the following:

```bash
bash download_teachers.sh
```
Afterwards, refer to `run.sh` for sample training commands. You will need to adjust the dataset paths at the beginning of the script.
Most of the essential implementations of our method can be found under `bcos/distillation_methods/TeachersExplain.py`.
- **The Loss:** The `forward()` pass computes the losses for both the logits and the explanations.
- **Computing the Explanations:** The explanations are computed together with the logits in `explanation_aware_forward()`.
- **Train and Eval Step:** There are three callbacks for distillation: `distillation_trainstep()`, `distillation_evalstep()`, and `distillation_init()`. These are also defined in `bcos/distillation_methods/TeachersExplain.py` and are called by the Trainer class (as defined in `bcos/training/trainer.py`). During training or validation, the Trainer calls either `distillation_trainstep()` or `distillation_evalstep()` to process a batch and compute the loss; their main difference is the gradient computation. `distillation_init()` is called only once, during initialization of the Trainer. A simplified sketch of how these pieces fit together is shown below.
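For orientation only, here is a minimal, self-contained sketch of this structure. It is **not** the actual implementation in `TeachersExplain.py`: the `DistillationSketch` class, the plain input-gradient "explanations", the cosine-similarity explanation loss, and the callback signatures are simplified assumptions made purely for illustration.

```python
# Hypothetical sketch of the forward / explanation_aware_forward / callback structure.
# The real code uses B-cos explanations and different signatures; see TeachersExplain.py.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DistillationSketch(nn.Module):
    def __init__(self, teacher: nn.Module, student: nn.Module,
                 temperature: float = 1.0, expl_weight: float = 1.0):
        super().__init__()
        self.teacher = teacher.eval()   # teacher is kept frozen
        self.student = student
        self.temperature = temperature  # assumed hyperparameters for the sketch
        self.expl_weight = expl_weight

    def explanation_aware_forward(self, model: nn.Module, images: torch.Tensor):
        """Compute logits and an explanation in a single pass.

        Plain input gradients stand in for B-cos explanations here, only to
        keep the sketch self-contained.
        """
        images = images.requires_grad_(True)
        logits = model(images)
        selected = logits.gather(1, logits.argmax(dim=1, keepdim=True)).sum()
        expl = torch.autograd.grad(selected, images, create_graph=True)[0]
        return logits, expl

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        """Combine a logit-distillation loss with an explanation-matching loss."""
        with torch.enable_grad():  # explanations need gradients even during eval
            s_logits, s_expl = self.explanation_aware_forward(self.student, images)
            t_logits, t_expl = self.explanation_aware_forward(self.teacher, images)
        t_logits, t_expl = t_logits.detach(), t_expl.detach()

        kd_loss = F.kl_div(
            F.log_softmax(s_logits / self.temperature, dim=1),
            F.softmax(t_logits / self.temperature, dim=1),
            reduction="batchmean",
        ) * self.temperature ** 2
        expl_loss = 1 - F.cosine_similarity(
            s_expl.flatten(1), t_expl.flatten(1), dim=1
        ).mean()
        return kd_loss + self.expl_weight * expl_loss


# Callback-style wrappers mirroring the three hooks called by the Trainer.
def distillation_init(module: DistillationSketch) -> None:
    for p in module.teacher.parameters():   # freeze the teacher once
        p.requires_grad_(False)


def distillation_trainstep(module: DistillationSketch, batch) -> torch.Tensor:
    images, _ = batch
    return module(images)                   # loss keeps gradients for backward()


def distillation_evalstep(module: DistillationSketch, batch) -> torch.Tensor:
    images, _ = batch
    # Input gradients are still needed for the explanations, but the returned
    # loss is detached since no optimizer step follows during validation.
    return module(images).detach()
```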
Similar to the B-cos-v2 codebase, the configs are implemented as key-value pairs using standard dictionaries. Our configs for the ImageNet experiments can be found in `bcos/experiments/ImageNet/final_live/experiment_parameters.py`.
To avoid repetition, the dictionaries may override each other. For example, a few-shot KD experiment has exactly the same config as a standard experiment, except for the changes required by the few-shot setting (see the sketch below).
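To make the override pattern concrete, here is a hypothetical sketch; the dictionary names, keys, and values below are made up for illustration and do not correspond to the actual entries in `experiment_parameters.py`.

```python
# Hypothetical illustration of dict-based configs that override each other.
base_experiment = {
    "model": "bcos_resnet18",
    "teacher": "bcos_resnet34",
    "epochs": 100,
    "batch_size": 256,
    "lr": 1e-3,
}

# A few-shot experiment reuses the base config and only overrides what differs.
fewshot_experiment = {
    **base_experiment,
    "epochs": 200,            # e.g. train longer on the reduced subset
    "data_fraction": 0.01,    # e.g. use only 1% of the training images
}

if __name__ == "__main__":
    print(fewshot_experiment)
```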
Please note that the configs may generate hyperparameter combinations that were not actually tested in the paper. For the hyperparameters used in the paper, kindly refer to Section C of our supplement.
If you use our work or our codebase, please cite it as follows:
```bibtex
@inproceedings{parchamiaraghi2024good,
    title = {Good Teachers Explain: Explanation-Enhanced Knowledge Distillation},
    author = {Amin Parchami-Araghi and Moritz Böhle and Sukrut Rao and Bernt Schiele},
    booktitle = {European Conference on Computer Vision},
    year = {2024},
}
```
- Adding Waterbirds Configs
- Adding Pascal VOC Configs
- Adding Method Documentation
- Adding references for borrowed code snippets