Skip to content

Latest commit

 

History

History
323 lines (256 loc) · 11.6 KB

README.md

File metadata and controls

323 lines (256 loc) · 11.6 KB

Play Fair: Frame Attributions in Video Models

Concept figure

Website | Demo | arXiv

This repository contains the code accompanying the ACCV 2020 paper "Play Fair: Frame Attributions in Video Models". We introduce a way of computing how much each frame contributes to the output of a model. Our approach, the Element Shapley Value (ESV), is based on the classic solution to the reward distribution problem in cooperative games called the Shapley Value. ESV is not just restricted to evaluating the contribution of a frame to a video, but can be applied to any model that performs light-weight modelling on top of time series data to assess the contribution of each element in the series.

Want to play around with Element Shapley Values for the models in the paper?

Check out our demo which allows you to investigate the ESVs computed for a TRN model on the Something -Something v2 dataset.

If you want to explore further follow the set up guide below, extract features from the backbone models, and compute ESVs yourself.

Set up

Environment

$ conda env create -n play-fair -f environment.yml

You will also need to install a version of ffmpeg with vp9 support, we suggest using the static builds provided by John Van Sickle:

$ wget "https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz"
$ tar -xvf "ffmpeg-git-amd64-static.tar.xz"
$ mkdir -p bin
$ mv ffmpeg-git-*-amd64-static/{ffmpeg,ffprobe} bin

You will always need to set your PYTHONPATH to include the src folder. We provide a .envrc for use with direnv which will automatically do that for you when you cd in the project directory. Alternatively just run:

$ export PYTHONPATH=$PWD/src 

Data

We store our files in the gulpio format.

  1. Download something-something-v2
  2. Gulp the validation set
    $ python src/scripts/gulp_ssv2.py \
         <path-to-labels>/something-something-v2-validation.json \
         <-path-to-20bn-something-something-v2> \
         datasets/ssv2/gulp/validation
    This should take around ~15 mins if you are writing to an SSD backed filesystem, it will take longer if you're writing to an HDD. If you need to write the gulp directory to somewhere other than the path specified in the command above, make sure to symlink it afterwards to datasets/ssv2/gulp/validation so the configuration files don't need to be updated.

Models

We provide two models, a TRN and TSN, for analysis. Download these by running

$ cd checkpoints
$ bash ./download.sh

Check that they have all downloaded:

$ tree -h
.
├── [4.0K]  backbones
│   ├── [ 40M]  trn.pth
│   └── [ 92M]  tsn.pth
├── [2.5K]  download.sh
└── [4.0K]  features
    ├── [ 37M]  mtrn_16_frames.pth
    ├── [ 10M]  mtrn_8_frames.pth
    ├── [8.0M]  trn_10_frames.pth
    ├── [8.8M]  trn_11_frames.pth
    ├── [9.5M]  trn_12_frames.pth
    ├── [ 10M]  trn_13_frames.pth
    ├── [ 11M]  trn_14_frames.pth
    ├── [ 12M]  trn_15_frames.pth
    ├── [ 13M]  trn_16_frames.pth
    ├── [1.3M]  trn_1_frames.pth
    ├── [2.0M]  trn_2_frames.pth
    ├── [2.8M]  trn_3_frames.pth
    ├── [3.5M]  trn_4_frames.pth
    ├── [4.3M]  trn_5_frames.pth
    ├── [5.0M]  trn_6_frames.pth
    ├── [5.8M]  trn_7_frames.pth
    ├── [6.5M]  trn_8_frames.pth
    ├── [7.3M]  trn_9_frames.pth
    └── [175K]  tsn.pth

2 directories, 22 files

Extracting features

As computing ESVs is expensive, requiring many thousands of model evaluations, we work with temporal models that operate over features. We can run these in a reasonable amount of time, in the order of milliseconds--seconds depending on number of frames and whether approximate methods are used or not.

We provide a script to extract per-frame features, saving them to an HDF file. Extract these features for TSN and TRN

$ python src/scripts/extract_features.py \
    --split validation \
    configs/trn_bninception.jsonnet \
    dataset/ssv2/features/trn.hdf

$ python src/scripts/extract_features.py \
    --split validation \
    configs/tsn_resnet50.jsonnet \
    dataset/ssv2/features/tsn.hdf

Computing ESVs

We provide two methods to compute ESVs, one where the model supports a variable-length input (e.g. TSN) and one which takes a collection of models each of which operate over a fixed-length input (e.g. TRN).

Computing class priors

Regardless of whether your model supports variable-length inputs or not, we need to compute the class priors to use in the computation of the ESV. We provide a script that does this by computing the empirical class frequency over the training set.

$ python src/scripts/compute_ssv2_class_priors.py \
    something-something-v2-train.json \
    datasets/ssv2/class-priors.csv

Models supporting a variable-length input

Computing ESVs for models' supporting variable-length input is a straight forward application of the original Shapley Value formula using the characteristic function:

v(X) = f(X) - f(∅)

We have to define f(∅), the simplest choice is to define it as the the prior probability of observing a class based on the frequency of examples in the training set. Alternatively you can run the model over the training set to obtain the average output (practically there is little difference between these choices).

Since the Shapley value is computed by measuring the difference between characteristic function evaluations, we make an optimisation by eliminating the subtraction of f(∅) in the implementation. Instead we tweak the definition of the characteristic function to be v(X) = f(X) if |X| >= 1, else we define v(∅) as the set of class priors. This results in computing the same Shapley values but without having to perform a subtraction for each characteristic function evaluation. We implement this in the CharacteristicFunctionShapleyAttributor.

We provide an example of how to do this for TSN as it is a model supporting a variable-length input: (make sure you're set up your environment, downloaded and prepped the dataset, and downloaded the models first)

$ python src/scripts/compute_esvs.py \
    configs/feature_tsn.jsonnet \
    datasets/ssv2/class-priors.csv \
    tsn-esv-n_frames=8.pkl \
    --sample-n-frames 8

Models supporting a fixed-length input

For models that don't support a variable-length input, we propose a way of ensembling a collection of fixed-length input models into a new meta-model which we can then compute ESVs for. To make this explanation more concrete, we now describe the process in detail for TRN. To start with, we train multiple TRN models for 1, 2, ..., n frames separately. By training these models separately we ensure that they are capable of acting alone (this also has the nice benefit of improving performance over joint training in our experience!). At inference time, we compute all possible subsampled variants of the input video we wish to classify and pass each of these through the corresponding single scale model. We aggregate scores so that each scale is given equal weighting in the final result.

Our paper proposes a joint computation method of this multiscale model and its ESVs. This is implemented in the OnlineShapleyAttributor class.

We provide an example of how to do this for TRN, as the basic variant only supports a fixed-length input. (make sure you're set up your environment, downloaded and prepped the dataset, and downloaded the models first)

$ python src/scripts/compute_esvs.py \
    configs/feature_multiscale_trn.jsonnet \
    datasets/ssv2/class-priors.csv \
    mtrn-esv-n_frames=8.pkl \
    --sample-n-frames 8

Visualisation

We provide a dashboard to investigate model behaviour when we vary how many frames are fed to the model. This dashboard is powered by multiple sets of results produced by the compute_esv.py script.

First we compute ESVs for 1--8 frame inputs:

$ for n in $(seq 1 8); do
    python src/scripts/compute_esvs.py \
        configs/feature_multiscale_trn.jsonnet \
        datasets/ssv2/class-priors.csv \
        mtrn-esv-n_frames=$n.pkl \
        --sample-n-frames $n
  done

Then we collate them:

$ python src/scripts/collate_esvs.py \
    --dataset "Something Something v2" \
    --model "MTRN" \
    mtrn-esv-n_frames={1..8}.pkl \
    mtrn-esv-min_n_frames=1-max_n_frames=8.pkl

before we can run the dashboard, we need to dump out the videos from he gulp directory as webm files (since when we gulp the files, the FPS is altered!). Watch out that you don't end up using the conda bundled ffmpeg which doesn't support VP9 encoding if you replace ./bin/ffmpeg with ffmpeg, check which you are using by running which ffmpeg.

$ python src/scripts/dump_frames_from_gulp_dir.py \
    datasets/ssv2/gulp/validation
    datasets/ssv2/frames

$ for frame_dir in datasets/ssv2/frames/*; do \
    if [[ -f "$frame_dir/frame_000001.jpg" && ! -f "${frame_dir}.webm" ]] ; then \
      ./bin/ffmpeg \
          -r 8 \
          -i "$frame_dir/frame_%06d.jpg" \
          -c:v vp9 \
          -row-mt 1 \
          -speed 4 \
          -threads 8 \
          -b:v 200k \
          "${frame_dir}.webm"; \
    fi \
 done
$ mkdir datasets/ssv2/videos
$ mv datasets/ssv2/frames/*.webm datasets/ssv2/videos

and now we can run the ESV dashboard:

$ python src/apps/esv_dashboard/visualise_esvs.py \
    mtrn-esv-min_n_frames=1-max_n_frames=8.pkl \
    datasets/ssv2/videos \
    src/datasets/metadata/something_something_v2/classes.csv

Approximating ESVs

When sequences become long, it no longer becomes possible to compute ESVs exactly and instead an approximation has to be employed. compute_esvs.py supports computing approximate ESVs through the --approximate* flags. Also check out the approximation demo notebook to see how changing the approximation parameters effects the variance of the resulting ESVs.