VQA-CP Leaderboard

A collection of papers about the VQA-CP dataset, and a benchmark/leaderboard of their results. VQA-CP is an out-of-distribution dataset for Visual Question Answering, designed to penalize models that rely on question biases to give an answer. You can download the VQA-CP annotations here: https://computing.ece.vt.edu/~aish/vqacp/
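
A simple way to see why changing priors penalize biased models is a question-type prior baseline, which always answers with the most frequent training answer for each question type. In VQA-CP, the answer distribution for each question type differs between the train and test splits, so such a prior can look strong on training data and fail on the test split. The sketch below is illustrative only: the splits are tiny hypothetical toy data, and the field names `question_type` and `answer` are assumptions, not necessarily the schema of the VQA-CP annotation files.

```python
from collections import Counter, defaultdict


def fit_prior(examples):
    """Return the most frequent training answer for each question type."""
    counts = defaultdict(Counter)
    for ex in examples:
        counts[ex["question_type"]][ex["answer"]] += 1
    return {qtype: c.most_common(1)[0][0] for qtype, c in counts.items()}


def accuracy(prior, examples):
    """Exact-match accuracy of the prior baseline (not the official VQA metric)."""
    hits = sum(prior.get(ex["question_type"]) == ex["answer"] for ex in examples)
    return hits / len(examples)


# Hypothetical toy splits: the per-question-type answer distribution is flipped.
train = [{"question_type": "what color", "answer": "white"}] * 8 + [
    {"question_type": "what color", "answer": "black"}
] * 2
test = [{"question_type": "what color", "answer": "white"}] * 2 + [
    {"question_type": "what color", "answer": "black"}
] * 8

prior = fit_prior(train)
print(accuracy(prior, train))  # 0.8
print(accuracy(prior, test))  # 0.2
```

The prior reaches 0.8 on the toy training split but only 0.2 on the toy test split, which is the kind of gap VQA-CP is built to expose.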

Notes:

  • Not all reported papers use the same baseline architecture, so the scores might not be directly comparable. This leaderboard is meant only as a reference for the bias-reduction methods that have been tested on VQA-CP.
  • We indicate whether a validation set was used, because for out-of-distribution datasets it is very important to tune hyperparameters and do early stopping on a validation set drawn from the same distribution as the training set. Otherwise, there is a risk of overfitting the test set and its biases, which defeats the purpose of VQA-CP. We therefore strongly recommend that future work build a validation set from part of the training set.
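
A minimal sketch of carving an in-distribution validation set out of the training split, assuming each example is a dict with an `image_id` field (a hypothetical name used here for illustration). Splitting by image rather than by question is a design choice: it keeps questions about the same image from landing on both sides of the split.

```python
import random


def split_train_val(examples, val_fraction=0.1, seed=0, key=lambda ex: ex["image_id"]):
    """Hold out a random subset of groups (images by default) as a validation set.

    The validation set is drawn from the training split, so it shares the
    training distribution rather than the shifted test distribution.
    """
    if not 0 < val_fraction < 1:
        raise ValueError("val_fraction must be in (0, 1)")
    groups = sorted({key(ex) for ex in examples})
    random.Random(seed).shuffle(groups)  # deterministic for a fixed seed
    n_val = max(1, int(len(groups) * val_fraction))
    val_groups = set(groups[:n_val])
    train = [ex for ex in examples if key(ex) not in val_groups]
    val = [ex for ex in examples if key(ex) in val_groups]
    return train, val


# Hypothetical usage: 40 questions over 20 images (2 questions per image).
examples = [{"image_id": i // 2, "question_id": i} for i in range(40)]
train, val = split_train_val(examples, val_fraction=0.1)
print(len(train), len(val))  # 36 4
```

Hyperparameter search and early stopping would then use `val`, and the VQA-CP test split would only be evaluated once at the end.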

You can read an overview of some of those bias-reduction methods here: https://cdancette.fr/2020/11/21/overview-bias-reductions-vqa/

VQA-CP v2

Best results on architectures without pre-training are highlighted in bold.

==============  ==================  =======================  =====  =========  =========  =========  ==========
Name            Base Arch.          Conference               All    Yes/No     Numbers    Other      Validation
==============  ==================  =======================  =====  =========  =========  =========  ==========
AttReg [2]      LMH                 Preprint                 59.92  87.28      52.39      47.65
GGE-DQ          UpDown              ICCV 2021                57.32  87.04      27.75      49.59
AdaVQA          UpDown              IJCAI 2021               54.67  72.47      53.81      45.58      No valset
DecompLR        UpDown              AAAI 2020                48.87  70.99      18.72      45.57      No valset
MUTANT          LXMERT              EMNLP 2020               69.52  93.15      67.17      57.78      No valset
MUTANT          UpDown              EMNLP 2020               61.72  88.90      49.68      50.78      No valset
CL              UpDown + LMH + CSS  EMNLP 2020               59.18  86.99      49.89      47.16      No valset
RMFE            UpDown + LMH        NeurIPS 2020             54.55  74.03      49.16      45.82      No valset
RandImg         UpDown              NeurIPS 2020             55.37  83.89      41.60      44.20      Valset
Loss-Rescaling  UpDown + LMH        Preprint 2020            53.26  72.82      48.00      44.46
ESR             UpDown              ACL 2020                 48.9   69.8       11.3       47.8
GradSup         Unshuffling         ECCV 2020                46.8   64.5       15.3       45.9       Valset
VGQE            S-MRL               ECCV 2020                50.11  66.35      27.08      46.77      No valset
CSS             UpDown + LMH        CVPR 2020                58.95  84.37      49.42      48.21      No valset
Semantic        UpDown + RUBi       Preprint 2020            47.5
Unshuffling     UpDown              Preprint 2020            42.39  47.72      14.43      47.24      Valset
CF-VQA          UpDown + LMH        Preprint 2020            57.18  80.18      45.62      48.31      No valset
LMH             UpDown              EMNLP 2019               52.05  69.81 [1]  44.46 [1]  45.54 [1]  No valset
RUBi            S-MRL [3]           NeurIPS 2019             47.11  68.65      20.28      43.18      No valset
SCR [2]         UpDown              NeurIPS 2019             49.45  72.36      10.93      48.02      No valset
NSM                                 NeurIPS 2019             45.80
HINT [2]        UpDown              ICCV 2019                46.73  67.27      10.61      45.88      No valset
ActSeek         UpDown              CVPR 2019                46.00  58.24      29.49      44.33      Valset
GRL             UpDown              NAACL-HLT 2019 Workshop  42.33  59.74      14.78      40.76      Valset
AdvReg          UpDown              NeurIPS 2018             41.17  65.49      15.48      35.48      No valset
GVQA                                CVPR 2018                31.30  57.99      13.68      22.14      No valset
==============  ==================  =======================  =====  =========  =========  =========  ==========
[1] Retrained by the authors of CSS.
[2] Uses additional information.
[3] S-MRL stands for Simplified-MUREL. The architecture was proposed in RUBi.

Papers

GGE-DQ
Greedy Gradient Ensemble for Robust Visual Question Answering - ICCV 2021
Xinzhe Han, Shuhui Wang, Chi Su, Qingming Huang, Qi Tian
DecompLR
Overcoming Language Priors in VQA via Decomposed Linguistic Representations - AAAI 2020
Chenchen Jing, Yuwei Wu, Xiaoxun Zhang, Yunde Jia, Qi Wu
AdaVQA
AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss - IJCAI 2021
Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Feng Ji, Ji Zhang, Alberto Del Bimbo
MUTANT
MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering - EMNLP 2020
Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang
CL
Learning to Contrast the Counterfactual Samples for Robust Visual Question Answering - EMNLP 2020
Zujie Liang, Weitao Jiang, Haifeng Hu, Jiaying Zhu
RMFE
Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies - NeurIPS 2020
Itai Gat, Idan Schwartz, Alexander Schwing, Tamir Hazan
RandImg
On the Value of Out-of-Distribution Testing: An Example of Goodhart’s Law - NeurIPS 2020
Damien Teney, Kushal Kafle, Robik Shrestha, Ehsan Abbasnejad, Christopher Kanan, Anton van den Hengel
Loss-Rescaling
Loss-rescaling VQA: Revisiting Language Prior Problem from a Class-imbalance View - Preprint 2020
Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Qi Tian
ESR (Embarrassingly Simple Regularizer)
A Negative Case Analysis of Visual Grounding Methods for VQA - ACL 2020
Robik Shrestha, Kushal Kafle, Christopher Kanan
GradSup
Learning what makes a difference from counterfactual examples and gradient supervision - ECCV 2020
Damien Teney, Ehsan Abbasnejad, Anton van den Hengel
VGQE
Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder - ECCV 2020
Gouthaman KV, Anurag Mittal
CSS
Counterfactual Samples Synthesizing for Robust Visual Question Answering - CVPR 2020
Long Chen, Xin Yan, Jun Xiao, Hanwang Zhang, Shiliang Pu, Yueting Zhuang
Semantic
Estimating semantic structure for the VQA answer space - Preprint 2020
Corentin Kervadec, Grigory Antipov, Moez Baccouche, Christian Wolf
Unshuffling
Unshuffling Data for Improved Generalization - Preprint 2020
Damien Teney, Ehsan Abbasnejad, Anton van den Hengel
Summary

Inspired by Invariant Risk Minimization (Arjovsky et al.), they use two training sets with different biases to learn a more robust classifier, one that performs better on out-of-distribution data.
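
Unshuffling's exact objective is not reproduced here. As a hedged illustration of the IRM idea it builds on, this plain-Python sketch computes the IRMv1 penalty of Arjovsky et al. for each environment (for instance, each training set with its own biases): the squared gradient of the mean cross-entropy with respect to a dummy scalar multiplier `w` on the logits, evaluated at `w = 1`. The logits, labels, and the `lam` weight are hypothetical inputs.

```python
import math


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def mean_ce(logits, labels):
    """Mean cross-entropy of one environment."""
    return sum(-math.log(softmax(z)[y]) for z, y in zip(logits, labels)) / len(logits)


def irm_penalty(logits, labels):
    """IRMv1 penalty: squared d/dw of mean CE(w * logits) at w = 1."""
    grad = 0.0
    for z, y in zip(logits, labels):
        p = softmax(z)
        # d/dw [-log softmax(w * z)[y]] at w = 1 equals sum_k p_k z_k - z_y
        grad += sum(pk * zk for pk, zk in zip(p, z)) - z[y]
    grad /= len(logits)
    return grad ** 2


def irm_objective(envs, lam=1.0):
    """Sum over environments of risk + lam * IRMv1 penalty."""
    return sum(mean_ce(l, y) + lam * irm_penalty(l, y) for l, y in envs)


# Hypothetical environments, e.g. two training sets with different biases.
env_a = ([[2.0, -1.0, 0.5], [0.1, 0.3, -0.2]], [0, 2])
env_b = ([[0.0, 1.5, -0.5]], [1])
print(irm_objective([env_a, env_b], lam=10.0))
```

The penalty is small when the same classifier is simultaneously optimal (up to rescaling) in every environment, which is the invariance IRM aims for.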
CF-VQA
Counterfactual VQA: A Cause-Effect Look at Language Bias - Preprint 2020
Yulei Niu, Kaihua Tang, Hanwang Zhang, Zhiwu Lu, Xian-Sheng Hua, Ji-Rong Wen
Summary

They formalize the ensembling framework of RUBi and LMH using causal inference.
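
A rough sketch of the counterfactual inference idea, under stated assumptions: it uses a SUM-style fusion, `log sigmoid(z_k + z_q + z_v)`, of three branch outputs (`z_k` from the full VQA model, `z_q` from a question-only branch, `z_v` from a vision-only branch), and a constant `c` standing in for blocked branches. All names and values are hypothetical, and the paper's exact fusion functions and training details may differ. The debiased score is the total indirect effect (TIE = TE − NDE), which reduces to the factual scores minus the counterfactual scores in which only the question branch is active.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def sum_fusion(z_k, z_q, z_v):
    """Assumed SUM fusion: log sigmoid(z_k + z_q + z_v) for each answer."""
    return [log_sigmoid(k + q + v) for k, q, v in zip(z_k, z_q, z_v)]


def tie_scores(z_k, z_q, z_v, c=0.0):
    """Factual scores minus counterfactual scores where z_k and z_v are replaced by c."""
    factual = sum_fusion(z_k, z_q, z_v)
    blocked = [c] * len(z_q)
    counterfactual = sum_fusion(blocked, z_q, blocked)
    return [f - cf for f, cf in zip(factual, counterfactual)]


# Hypothetical two-answer example with a strong question prior for answer 0.
z_k, z_q, z_v = [0.5, 2.0], [3.0, 0.0], [0.0, 1.0]
factual = sum_fusion(z_k, z_q, z_v)
tie = tie_scores(z_k, z_q, z_v)
print(max(range(2), key=lambda i: factual[i]))  # 0
print(max(range(2), key=lambda i: tie[i]))  # 1
```

Here the factual scores follow the question prior and pick answer 0, while subtracting the question-only (counterfactual) effect flips the decision to answer 1.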

LMH
Don’t Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases - EMNLP 2019
Christopher Clark, Mark Yatskar, Luke Zettlemoyer
RUBi
RUBi: Reducing Unimodal Biases in Visual Question Answering - NeurIPS 2019
Remi Cadene, Corentin Dancette, Hedi Ben-younes, Matthieu Cord, Devi Parikh
Summary

During training: the VQA model is ensembled with a question-only model that captures the biases, which lets the main VQA model learn more useful behaviours.

During testing: the question-only model is removed, and only the VQA model is kept.
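
A minimal plain-Python sketch of this training/testing asymmetry, following the masking scheme of the RUBi paper as we understand it: the VQA logits are multiplied element-wise by a sigmoid of the question-only logits, and the question-only branch also gets its own loss. For brevity it reuses the same question-only logits for that second loss, whereas the paper adds a separate classifier on top of the question-only branch; treat it as a simplification, not the reference implementation. The example logits are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cross_entropy(logits, label):
    """Softmax cross-entropy via a stable log-sum-exp."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return lse - logits[label]


def rubi_train_loss(vqa_logits, q_logits, label):
    """Training: mask VQA logits with sigmoid(question-only logits), plus the question-only loss."""
    masked = [a * sigmoid(b) for a, b in zip(vqa_logits, q_logits)]
    return cross_entropy(masked, label) + cross_entropy(q_logits, label)


def rubi_predict(vqa_logits):
    """Testing: the question-only branch is dropped; use the VQA logits alone."""
    return max(range(len(vqa_logits)), key=lambda i: vqa_logits[i])


# Hypothetical logits for three candidate answers.
vqa_logits = [1.0, 0.5, -0.5]
q_logits = [2.0, -1.0, 0.0]
print(rubi_train_loss(vqa_logits, q_logits, label=0))
print(rubi_predict(vqa_logits))  # 0
```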

NSM
Learning by Abstraction: The Neural State Machine - NeurIPS 2019
Drew A. Hudson, Christopher D. Manning
SCR
Self-Critical Reasoning for Robust Visual Question Answering - NeurIPS 2019
Jialin Wu, Raymond J. Mooney
HINT
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded - ICCV 2019
Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry Heck, Dhruv Batra, Devi Parikh
ActSeek
Actively Seeking and Learning from Live Data - CVPR 2019
Damien Teney, Anton van den Hengel
GRL
Adversarial Regularization for Visual Question Answering: Strengths, Shortcomings, and Side Effects - NAACL-HLT 2019 Workshop on Shortcomings in Vision and Language (SiVL)
Gabriel Grand, Yonatan Belinkov
AdvReg
Overcoming Language Priors in Visual Question Answering with Adversarial Regularization - NeurIPS 2018
Sainandan Ramakrishnan, Aishwarya Agrawal, Stefan Lee
GVQA
Don’t Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering - CVPR 2018
Aishwarya Agrawal, Dhruv Batra, Devi Parikh, Aniruddha Kembhavi