Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles

Paper (arXiv) | Project Page | Pre-trained Models

Shuquan Ye², Yujia Xie¹, Dongdong Chen¹, Yichong Xu¹, Lu Yuan¹, Chenguang Zhu¹, Jing Liao²

¹Microsoft, ²City University of Hong Kong

This is the PyTorch code for the DANCE [paper]. The code is written for PyTorch 1.11. Pre-training with our code requires 4 nodes, each with 8 A100 GPUs.
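
To make the hardware requirement concrete, here is a minimal sketch of the process layout it implies. It assumes one training process per GPU, which is the usual convention for distributed PyTorch training but is not stated in this README. The helper names `world_size` and `global_rank` are hypothetical and are not part of this repository.

```python
# Hypothetical helpers for illustration only; not part of the DANCE code.
# Assumption: one training process per GPU.

NUM_NODES = 4        # from the README: 4 nodes
GPUS_PER_NODE = 8    # from the README: 8 A100 GPUs per node


def world_size(num_nodes: int = NUM_NODES, gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Total number of processes across all nodes."""
    return num_nodes * gpus_per_node


def global_rank(node_rank: int, local_rank: int, gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Global rank of the process driving GPU `local_rank` on node `node_rank`."""
    if not 0 <= local_rank < gpus_per_node:
        raise ValueError("local_rank must be in [0, gpus_per_node)")
    return node_rank * gpus_per_node + local_rank


if __name__ == "__main__":
    print(world_size())        # 32
    print(global_rank(3, 7))   # 31
```

Under that assumption, the stated setup corresponds to 32 processes in total, with global ranks 0 through 31.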

Catalog:

Code for DANCE-augmented Pre-training
Code for DANCE-augmented Fine-tuning
Code for Image-Text Retrieval, OK-VQA
Download of Pre-trained and Fine-tuned Checkpoints

BibTeX