Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles

Shuquan Ye², Yujia Xie¹, Dongdong Chen¹, Yichong Xu¹, Lu Yuan¹, Chenguang Zhu¹, Jing Liao²

¹Microsoft, ²City University of Hong Kong

This repository contains the PyTorch implementation of DANCE [paper]. The code is built on PyTorch 1.11. Pre-training with our code requires 4 nodes, each with 8 A100 GPUs.

Catalog:

  • Code for DANCE-augmented Pre-training

  • Code for DANCE-augmented Fine-tuning

  • Code for Image-Text Retrieval and OK-VQA

  • Download of Pre-trained and Fine-tuned Checkpoints

BibTeX