
A Picture Is Worth a Graph: A Blueprint Debate Paradigm for Multimodal Reasoning

Changmeng Zheng1, Dayong Liang2, Wengyu Zhang1, Xiao-Yong Wei*,1, Tat-Seng Chua3, Qing Li1

1The Hong Kong Polytechnic University   2South China University of Technology   3National University of Singapore
*Corresponding author   

arXiv

[Figure 1: Blueprint Debate-on-Graph (BDoG)]

🔥 News

🔥 [2024.10] Our paper has been nominated for the Best Paper Award!
🔥 [2024.07] The paper and code are released!

🚀 Method

[Figure: Method overview]

🏗️ QuickStart

1. Installation

git clone https://github.com/thecharm/BDoG.git
cd BDoG
pip install -e .

2. Download model weights

Download the model weights and set the model path in BDoG/vlmeval/config.py.
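
For reference, the edit might look like the minimal sketch below. The dictionary and key names are assumptions for illustration; check the actual structure of BDoG/vlmeval/config.py after cloning.

# Hypothetical sketch -- variable and key names may differ in the real file.
# Map the model name used by run.py to the local directory holding the weights.
model_path = {
    'instructblip_13b': '/path/to/instructblip-vicuna-13b',  # placeholder path
}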

3. Running

torchrun --nproc_per_node=1 run.py --data ScienceQA_TEST \
                                   --stage BDebate \
                                   --debate 2
  • --data
    • Supported datasets: ScienceQA_TEST and MMBench_DEV_EN.
  • --stage
    • Prompt type: BDebate (Blueprint Debate-on-Graph) or ODebate (debate without a graph).
  • --debate
    • Number of debate rounds.
  • --kg_init
    • (optional) Use a Gemini-generated graph to initialize the multi-round debate.
  • --nproc_per_node=2
    • (optional) Speeds up inference if you have two GPUs.
  • --openai
    • (optional) Use an OpenAI API key to validate the final results.
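
Putting the optional flags together, an invocation might look like the sketch below. This assumes --kg_init and --openai are parameterless switches; check run.py for the exact argument signatures.

torchrun --nproc_per_node=2 run.py --data MMBench_DEV_EN \
                                   --stage BDebate \
                                   --debate 2 \
                                   --kg_init \
                                   --openai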

The results are saved in the BDoG/results/instructblip_13b folder.

During this process, the datasets are automatically downloaded to the /root/LMUData/ directory. If you need to change the data storage path, set --lmudata.
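
For example, to keep the datasets under a custom directory (the path below is a placeholder):

torchrun --nproc_per_node=1 run.py --data ScienceQA_TEST \
                                   --stage BDebate \
                                   --debate 2 \
                                   --lmudata /your/data/dir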

❤️ Acknowledgments

  • VLMEvalKit: An open-source evaluation toolkit for large vision-language models (LVLMs).
  • LLaVA: A wonderful MLLM based on Large Language and Vision Assistant.
  • LAVIS: An amazing open-source multimodal learning codebase.

📑 Citation

If this repo is useful to you, please cite it with the following BibTeX entry.

@inproceedings{zheng2024picture,
  title={A Picture Is Worth a Graph: A Blueprint Debate Paradigm for Multimodal Reasoning},
  author={Zheng, Changmeng and Liang, DaYong and Zhang, Wengyu and Wei, Xiaoyong and Chua, Tat-Seng and Li, Qing},
  booktitle={ACM Multimedia 2024},
  year={2024}
}
