Changmeng Zheng1, Dayong Liang2, Wengyu Zhang1, Xiao-Yong Wei*,1, Tat-Seng Chua3, Qing Li1
1The Hong Kong Polytechnic University 2South China University of Technology 3National University of Singapore
*Corresponding author
Blueprint Debate-on-Graph (BDoG)
🔥 [2024.10] Our paper has been nominated for the Best Paper Award!
🔥 [2024.07] The paper and code are released!
git clone https://github.com/thecharm/BDoG.git
cd BDoG
pip install -e .
Download the model weights and set the model path in the BDoG/vlmeval/config.py file.
torchrun --nproc_per_node=1 run.py --data ScienceQA_TEST \
--stage BDebate \
--debate 2
--data
- Datasets supported: ScienceQA_TEST and MMBench_DEV_EN.
--stage
- Prompt type: BDebate (Blueprint Debate on Graph) or ODebate (Debate without Graph).
--debate
- Number of rounds for the debate.
--kg_init
- (optional) Use Gemini Graph as the initialization for multi-round debates.
--nproc_per_node=2
- (optional) Speed up the inference process if you have two GPUs.
--openai
- (optional) Use an OpenAI API key to perform the final result validation.
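For example, a fuller invocation combining the optional flags might look like the sketch below. The paths and flag forms are illustrative: whether --kg_init and --openai take values or are bare switches, and the exact --lmudata path, depend on your local setup and the repo's argument parser.

```shell
# Two-GPU run of Blueprint Debate-on-Graph on MMBench with 2 debate rounds,
# Gemini graph initialization, and OpenAI-based final-answer validation.
# Adjust the data path and flag values to match your environment.
torchrun --nproc_per_node=2 run.py --data MMBench_DEV_EN \
  --stage BDebate \
  --debate 2 \
  --kg_init \
  --openai \
  --lmudata /path/to/LMUData
```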
The results are saved in the BDoG/results/instructblip_13b folder.
During this process, the datasets will be automatically downloaded to the /root/LMUData/ directory. If you need to change the data storage path, please reset --lmudata.
- VLMEvalKit: An open-source evaluation toolkit of large vision-language models (LVLMs).
- LLaVA: Wonderful MLLM based on Large Language and Vision Assistant.
- LAVIS: An amazing open-source multimodal learning codebase.
If this repo is useful to you, please cite it using the following BibTeX:
@inproceedings{zheng2024picture,
title={A Picture Is Worth a Graph: A Blueprint Debate Paradigm for Multimodal Reasoning},
author={Zheng, Changmeng and Liang, DaYong and Zhang, Wengyu and Wei, Xiaoyong and Chua, Tat-Seng and Li, Qing},
booktitle={ACM Multimedia 2024},
year={2024}
}