Yicheng Chen, Xiangtai Li, Yining Li, Yanhong Zeng, Jianzong Wu, Xiangyu Zhao, Kai Chen
- [2024/06] Our paper Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language is released.
Auto Cherry Picker is a innovative framework designed to synthesize training samples for both perception and multi-modal reasoning tasks from a simple object list in natural language. It employs a nowly designed metric, CLIS, to ensure the quality of the synthetic data.
Method | Backbone | ||
---|---|---|---|
Mask R-CNN | ResNet-50 | 9.3 | 21.7 |
Mask R-CNN w. ACP | ResNet-50 | 14.5(+5.2) | 22.8(+1.1) |
CenterNet2 w. Copy-Paste | Swin-B | 29.3 | 39.3 |
CenterNet2 w. ACP | Swin-B | 30.7(+1.4) | 39.6(+0.3) |
Dataset | Method | Backbone | ||
---|---|---|---|---|
LVIS | Grounding-DINO | Swin-T | 31.7 | 48.7 |
LVIS | Grounding-DINO w. ACP | Swin-T | 33.0(+1.3) | 49.2 |
COCO | Grounding-DINO | Swin-T | 60.4 | 57.1 |
COCO | Grounding-DINO w. ACP | Swin-T | 60.8(+0.4) | 56.9 |
Method | LLM Backbone | MME | GQA |
---|---|---|---|
LLaVA-1.5 | Vicuna-7B | 1434.4 | 58.9 |
LLaVA-1.5 | Vicuna-13B | 1438.3 | 60.7 |
LLaVA-1.5 | LLama-3-8B | 1445.3 | 60.1 |
LLaVA-1.5 w. ACP | Vicuna-7B | 1514.5(+80.1) | 59.3(+0.4) |
Python 3.10
Pytorch 2.3.0
pip install -r requirements.txt
Download Qwen1.5-14B-Chat
git clone https://huggingface.co/Qwen/Qwen1.5-14B-Chat
You can try other LLMs as Scene Graph Generator, and add it in the config/model_config.json
.
- Step 1: Download InstanceDiffusion
git clone https://github.com/frank-xwang/InstanceDiffusion.git
- Step 2: Download model weights
Please download the pretrained InstanceDiffusion from Hugging Face or Google Drive and SD1.5, place them under InstanceDiffusion/pretrained
folder.
Then, create a soft link under ACP
folder.
ln -s InstanceDiffusion/pretrained ./pretrained
- Step 3: Download CLIP
git clone https://huggingface.co/openai/clip-vit-large-patch14
- Step 4: Download SDXL Refiner (Optional)
git clone https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0
To disable SDXL, you can set args.cascade_strength
at infer_image.py
to 0
.
Please download Qwen-VL-Chat
git clone https://huggingface.co/Qwen/Qwen-VL-Chat
Please construct example pool for CLIS-L.
Download sim_map.json and relations_one_to_one.json under config/eval/
Download SAM model weights at Github
mkdir sam
cd sam
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
python inference_single_data.py
You can custom object list at inputs/demo.json
. The generated images are under images/
and the synthesis training sample is under syn_data/
.