# Text-To-Image

Results and configuration are available in my Google Colab notebook: https://colab.research.google.com/drive/1fStmlSZh8FFTFBAuA_7KVObK6E4jnIkn?usp=sharing
## Microsoft COCO

Arrange the dataset as follows:

```
MSCOCO_Caption/
├── annotations/
│   ├── captions_train2014.json
│   └── captions_val2014.json
├── train2014/
│   └── train2014/
│       ├── COCO_train2014_000000000009.jpg
│       └── ......
└── val2014/
    └── val2014/
        ├── COCO_val2014_000000000042.jpg
        └── ......
```
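As a quick sanity check that the annotations are in place, the caption files can be read with the standard COCO JSON layout (an `annotations` array whose entries carry `image_id` and `caption`). The helper below is a minimal sketch, not part of this repo:

```python
import json

def load_coco_captions(ann_file):
    """Map image_id -> list of captions from a COCO caption annotation file."""
    with open(ann_file) as f:
        data = json.load(f)
    id_to_caps = {}
    for ann in data["annotations"]:
        id_to_caps.setdefault(ann["image_id"], []).append(ann["caption"])
    return id_to_caps

# Example, with paths following the tree above:
# caps = load_coco_captions("MSCOCO_Caption/annotations/captions_train2014.json")
# print(len(caps))  # number of annotated training images
```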
## CUB-200

Arrange the dataset as follows:

```
CUB-200/
├── images/
│   ├── 001.Black_footed_Albatross/
│   ├── 002.Laysan_Albatross/
│   └── ......
├── text/
│   └── text/
│       ├── 001.Black_footed_Albatross/
│       ├── 002.Laysan_Albatross/
│       └── ......
├── train/
│   └── filenames.pickle
└── test/
    └── filenames.pickle
```
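The `filenames.pickle` files hold the train/test split as a pickled Python object (in AttnGAN-style CUB pipelines, a list of image filenames). A minimal loader, as a sketch under that assumption:

```python
import pickle

def load_split_filenames(pickle_path):
    """Return the pickled list of image filenames for a CUB-200 train or test split."""
    with open(pickle_path, "rb") as f:  # pickle files must be opened in binary mode
        return pickle.load(f)

# Example:
# train_files = load_split_filenames("CUB-200/train/filenames.pickle")
```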
## Pretrained Models

For the COCO dataset:
- OpenImages: https://drive.google.com/uc?id=1HAl78FcWlZdzqj8CHrA8DNf14XSzvNy6
- CLIP: https://drive.google.com/uc?id=1qhdlE0l8hkKsakqfHotQueXuTx3sVbPV
- My weights after training on the COCO dataset: https://drive.google.com/file/d/1M9gqLWkt3Z2uN95vx7OSEETSFv1EBhkp/view?usp=sharing

After downloading the two pretrained files above (OpenImages and CLIP), set their paths in `configs/coco.yaml`.

For the FFHQ dataset:
- Pretrained on the FFHQ dataset: https://drive.google.com/file/d/1MW3N77SEIlcELJs855YKHu0Xive7AXso/view?usp=sharing
- My weights after training on the LAION-Human dataset: https://drive.google.com/file/d/12py7Zx3JdMjcoRQRx3bxqA-NkMGLM55o/view?usp=sharing
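To point the config at the downloaded OpenImages and CLIP checkpoints, edit the corresponding path entries in `configs/coco.yaml`. The fragment below is purely illustrative; the actual key names depend on the config schema shipped with the repo, so match the keys already present in your `coco.yaml`:

```yaml
# Hypothetical fragment of configs/coco.yaml -- the key names are assumptions,
# keep the real keys from your config and only swap in the checkpoint paths.
model:
  params:
    content_codec_config:        # codec using the OpenImages checkpoint
      params:
        ckpt_path: path/to/openimages_pretrained.pth
    condition_codec_config:      # text encoder using the CLIP checkpoint
      params:
        clip_path: path/to/clip_pretrained.pth
```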
## Training

First, change `data_root` to the correct path in `configs/coco.yaml` (or the other config files).
Train Text2Image generation on the MSCOCO dataset:

```
python running_command/run_train_coco.py
```

Train Text2Image generation on the CUB-200 dataset:

```
python running_command/run_train_cub.py
```
## Inference

```python
from inference_VQ_Diffusion import VQ_Diffusion

VQ_Diffusion_model = VQ_Diffusion(
    config='OUTPUT/pretrained_model/config_text.yaml',
    path='OUTPUT/pretrained_model/coco_pretrained.pth',
)

VQ_Diffusion_model.inference_generate_sample_with_condition(
    "a beautiful woman", truncation_rate=0.85, save_root="RESULT", batch_size=4)

# Pass fast=2 for fast inference
VQ_Diffusion_model.inference_generate_sample_with_condition(
    "a young girl in blue skirt", truncation_rate=0.85, save_root="RESULT", batch_size=4, fast=2)
```
## Cite VQ-Diffusion

```
@article{gu2021vector,
  title={Vector Quantized Diffusion Model for Text-to-Image Synthesis},
  author={Gu, Shuyang and Chen, Dong and Bao, Jianmin and Wen, Fang and Zhang, Bo and Chen, Dongdong and Yuan, Lu and Guo, Baining},
  journal={arXiv preprint arXiv:2111.14822},
  year={2021}
}
```