Commit e643d68: Release code
khanrc committed Mar 27, 2023 (1 parent: ceb5b28)
Showing 63 changed files with 5,631 additions and 12 deletions.
9 changes: 9 additions & 0 deletions .gitignore
@@ -0,0 +1,9 @@
*.pyc
.vscode
__pycache__
output
.ipynb_checkpoints
notebooks
tcp-checker
checkpoints/
data/
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Kakao Brain Corp.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
201 changes: 189 additions & 12 deletions README.md
@@ -1,40 +1,217 @@
# TCL: Text-grounded Contrastive Learning (CVPR'23)

Official PyTorch implementation of [**Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs**](https://arxiv.org/abs/2212.00785), *Junbum Cha, Jonghwan Mun, Byungseok Roh*, CVPR 2023.

**T**ext-grounded **C**ontrastive **L**earning (TCL) is an open-world semantic segmentation framework using only image-text pairs. TCL enables a model to learn region-text alignment without train-test discrepancy.

We will release a demo soon.

<div align="center">
<figure>
<img alt="" src="./assets/radar_chart.jpg" width="480">
<img alt="" src="./assets/method.jpg">
</figure>
</div>


## Results

TCL performs segmentation both on (a, c) existing segmentation benchmarks and (b) on arbitrary concepts in in-the-wild images, such as proper nouns and free-form text.

<div align="center">
<figure>
<img alt="" src="./assets/main.jpg">
</figure>
</div>

<br/>

<details>
<summary> Additional examples in PASCAL VOC </summary>
<p align="center">
<img src="./assets/examples-voc.jpg" width="800" />
</p>
</details>


<details>
<summary> Additional examples in the wild </summary>
<p align="center">
<img src="./assets/examples-in-the-wild.jpg" width="800" />
</p>
</details>


## Dependencies

We use PyTorch 1.12.1 and torchvision 0.13.1.

```bash
pip install -U openmim
mim install mmcv-full==1.6.2 mmsegmentation==0.27.0
pip install -r requirements.txt
```

Note that the order of the packages in the requirements roughly reflects how version-sensitive they are.
We recommend using the same versions as ours for at least `webdataset`, `mmsegmentation`, and `timm`.
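
To confirm that your environment matches these pins, a quick check such as the following can help (a minimal sketch covering only the versions explicitly named in this section):

```python
# Sanity-check installed versions against the ones named above.
from importlib.metadata import version

expected = {
    "torch": "1.12.1",
    "torchvision": "0.13.1",
    "mmcv-full": "1.6.2",
    "mmsegmentation": "0.27.0",
}

for pkg, want in expected.items():
    got = version(pkg)
    status = "OK" if got.startswith(want) else f"expected {want}"  # tolerate "+cu113"-style suffixes
    print(f"{pkg}: {got} ({status})")
```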


## Datasets

Note that much of this section is adapted from the [data preparation section of the GroupViT README](https://github.com/NVlabs/GroupViT#data-preparation).

We use [webdataset](https://webdataset.github.io/webdataset/) as a scalable data format for training and [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) for semantic segmentation evaluation.

The overall file structure is as follows:

```shell
TCL
├── data
│   ├── gcc3m
│   │   ├── gcc-train-000000.tar
│   │   ├── ...
│   ├── gcc12m
│   │   ├── cc-000000.tar
│   │   ├── ...
│   ├── cityscapes
│   │   ├── leftImg8bit
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── gtFine
│   │   │   ├── train
│   │   │   ├── val
│   ├── VOCdevkit
│   │   ├── VOC2012
│   │   │   ├── JPEGImages
│   │   │   ├── SegmentationClass
│   │   │   ├── ImageSets
│   │   │   │   ├── Segmentation
│   │   ├── VOC2010
│   │   │   ├── JPEGImages
│   │   │   ├── SegmentationClassContext
│   │   │   ├── ImageSets
│   │   │   │   ├── SegmentationContext
│   │   │   │   │   ├── train.txt
│   │   │   │   │   ├── val.txt
│   │   │   ├── trainval_merged.json
│   │   ├── VOCaug
│   │   │   ├── dataset
│   │   │   │   ├── cls
│   ├── ade
│   │   ├── ADEChallengeData2016
│   │   │   ├── annotations
│   │   │   │   ├── training
│   │   │   │   ├── validation
│   │   │   ├── images
│   │   │   │   ├── training
│   │   │   │   ├── validation
│   ├── coco_stuff164k
│   │   ├── images
│   │   │   ├── train2017
│   │   │   ├── val2017
│   │   ├── annotations
│   │   │   ├── train2017
│   │   │   ├── val2017
```

The instructions for preparing each dataset are as follows.

### Training datasets

For training, we use Conceptual Captions 3M and 12M (CC3M and CC12M). We use the [img2dataset](https://github.com/rom1504/img2dataset) tool to download and preprocess both datasets.

#### GCC3M

Please download the training-split annotation file from [Conceptual Captions 3M](https://ai.google.com/research/ConceptualCaptions/download) and name it `gcc3m.tsv`.

Then run `img2dataset` to download the image-text pairs and save them in the webdataset format.
```bash
sed -i '1s/^/caption\turl\n/' gcc3m.tsv
img2dataset --url_list gcc3m.tsv --input_format "tsv" \
    --url_col "url" --caption_col "caption" --output_format webdataset \
    --output_folder data/gcc3m \
    --processes_count 16 --thread_count 64 \
    --image_size 512 --resize_mode keep_ratio --resize_only_if_bigger True \
    --enable_wandb True --save_metadata False --oom_shard_count 6
rename -d 's/^/gcc-train-/' data/gcc3m/*
```
Please refer to [img2dataset CC3M tutorial](https://github.com/rom1504/img2dataset/blob/main/dataset_examples/cc3m.md) for more details.
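
If the download succeeded, the shards are directly iterable; here is a minimal sketch for spot-checking a few pairs (assuming the shard naming produced by the `rename` step above):

```python
# Spot-check the first few image-text pairs in a downloaded GCC3M shard.
import webdataset as wds

dataset = (
    wds.WebDataset("data/gcc3m/gcc-train-000000.tar")
    .decode("pil")               # decode image bytes into PIL images
    .to_tuple("jpg;png", "txt")  # yield (image, caption) tuples
)

for i, (image, caption) in enumerate(dataset):
    print(image.size, repr(caption[:60]))
    if i == 4:
        break
```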

#### GCC12M

Please download the annotation file from [Conceptual Captions 12M](https://github.com/google-research-datasets/conceptual-12m) and name it `gcc12m.tsv`.

Then run `img2dataset` to download the image-text pairs and save them in the webdataset format.
```bash
sed -i '1s/^/caption\turl\n/' gcc12m.tsv
img2dataset --url_list gcc12m.tsv --input_format "tsv" \
    --url_col "url" --caption_col "caption" --output_format webdataset \
    --output_folder data/gcc12m \
    --processes_count 16 --thread_count 64 \
    --image_size 512 --resize_mode keep_ratio --resize_only_if_bigger True \
    --enable_wandb True --save_metadata False --oom_shard_count 6
rename -d 's/^/cc-/' data/gcc12m/*
```
Please refer to [img2dataset CC12M tutorial](https://github.com/rom1504/img2dataset/blob/main/dataset_examples/cc12m.md) for more details.


### Evaluation datasets

In the paper, we use 8 benchmarks: (i) with a background class: PASCAL VOC, PASCAL Context, and COCO-Object, and (ii) without a background class: PASCAL VOC20, PASCAL Context59, COCO-Stuff, Cityscapes, and ADE20k.
Since some benchmarks share the same data sources (e.g., VOC20 and VOC), we only need to prepare 5 datasets: PASCAL VOC, PASCAL Context, COCO-Stuff164k, Cityscapes, and ADE20k.

Please download and setup [PASCAL VOC](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/en/dataset_prepare.md#pascal-voc), [PASCAL Context](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/en/dataset_prepare.md#pascal-context), [COCO-Stuff164k](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/en/dataset_prepare.md#coco-stuff-164k), [Cityscapes](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/en/dataset_prepare.md#cityscapes), and [ADE20k](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/en/dataset_prepare.md#ade20k) datasets following [MMSegmentation data preparation document](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/en/dataset_prepare.md).

#### COCO Object

The COCO-Object dataset consists of only the object classes of the COCO-Stuff164k dataset, obtained from the instance segmentation annotations.
Run the following command to convert the instance segmentation annotations into semantic segmentation annotations:

```shell
python convert_dataset/convert_coco.py data/coco_stuff164k/ -o data/coco_stuff164k/
```
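
The script above handles this for the repo; as a rough illustration of the idea (not the actual `convert_coco.py`, and with hypothetical file paths and output naming), instance masks can be rasterized into per-pixel class-id maps like so:

```python
# Illustrative sketch: merge COCO instance masks into one semantic label map per image.
import numpy as np
from PIL import Image
from pycocotools.coco import COCO

coco = COCO("instances_val2017.json")  # hypothetical path to COCO instance annotations

for img_id in coco.getImgIds()[:1]:
    info = coco.loadImgs(img_id)[0]
    label = np.zeros((info["height"], info["width"]), dtype=np.uint8)  # 0 = background
    for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id)):
        mask = coco.annToMask(ann)             # binary mask for one instance
        label[mask == 1] = ann["category_id"]  # paint the instance's class id
    out_name = info["file_name"].rsplit(".", 1)[0] + "_semantic.png"  # hypothetical naming
    Image.fromarray(label).save(out_name)
```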


## Training

We use 16 and 8 NVIDIA V100 GPUs for the main and ablation experiments, respectively.

### Single node

```bash
torchrun --rdzv_endpoint=localhost:5 --nproc_per_node=auto main.py --cfg ./configs/tcl.yml
```

### Multi node

```bash
torchrun --rdzv_endpoint=$HOST:$PORT --nproc_per_node=auto --nnodes=$NNODES --node_rank=$RANK main.py --cfg ./configs/tcl.yml
```
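
For reference, `torchrun` launches one process per GPU and exports `LOCAL_RANK`, `RANK`, and `WORLD_SIZE` to each worker; below is a minimal sketch of the setup a worker typically performs (an illustration of the torchrun contract, not the repo's `main.py`):

```python
# Minimal distributed setup under torchrun: one process per GPU.
import os
import torch
import torch.distributed as dist

def init_distributed() -> int:
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun for each worker
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")     # reads RANK/WORLD_SIZE from the env
    return local_rank

if __name__ == "__main__":
    local_rank = init_distributed()
    print(f"rank {dist.get_rank()}/{dist.get_world_size()} -> cuda:{local_rank}")
    dist.destroy_process_group()
```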

## Evaluation

Zero-shot transfer to semantic segmentation:

```bash
torchrun --rdzv_endpoint=localhost:5 --nproc_per_node=auto main.py --resume checkpoints/tcl.pth --eval
```


## Citation

```bibtex
@inproceedings{cha2022tcl,
  title={Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs},
  author={Cha, Junbum and Mun, Jonghwan and Roh, Byungseok},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}
```


## License

This project is released under the [MIT license](./LICENSE).
Binary file added assets/main.jpg
Binary file added assets/method.jpg
90 changes: 90 additions & 0 deletions configs/default.yml
@@ -0,0 +1,90 @@
_base_: "eval.yml"

data:
  batch_size: 256
  pin_memory: true
  num_workers: 6
  seed: ${train.seed}
  dataset:
    meta:
      gcc3m:
        type: img_txt_pair
        path: ./data/gcc3m
        prefix: gcc-train-{000000..00347}.tar
        length: 2881393
      gcc12m:
        type: img_txt_pair
        path: ./data/gcc12m
        prefix: cc-{000000..001175}.tar
        length: 11286526
    train:
      - gcc3m
      - gcc12m

  img_aug:
    deit_aug: true
    img_size: 224
    img_scale: [0.08, 1.0]
    interpolation: bilinear
    color_jitter: 0.4
    auto_augment: 'rand-m9-mstd0.5-inc1'
    re_prob: 0.25
    re_mode: 'pixel'
    re_count: 1
  text_aug: null

train:
  start_step: 0
  total_steps: 50000
  warmup_steps: 20000
  ust_steps: 0
  base_lr: 1.6e-3
  weight_decay: 0.05
  min_lr: 4e-5
  clip_grad: 5.0
  fp16: true
  fp16_comm: true  # use fp16 grad compression for multi-node training
  seed: 0

  lr_scheduler:
    name: cosine

  optimizer:
    name: adamw
    eps: 1e-8
    betas: [0.9, 0.999]


evaluate:
  pamr: false
  kp_w: 0.0
  bg_thresh: 0.5

  save_logits: null

  eval_only: false
  eval_freq: 5000
  template: simple
  task:
    - voc
    - voc20
    - context
    - context59
    - coco_stuff
    - coco_object
    - cityscapes
    - ade20k


checkpoint:
  resume: ''
  save_topk: 0
  save_all: false  # if true, save every evaluation step


model_name: "default"  # display name in the logger
output: ???
tag: default
print_freq: 20
seed: 0
wandb: false
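
The `_base_` key and `${train.seed}` interpolation follow OmegaConf-style composition; here is a hedged sketch of how such a file could be loaded and resolved (the repo's actual config loader may differ):

```python
# Sketch: merge a config onto its `_base_` file with OmegaConf (assumed semantics).
from pathlib import Path
from omegaconf import OmegaConf

def load_config(path):
    cfg = OmegaConf.load(path)
    base = cfg.pop("_base_", None)  # e.g. "eval.yml", resolved relative to this file
    if base is not None:
        cfg = OmegaConf.merge(load_config(Path(path).parent / base), cfg)  # child overrides base
    return cfg

cfg = load_config("configs/default.yml")
print(cfg.data.batch_size)  # 256
print(cfg.data.seed)        # "${train.seed}" resolves to 0 on access
```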
30 changes: 30 additions & 0 deletions configs/eval.yml
@@ -0,0 +1,30 @@
evaluate:
  pamr: true
  bg_thresh: 0.4
  kp_w: 0.3

  eval_only: true
  template: custom
  task:
    - voc
    - voc20
    - context
    - context59
    - coco_stuff
    - coco_object
    - cityscapes
    - ade20k

  # training splits
  t_voc20: segmentation/configs/_base_/datasets/t_pascal_voc12_20.py
  t_context59: segmentation/configs/_base_/datasets/t_pascal_context59.py

  # evaluation
  voc: segmentation/configs/_base_/datasets/pascal_voc12.py
  voc20: segmentation/configs/_base_/datasets/pascal_voc12_20.py
  context: segmentation/configs/_base_/datasets/pascal_context.py
  context59: segmentation/configs/_base_/datasets/pascal_context59.py
  coco_stuff: segmentation/configs/_base_/datasets/stuff.py
  coco_object: segmentation/configs/_base_/datasets/coco.py
  cityscapes: segmentation/configs/_base_/datasets/cityscapes.py
  ade20k: segmentation/configs/_base_/datasets/ade20k.py
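
Each task key points to an mmsegmentation dataset config; one way to inspect these files is with the mmseg 0.x API (a sketch assuming standard `mmcv.Config` usage, not necessarily how the repo's evaluator consumes them):

```python
# Sketch: build the validation split of one per-task dataset config (mmseg 0.x API).
from mmcv import Config
from mmseg.datasets import build_dataset

cfg = Config.fromfile("segmentation/configs/_base_/datasets/pascal_voc12.py")
val_dataset = build_dataset(cfg.data.val)

print(len(val_dataset), "validation samples")
print(val_dataset.CLASSES)  # class names, which open-vocabulary models turn into text prompts
```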