Skip to content

Latest commit

 

History

History
57 lines (43 loc) · 2.33 KB

contrastive_language_image_pretrain.md

File metadata and controls

57 lines (43 loc) · 2.33 KB

Contrastive Language Image Pretrain

任务描述

语言图像对比预训练:对模型进行图文对比学习,增强模型对文本图片的匹配度认识能力,预训练完的模型可用于零样本图像分类等下游任务

相关论文 Alec Radford, Jong Wook Kim, et al., Learning Transferable Visual Models From Natural Language Supervision, 2021.

已支持数据集性能

model type Datasets Performance stage example
clip clip_vit_b_32
clip_vit_b_16
clip_vit_l_14
clip_vit_l_14@336
Flickr8k -- pretrain link

Flickr8k(链接,密码: s4be)

  • 数据集大小:2.2G,共8000张彩色图像,每张图像都与五个不同的标题配对,这些标题提供了对图片中物体和事件的内容描述
    • 训练集:6000张图像
    • 验证集:1000张图像
    • 测试集:1000张图像
  • 数据格式:RGB
数据集目录格式
└─Flickr8k
   ├─Flickr8k_Dataset
   |      └─Flickr8k_Dataset
   └─Flickr8k_text
          ├─Flickr8k.devImages.txt
          ├─Flickr8k.testImages.txt
          ├─Flickr8k.trainImages.txt
          └─Flickr8k.token.txt

快速任务接口

  • Trainer接口开启训练:
import mindspore; mindspore.set_context(mode=0, device_id=0)
from mindformers import MindFormerBook
from mindformers.trainer import Trainer

# 显示Trainer的模型支持列表
MindFormerBook.show_trainer_support_model_list("contrastive_language_image_pretrain")
# INFO - Trainer support model list for contrastive_language_image_pretrain task is:
# INFO -    ['clip_vit_b_32', 'clip_vit_b_16', 'clip_vit_l_14', 'clip_vit_l_14@336']
# INFO - -------------------------------------

# 初始化trainer
trainer = Trainer(task='contrastive_language_image_pretrain',
    model='clip_vit_b_32',
    train_dataset='./Flickr8k'
)

trainer.train()