convert Chinese clip model #100

aiportor · 2024-09-13T05:31:00Z

Converting the Chinese CLIP model fails:
https://huggingface.co/OFA-Sys/chinese-clip-vit-base-patch16/tree/main

yysu-888 · 2024-09-25T04:48:17Z

You are welcome to try my project, which supports all OpenAI CLIP and OFA-Sys Chinese CLIP models:
https://github.com/yysu-888/clip.cpp

aiportor · 2024-09-26T11:03:42Z

The results do not appear to be correct on Windows 10:

clip.exe -m clip_f32.gguf --model_version ofasys_chinese_clip_vit_large_patch14_336 --mode text_search_image --model_type q8_0 --img_directory images --text "男孩照片"
System Info:
BLAS = 0
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0

vocab size: 49408
trigger word img already in vocab
clip_vision params backend buffer size = 311.44 MB(RAM) (392 tensors)
clip_text params backend buffer size = 105.07 MB(RAM) (198 tensors)
clip_vision alloc params backend buffer failed, num_tensors = 392
clip_text alloc params backend buffer failed, num_tensors = 198
img_path:images/apple.jpeg,width=640,height=640,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5255 ms
img_path:images/apple_0.jpeg,width=800,height=800,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5049 ms
img_path:images/boy.jpeg,width=194,height=183,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5273 ms
img_path:images/cat.jpeg,width=640,height=427,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5236 ms
img_path:images/cat_0.jpeg,width=658,height=1170,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5238 ms
img_path:images/dog.jpg,width=1076,height=809,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5188 ms
img_path:images/orange.jpg,width=800,height=800,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5135 ms
img_path:images/outdoor.jpeg,width=608,height=810,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5231 ms
img_path:images/panda.jpg,width=690,height=462,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5199 ms
img_path:images/pear.jpeg,width=260,height=194,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5251 ms
img_path:images/sky.jpg,width=690,height=920,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5161 ms
img_path:images/watermelon.jpg,width=690,height=690,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5209 ms
img_path:images/watermelon_0.jpeg,width=420,height=420,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5313 ms
clip_text compute buffer size: 31.50 MB(RAM)
clip_text_modelrunner consuming time= 1530 ms
label:男孩照片
images/sky.jpg :0.295755
images/dog.jpg :0.282417
images/panda.jpg :0.279728
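
The last lines of the log read like a top-3 ranking of image paths by their similarity to the text query. As a minimal sketch of that kind of ranking, assuming the score is the cosine similarity between L2-normalized text and image embeddings (the usual CLIP retrieval setup, not confirmed for this tool), the helper names `normalize` and `rank_images` below are hypothetical and the vectors are toy data:

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def rank_images(text_feature, image_features, k=3):
    """Return the k (path, score) pairs with the highest cosine similarity.

    image_features maps an image path to its embedding. This mirrors the
    general idea of text-to-image search, not the tool's actual code.
    """
    text_unit = normalize(text_feature)
    scores = {}
    for path, feature in image_features.items():
        image_unit = normalize(feature)
        scores[path] = sum(t * i for t, i in zip(text_unit, image_unit))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, for illustration only.
    toy_images = {
        "images/a.jpg": [1.0, 0.0],
        "images/b.jpg": [0.0, 1.0],
        "images/c.jpg": [1.0, 1.0],
    }
    for path, score in rank_images([2.0, 0.0], toy_images, k=2):
        print(f"{path} :{score:.6f}")
```

Under this assumption, a wrong ranking follows directly from wrong embeddings, so comparing the embeddings themselves is more informative than comparing the final scores.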

yysu-888 · 2024-09-26T13:03:12Z

I have not tested on the Windows platform. On my Mac, the test result is as follows:

You can set --model_type f32 and try again.

aiportor · 2024-09-29T02:53:17Z

I tried --model_type f32 on Windows 10 and got the same result.

I also tested on Windows 10 and on Linux with the same model file. The result is correct on Linux but wrong on Windows 10. I saved the features, and they are completely different between Windows 10 and Linux.
win10:
clip.exe -m clip_f32.gguf --model_version ofasys_chinese_clip_vit_large_patch14_336 --mode text_search_image --model_type q8_0 --img_directory images --text "男孩照片"
System Info:
BLAS = 0
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0

vocab size: 49408
trigger word img already in vocab
clip_vision params backend buffer size = 311.44 MB(RAM) (392 tensors)
clip_text params backend buffer size = 105.07 MB(RAM) (198 tensors)
clip_vision alloc params backend buffer failed, num_tensors = 392
clip_text alloc params backend buffer failed, num_tensors = 198
img_path:images/apple.jpeg,width=640,height=640,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 4978 ms
img_path:images/apple_0.jpeg,width=800,height=800,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5142 ms
img_path:images/boy.jpeg,width=194,height=183,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5096 ms
img_path:images/cat.jpeg,width=640,height=427,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 4957 ms
img_path:images/cat_0.jpeg,width=658,height=1170,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5077 ms
img_path:images/dog.jpg,width=1076,height=809,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 4921 ms
img_path:images/orange.jpg,width=800,height=800,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5071 ms
img_path:images/outdoor.jpeg,width=608,height=810,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 4963 ms
img_path:images/panda.jpg,width=690,height=462,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 4945 ms
img_path:images/pear.jpeg,width=260,height=194,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 4952 ms
img_path:images/sky.jpg,width=690,height=920,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5047 ms
img_path:images/watermelon.jpg,width=690,height=690,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5052 ms
img_path:images/watermelon_0.jpeg,width=420,height=420,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5073 ms
clip_text compute buffer size: 31.50 MB(RAM)
clip_text_modelrunner consuming time= 1441 ms
label:男孩照片
images/sky.jpg :0.295755
images/dog.jpg :0.282417
images/panda.jpg :0.279728

linux:
./clip -m clip_f32.gguf --model_version ofasys_chinese_clip_vit_large_patch14_336 --mode text_search_image --model_type q8_0 --img_directory images --text "男孩照片"
System Info:
BLAS = 0
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 1
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0

vocab size: 49408
trigger word img already in vocab
clip_vision params backend buffer size = 311.44 MB(RAM) (392 tensors)
clip_text params backend buffer size = 105.07 MB(RAM) (198 tensors)
clip_vision alloc params backend buffer failed, num_tensors = 392
clip_text alloc params backend buffer failed, num_tensors = 198
img_path:images/apple.jpeg,width=640,height=640,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 8192 ms
img_path:images/apple_0.jpeg,width=800,height=800,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 8192 ms
img_path:images/boy.jpeg,width=194,height=183,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 8192 ms
img_path:images/cat.jpeg,width=640,height=427,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time=12288 ms
img_path:images/cat_0.jpeg,width=658,height=1170,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 8192 ms
img_path:images/dog.jpg,width=1076,height=809,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 8192 ms
img_path:images/orange.jpg,width=800,height=800,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time=12288 ms
img_path:images/outdoor.jpeg,width=608,height=810,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 8192 ms
img_path:images/panda.jpg,width=690,height=462,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 8192 ms
img_path:images/pear.jpeg,width=260,height=194,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 8192 ms
img_path:images/sky.jpg,width=690,height=920,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time=12288 ms
img_path:images/watermelon.jpg,width=690,height=690,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 8192 ms
img_path:images/watermelon_0.jpeg,width=420,height=420,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 8192 ms
clip_text compute buffer size: 31.50 MB(RAM)
clip_text_modelrunner consuming time= 4096 ms
label:男孩照片
images/boy.jpeg :0.274833
images/cat.jpeg :0.254993
images/panda.jpg :0.246354
apple_0.jpeg_image_feature_linux.txt
apple_0.jpeg_image_feature_win.txt
text_feature_linux.txt
text_feature_win.txt
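
To put a number on how far the Windows and Linux features diverge, the attached dumps can be compared directly. The following is a rough sketch, assuming each file holds plain decimal numbers separated by whitespace, commas, or newlines (the actual dump format is not shown in this thread); the helper names `load_feature`, `cosine_similarity`, `max_abs_diff`, and `compare` are hypothetical:

```python
import math
import re
from pathlib import Path


def load_feature(path):
    """Read a feature vector from a text file.

    Assumption: numbers are separated by whitespace and/or commas.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [float(tok) for tok in re.split(r"[\s,]+", text.strip()) if tok]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


def compare(path_a, path_b):
    """Return (cosine similarity, max absolute difference) for two dumps."""
    a = load_feature(path_a)
    b = load_feature(path_b)
    return cosine_similarity(a, b), max_abs_diff(a, b)
```

For example, `compare("text_feature_linux.txt", "text_feature_win.txt")` returns both values for the text features. A cosine similarity close to 1 with a small maximum difference would point to ordinary floating-point noise, while a low similarity would suggest the two builds compute or load something differently.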
