convert Chinese clip model #100

Open
aiportor opened this issue Sep 13, 2024 · 4 comments

@aiportor

Converting the Chinese CLIP model fails:
https://huggingface.co/OFA-Sys/chinese-clip-vit-base-patch16/tree/main

@yysu-888

You are welcome to try my project, which supports all OpenAI CLIP and OFA-Sys Chinese CLIP models:
https://github.com/yysu-888/clip.cpp

@aiportor
Author

The result does not seem to be correct on Windows 10:

clip.exe -m clip_f32.gguf --model_version ofasys_chinese_clip_vit_large_patch14_336 --mode text_search_image --model_type q8_0 --img_directory images --text "男孩照片"
System Info:
BLAS = 0
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0

vocab size: 49408
trigger word img already in vocab
clip_vision params backend buffer size = 311.44 MB(RAM) (392 tensors)
clip_text params backend buffer size = 105.07 MB(RAM) (198 tensors)
clip_vision alloc params backend buffer failed, num_tensors = 392
clip_text alloc params backend buffer failed, num_tensors = 198
img_path:images/apple.jpeg,width=640,height=640,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5255 ms
img_path:images/apple_0.jpeg,width=800,height=800,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5049 ms
img_path:images/boy.jpeg,width=194,height=183,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5273 ms
img_path:images/cat.jpeg,width=640,height=427,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5236 ms
img_path:images/cat_0.jpeg,width=658,height=1170,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5238 ms
img_path:images/dog.jpg,width=1076,height=809,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5188 ms
img_path:images/orange.jpg,width=800,height=800,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5135 ms
img_path:images/outdoor.jpeg,width=608,height=810,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5231 ms
img_path:images/panda.jpg,width=690,height=462,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5199 ms
img_path:images/pear.jpeg,width=260,height=194,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5251 ms
img_path:images/sky.jpg,width=690,height=920,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5161 ms
img_path:images/watermelon.jpg,width=690,height=690,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5209 ms
img_path:images/watermelon_0.jpeg,width=420,height=420,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5313 ms
clip_text compute buffer size: 31.50 MB(RAM)
clip_text_modelrunner consuming time= 1530 ms
label:男孩照片
images/sky.jpg :0.295755
images/dog.jpg :0.282417
images/panda.jpg :0.279728

@yysu-888

yysu-888 commented Sep 26, 2024

I have not tested on Windows. On my Mac, the test result is as follows:
[Screenshot 2024-09-26 21 02 58]
You can set --model_type f32 and try again.

@aiportor
Author

I tried --model_type f32 on Windows 10 and got the same result.

I also tested on Windows 10 and on Linux with the same model file. The result is correct on Linux and wrong on Windows 10. I saved the features, and they are totally different between Windows 10 and Linux.
Windows 10:
clip.exe -m clip_f32.gguf --model_version ofasys_chinese_clip_vit_large_patch14_336 --mode text_search_image --model_type q8_0 --img_directory images --text "男孩照片"
System Info:
BLAS = 0
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0

vocab size: 49408
trigger word img already in vocab
clip_vision params backend buffer size = 311.44 MB(RAM) (392 tensors)
clip_text params backend buffer size = 105.07 MB(RAM) (198 tensors)
clip_vision alloc params backend buffer failed, num_tensors = 392
clip_text alloc params backend buffer failed, num_tensors = 198
img_path:images/apple.jpeg,width=640,height=640,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 4978 ms
img_path:images/apple_0.jpeg,width=800,height=800,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5142 ms
img_path:images/boy.jpeg,width=194,height=183,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5096 ms
img_path:images/cat.jpeg,width=640,height=427,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 4957 ms
img_path:images/cat_0.jpeg,width=658,height=1170,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5077 ms
img_path:images/dog.jpg,width=1076,height=809,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 4921 ms
img_path:images/orange.jpg,width=800,height=800,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5071 ms
img_path:images/outdoor.jpeg,width=608,height=810,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 4963 ms
img_path:images/panda.jpg,width=690,height=462,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 4945 ms
img_path:images/pear.jpeg,width=260,height=194,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 4952 ms
img_path:images/sky.jpg,width=690,height=920,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5047 ms
img_path:images/watermelon.jpg,width=690,height=690,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5052 ms
img_path:images/watermelon_0.jpeg,width=420,height=420,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 5073 ms
clip_text compute buffer size: 31.50 MB(RAM)
clip_text_modelrunner consuming time= 1441 ms
label:男孩照片
images/sky.jpg :0.295755
images/dog.jpg :0.282417
images/panda.jpg :0.279728

Linux:
./clip -m clip_f32.gguf --model_version ofasys_chinese_clip_vit_large_patch14_336 --mode text_search_image --model_type q8_0 --img_directory images --text "男孩照片"
System Info:
BLAS = 0
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 1
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0

vocab size: 49408
trigger word img already in vocab
clip_vision params backend buffer size = 311.44 MB(RAM) (392 tensors)
clip_text params backend buffer size = 105.07 MB(RAM) (198 tensors)
clip_vision alloc params backend buffer failed, num_tensors = 392
clip_text alloc params backend buffer failed, num_tensors = 198
img_path:images/apple.jpeg,width=640,height=640,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 8192 ms
img_path:images/apple_0.jpeg,width=800,height=800,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 8192 ms
img_path:images/boy.jpeg,width=194,height=183,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 8192 ms
img_path:images/cat.jpeg,width=640,height=427,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time=12288 ms
img_path:images/cat_0.jpeg,width=658,height=1170,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 8192 ms
img_path:images/dog.jpg,width=1076,height=809,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 8192 ms
img_path:images/orange.jpg,width=800,height=800,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time=12288 ms
img_path:images/outdoor.jpeg,width=608,height=810,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 8192 ms
img_path:images/panda.jpg,width=690,height=462,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 8192 ms
img_path:images/pear.jpeg,width=260,height=194,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 8192 ms
img_path:images/sky.jpg,width=690,height=920,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time=12288 ms
img_path:images/watermelon.jpg,width=690,height=690,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 8192 ms
img_path:images/watermelon_0.jpeg,width=420,height=420,c=3
clip_vision compute buffer size: 31.59 MB(RAM)
clip_vision_modelrunner consuming time= 8192 ms
clip_text compute buffer size: 31.50 MB(RAM)
clip_text_modelrunner consuming time= 4096 ms
label:男孩照片
images/boy.jpeg :0.274833
images/cat.jpeg :0.254993
images/panda.jpg :0.246354
apple_0.jpeg_image_feature_linux.txt
apple_0.jpeg_image_feature_win.txt
text_feature_linux.txt
text_feature_win.txt
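
To put a number on how far apart the Windows and Linux features are, a small script along these lines could compare two of the attached dumps. This is only a sketch: it assumes each file holds a single feature vector written as whitespace- or comma-separated floats, which the thread does not confirm, and the helper names (load_feature, cosine_similarity, max_abs_diff, compare) are made up for illustration and are not part of clip.cpp.

```python
import math
from pathlib import Path


def load_feature(path):
    """Read one feature vector from a text file.

    Assumed format (not confirmed in the thread): floats separated by
    whitespace and/or commas.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [float(tok) for tok in text.replace(",", " ").split()]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two vectors."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return max(abs(x - y) for x, y in zip(a, b))


def compare(path_a, path_b):
    """Return (cosine similarity, max absolute difference) for two feature files."""
    a = load_feature(path_a)
    b = load_feature(path_b)
    return cosine_similarity(a, b), max_abs_diff(a, b)


# Example call with the attachment names from this comment:
#   cos, diff = compare("text_feature_linux.txt", "text_feature_win.txt")
#   print(f"cosine={cos:.6f} max_abs_diff={diff:.6g}")
```

If the two platforms ran the same computation, the cosine similarity should be very close to 1 and the maximum difference small. A value far from 1 for the text features, the image features, or both would show which encoder diverges on Windows 10.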
