# CLIP App

This app performs zero-shot image classification using OpenAI's CLIP model hosted on Hugging Face.
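Under the hood, zero-shot classification with CLIP amounts to comparing the embedding of an image against the embeddings of each text description. A minimal single-image sketch using the `transformers` library (the checkpoint name, labels, and input file here are just examples, not necessarily what this app uses):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Example checkpoint; the app may use a different CLIP variant.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a person drinking coffee", "a red sports car"]
image = Image.open("frame.jpg")  # hypothetical input image

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds the scaled image-text similarities;
# softmax turns them into a probability distribution over the labels.
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(labels, probs[0].tolist())))
```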

## Intended Model Use

Please read the Model Use section and evaluate whether your use case fits before deploying this model.

## Usage

```sh
# install dependencies
pip3 install --upgrade -r requirements.txt

# run app with some sample text
python3 main.py \
    "a person drinking coffee" \
    "a person making a call" \
    "a person jogging" \
    "a construction crew fixing the road" \
    "a red sports car" \
    "a busy intersection"
```

This will open a camera stream and print the similarity and softmax scores of each text description for each frame.

```
12.499 0.000   a red sports car
16.413 0.001   a busy intersection
17.943 0.006   a construction crew fixing the road
20.251 0.065   a person jogging
21.546 0.237   a person making a call
22.612 0.690   a person drinking coffee

12.850 0.000   a red sports car
16.526 0.002   a busy intersection
17.970 0.007   a construction crew fixing the road
20.424 0.076   a person jogging
21.518 0.226   a person making a call
22.633 0.690   a person drinking coffee
...
```
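For reference, the per-frame loop that produces output like the above presumably looks something like this sketch (the OpenCV capture, checkpoint name, and formatting details are assumptions, not the actual implementation in `main.py`):

```python
import sys

import cv2
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

labels = sys.argv[1:]  # text descriptions passed on the command line
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

cap = cv2.VideoCapture(0)  # default camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # OpenCV delivers BGR frames; CLIP's processor expects RGB images.
    image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    with torch.no_grad():
        inputs = processor(text=labels, images=image,
                           return_tensors="pt", padding=True)
        logits = model(**inputs).logits_per_image[0]
    probs = logits.softmax(dim=0)
    # Print one "similarity softmax label" line per description,
    # lowest-scoring first, as in the sample output above.
    for score, prob, label in sorted(zip(logits.tolist(), probs.tolist(), labels)):
        print(f"{score:.3f} {prob:.3f}\t{label}")
cap.release()
```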

Note: the similarity scores are not raw cosine similarities in [-1, 1]. CLIP multiplies the cosines by a learned temperature (about 100 in the released model), so typical cosines of 0.1–0.3 come out as the 10–30 range seen above.
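You can inspect that temperature directly (a quick check, assuming the `transformers` checkpoint used above):

```python
from transformers import CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
# CLIP stores the temperature as a log; exp() recovers the multiplier
# applied to the cosine similarities (roughly 100 for the released weights).
print(model.logit_scale.exp().item())
```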

The ease with which it's possible to deploy this is... interesting, but very concerning.

## Additional Tools

The `tools` directory contains additional utilities for experimenting with labels on images and video.