# CLIP App

This app performs zero-shot image classification using OpenAI's CLIP model hosted on Hugging Face.
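Under the hood, zero-shot classification with CLIP amounts to comparing the embedding of an image against the embeddings of each text description. A minimal single-image sketch using the `transformers` library (the checkpoint name, labels, and input file here are just examples, not necessarily what this app uses):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Example checkpoint; the app may use a different CLIP variant.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a person drinking coffee", "a red sports car"]
image = Image.open("frame.jpg")  # hypothetical input image

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds the scaled image-text similarities;
# softmax turns them into a probability distribution over the labels.
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(labels, probs[0].tolist())))
```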

## Intended Model Use

Please read the Model Use section and evaluate whether your use case fits before deploying this model.

## Usage

```sh
# install dependencies
pip3 install --upgrade -r requirements.txt

# run app with some sample text
python3 main.py \
    "a person drinking coffee" \
    "a person making a call" \
    "a person jogging" \
    "a construction crew fixing the road" \
    "a red sports car" \
    "a busy intersection"
```

This will open a camera stream and print the similarity and softmax scores of each text description for each frame.

```
12.499 0.000   a red sports car
16.413 0.001   a busy intersection
17.943 0.006   a construction crew fixing the road
20.251 0.065   a person jogging
21.546 0.237   a person making a call
22.612 0.690   a person drinking coffee

12.850 0.000   a red sports car
16.526 0.002   a busy intersection
17.970 0.007   a construction crew fixing the road
20.424 0.076   a person jogging
21.518 0.226   a person making a call
22.633 0.690   a person drinking coffee
...
```
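For reference, the per-frame loop that produces output like the above presumably looks something like this sketch (the OpenCV capture, checkpoint name, and formatting details are assumptions, not the actual implementation in `main.py`):

```python
import sys

import cv2
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

labels = sys.argv[1:]  # text descriptions passed on the command line
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

cap = cv2.VideoCapture(0)  # default camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # OpenCV delivers BGR frames; CLIP's processor expects RGB images.
    image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    with torch.no_grad():
        inputs = processor(text=labels, images=image,
                           return_tensors="pt", padding=True)
        logits = model(**inputs).logits_per_image[0]
    probs = logits.softmax(dim=0)
    # Print one "similarity softmax label" line per description,
    # lowest-scoring first, as in the sample output above.
    for score, prob, label in sorted(zip(logits.tolist(), probs.tolist(), labels)):
        print(f"{score:.3f} {prob:.3f}\t{label}")
cap.release()
```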

Note: the similarity scores are not raw cosine similarities in [-1, 1]. CLIP multiplies the cosines by a learned temperature (about 100 in the released model), so typical cosines of 0.1–0.3 come out as the 10–30 range seen above.
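You can inspect that temperature directly (a quick check, assuming the `transformers` checkpoint used above):

```python
from transformers import CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
# CLIP stores the temperature as a log; exp() recovers the multiplier
# applied to the cosine similarities (roughly 100 for the released weights).
print(model.logit_scale.exp().item())
```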

The ease with which it's possible to deploy this is... interesting, but very concerning.

## Additional Tools

The `tools` directory contains additional utilities for experimenting with labels on images and video.