Index and search images based on descriptions generated by a local multimodal LLM.
This application makes a directory of images searchable with text queries. It uses a local multimodal LLM (e.g., llama3.2-vision), reached via the ollama API, to generate a description of each image, and writes those descriptions to a vector database (chromadb). chromadb embeds the descriptions as text, so the images can then be queried with natural-language prompts.
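A minimal sketch of this pipeline for a single image, assuming the `ollama` and `chromadb` Python packages; the prompt, index path, collection name, and image path are illustrative:

```python
import ollama
import chromadb

# Ask the local multimodal model to describe one image
# (prompt wording and image path are illustrative).
response = ollama.chat(
    model="llama3.2-vision",
    messages=[{
        "role": "user",
        "content": "Describe this image in one detailed paragraph.",
        "images": ["/path/to/images/example.jpg"],
    }],
)
description = response["message"]["content"]

# Store the description in chromadb; the collection's default embedding
# function turns the document into a text embedding automatically.
client = chromadb.PersistentClient(path="./index")
collection = client.get_or_create_collection(name="images")
collection.add(
    ids=["/path/to/images/example.jpg"],
    documents=[description],
    metadatas=[{"path": "/path/to/images/example.jpg"}],
)
```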
Requirements:

- ollama
- a local multimodal model supported by ollama (e.g., llama3.2-vision)
- a running ollama service
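One way to satisfy the last two items, assuming ollama is already installed:

```sh
ollama pull llama3.2-vision
ollama serve   # not needed if the ollama service is already running
```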
To index a directory of images, run main.py with the --directory flag:
python main.py --directory /path/to/images
To search the index, pass a text prompt with --query:
python main.py --query "buoy"
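Under the hood, a query like this amounts to a chromadb text search over the stored descriptions. A minimal sketch, assuming the same index path and collection name as the indexing example above:

```python
import chromadb

# Open the persisted index and run a semantic text query against the
# stored image descriptions (path and collection name are illustrative).
client = chromadb.PersistentClient(path="./index")
collection = client.get_or_create_collection(name="images")

results = collection.query(query_texts=["buoy"], n_results=5)

# Each returned id is an image path recorded at indexing time.
for image_id, distance in zip(results["ids"][0], results["distances"][0]):
    print(f"{image_id}  (distance: {distance:.3f})")
```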