Skip to content

Chat with Phi 3.5/3 Vision LLMs. Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which include - synthetic data and filtered publicly available websites - with a focus on very high-quality, reasoning dense data both on text and vision.

License

Notifications You must be signed in to change notification settings

bhimrazy/chat-with-phi-3-vision

Repository files navigation


Chat with Phi 3.5/3 Vision LLMs

Open In Studio
phi-3.5-vision-demo.mp4

Overview

Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which include - synthetic data and filtered publicly available websites - with a focus on very high-quality, reasoning dense data both on text and vision.

This model enables multi-frame image understanding, image comparison, multi-image summarization/storytelling, and video summarization, which have broad applications in office scenarios.

Getting Started

Follow these steps to set up and run the project:

1. Install Dependencies

i. Download and Install NVIDIA CUDA
Visit the NVIDIA CUDA Toolkit Downloads page and follow the instructions to install CUDA compatible with your system.

ii. Install Required Python Packages
Ensure you have all the necessary dependencies installed by running the following commands:

pip install -r requirements.txt  
pip install flash_attn  

If you encounter any issues while installing flash_attn, refer to the FlashAttention Installation Guide for troubleshooting tips and additional setup details.

2. Start the API Server

Launch the API server powered by LitServe:

python server.py

3. Launch the Streamlit App

Start the Streamlit application with the following command:

streamlit run app.py

About

This project is developed and maintained with ❤️ by Bhimraj Yadav.

About

Chat with Phi 3.5/3 Vision LLMs. Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which include - synthetic data and filtered publicly available websites - with a focus on very high-quality, reasoning dense data both on text and vision.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published