TG-Vid: Enhancing Temporal Modeling of Video LLMs via Time Gating

Installation

Clone this repository and install packages:

git clone https://github.com/LaVi-Lab/TG-Vid.git
cd TG-Vid
conda create --name tg python=3.10
conda activate tg
pip install -r requirement.txt
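
After installation, a quick sanity check can confirm that the environment and GPUs are visible to PyTorch (a minimal sketch, assuming requirement.txt installs PyTorch):

# optional sanity check: print the PyTorch version and whether CUDA is available
conda activate tg
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"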

Data & Weight Preparation

  • Annotation JSON files for training and testing are provided on Huggingface.

    Please download the corresponding videos from their official websites.

  • Download the pretrained model weights:

    | Pretrained Model Weight | Download Link |
    | ----------------------- | ------------- |
    | lmsys/vicuna-7b-v1.1    | Huggingface   |
    | EVA ViT-g               | Link          |
    | QFormer                 | Link          |
  • Note: you have to modify the paths to the data and pretrained model weights in the scripts, code, and configs. The easiest way is to search for /path/to/ (see the snippet below).
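
    Running the following from the repository root lists every placeholder that still needs a real path:

    # list all remaining /path/to/ placeholders (run from the TG-Vid repo root)
    grep -rn "/path/to/" .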

Training

For quick usage, we have provided the checkpoints of TG-Vid-197K and TG-Vid-220K as follows:

| Model       | Download Link |
| ----------- | ------------- |
| TG-Vid-197K | Huggingface   |
| TG-Vid-220K | Huggingface   |
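
The inference scripts below expect checkpoints at output/${model}/pytorch_model.bin, so one option is to fetch them with the huggingface_hub CLI (a sketch only; <hf-repo-id> is a placeholder for the Huggingface repository behind the download link above):

# hypothetical example: replace <hf-repo-id> with the actual Huggingface repo of TG-Vid-197K
huggingface-cli download <hf-repo-id> --local-dir output/TG-Vid-197K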

If you want to reproduce the training of TG-Vid, use the following scripts:

Note: We use an AWS-like storage backend (accessed via mmengine.fileio) for the training videos (see stllm/datasets/datasets/image_video_itdatasets.py, where has_client = True). If you store the training videos locally, please refer to ST-LLM.
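
To find that switch before launching a run (an assumption on our side: a local-storage setup follows ST-LLM and toggles this same flag):

# locate the storage-backend flag mentioned above
grep -n "has_client" stllm/datasets/datasets/image_video_itdatasets.py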

TG-Vid-197K

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash script/train/train.sh TG-Vid-197K

TG-Vid-220K

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash script/train/train.sh TG-Vid-220K

Testing

Check script/inference/*/test_*.sh for more details:

  # $1 = model/config name (e.g. TG-Vid-197K), $2 = GPU id
  model=$1
  gpu=$2
  # predictions are written under test_output/<benchmark>/<model>/
  output="test_output/mvbench/${model}/"
  mkdir -p $output
  ...
  --cfg-path config/$model.yaml \
  --ckpt-path output/${model}/pytorch_model.bin \
  ...

Note: you have to modify the paths to the annotation files and videos in the scripts and code. The easiest way is to search for /path/to/.

Take model=TG-Vid-197K as an example:

MVBench

bash script/inference/mvbench/test_mvbench.sh TG-Vid-197K 0

TempCompass

bash script/inference/tempcompass/test_tempcompass.sh TG-Vid-197K 0

NextQA Val

bash script/inference/nextqa/test_nextqa.sh TG-Vid-197K 0

NextQA ATP-Hard

bash script/inference/nextqa_atp_hard/test_nextqa_atp_hard.sh TG-Vid-197K 0
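
To run all four benchmarks in one go, a small wrapper like the following should work, assuming every script keeps the script/inference/<benchmark>/test_<benchmark>.sh naming used above:

# hypothetical convenience loop over the four benchmarks listed above
model=TG-Vid-197K
gpu=0
for bench in mvbench tempcompass nextqa nextqa_atp_hard; do
  bash script/inference/${bench}/test_${bench}.sh ${model} ${gpu}
done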

Citation

If you find this repo useful, please consider citing our paper:

@article{hu2024tgvid,
  title={Enhancing Temporal Modeling of Video LLMs via Time Gating},
  author={Zi-Yuan Hu and Yiwu Zhong and Shijia Huang and Michael R. Lyu and Liwei Wang},
  journal={arXiv preprint arXiv:2410.05714},
  year={2024}
}

Acknowledgement