TG-Vid: Enhancing Temporal Modeling of Video LLMs via Time Gating

Installation

Clone this repository and install packages:

git clone https://github.com/LaVi-Lab/TG-Vid.git
cd TG-Vid
conda create --name tg python=3.10
conda activate tg
pip install -r requirement.txt
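
After installation, a quick sanity check can confirm that the environment and GPUs are visible to PyTorch (a minimal sketch, assuming requirement.txt installs PyTorch):

# optional sanity check: print the PyTorch version and whether CUDA is available
conda activate tg
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"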

Data & Weight Preparation

  • Annotation JSON files for training and testing are provided on Huggingface.

    Please download the corresponding videos from their official websites.

  • Download the pretrained model weights:

    | Pretrained Model Weight | Download Link |
    | ----------------------- | ------------- |
    | lmsys/vicuna-7b-v1.1    | Huggingface   |
    | EVA ViT-g               | Link          |
    | QFormer                 | Link          |
  • Note: you have to modify the paths to the data and pretrained model weights in the scripts, code, and configs. The easiest way is to search for /path/to/ (see the snippet below).
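
    Running the following from the repository root lists every placeholder that still needs a real path:

    # list all remaining /path/to/ placeholders (run from the TG-Vid repo root)
    grep -rn "/path/to/" .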

Training

For quick usage, we have provided the checkpoints of TG-Vid-197K and TG-Vid-220K as follows:

| Model       | Download Link |
| ----------- | ------------- |
| TG-Vid-197K | Huggingface   |
| TG-Vid-220K | Huggingface   |
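
The inference scripts below expect checkpoints at output/${model}/pytorch_model.bin, so one option is to fetch them with the huggingface_hub CLI (a sketch only; <hf-repo-id> is a placeholder for the Huggingface repository behind the download link above):

# hypothetical example: replace <hf-repo-id> with the actual Huggingface repo of TG-Vid-197K
huggingface-cli download <hf-repo-id> --local-dir output/TG-Vid-197K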

If you want to reproduce the training of TG-Vid, use the following scripts:

Note: We use an AWS-like storage backend (accessed via mmengine.fileio) for the training videos (see stllm/datasets/datasets/image_video_itdatasets.py, where has_client = True). If you store the training videos locally, please refer to ST-LLM.
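
To find that switch before launching a run (an assumption on our side: a local-storage setup follows ST-LLM and toggles this same flag):

# locate the storage-backend flag mentioned above
grep -n "has_client" stllm/datasets/datasets/image_video_itdatasets.py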

TG-Vid-197K

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash script/train/train.sh TG-Vid-197K

TG-Vid-220K

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash script/train/train.sh TG-Vid-220K

Testing

Check script/inference/*/test_*.sh for more details:

  # $1 = model/config name (e.g. TG-Vid-197K), $2 = GPU id
  model=$1
  gpu=$2
  # predictions are written under test_output/<benchmark>/<model>/
  output="test_output/mvbench/${model}/"
  mkdir -p $output
  ...
  --cfg-path config/$model.yaml \
  --ckpt-path output/${model}/pytorch_model.bin \
  ...

Note: you have to modify the paths to the annotation files and videos in the scripts and code. The easiest way is to search for /path/to/.

Take model=TG-Vid-197K as an example:

MVBench

bash script/inference/mvbench/test_mvbench.sh TG-Vid-197K 0

TempCompass

bash script/inference/tempcompass/test_tempcompass.sh TG-Vid-197K 0

NextQA Val

bash script/inference/nextqa/test_nextqa.sh TG-Vid-197K 0

NextQA ATP-Hard

bash script/inference/nextqa_atp_hard/test_nextqa_atp_hard.sh TG-Vid-197K 0
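
To run all four benchmarks in one go, a small wrapper like the following should work, assuming every script keeps the script/inference/<benchmark>/test_<benchmark>.sh naming used above:

# hypothetical convenience loop over the four benchmarks listed above
model=TG-Vid-197K
gpu=0
for bench in mvbench tempcompass nextqa nextqa_atp_hard; do
  bash script/inference/${bench}/test_${bench}.sh ${model} ${gpu}
done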

Citation

If you find this repo useful, please consider citing our paper:

@article{hu2024tgvid,
  title={Enhancing Temporal Modeling of Video LLMs via Time Gating},
  author={Zi-Yuan Hu and Yiwu Zhong and Shijia Huang and Michael R. Lyu and Liwei Wang},
  journal={arXiv preprint arXiv:2410.05714},
  year={2024}
}

Acknowledgement