- Video Action Recognition (Action Classification)
- Video Temporal Action Localization (Temporal Action Detection)
- Video Object Tracking
- Video Point Tracking
- Video Depth Estimation
- Video Camera Pose Estimation
- Video Instance Segmentation
- Video Retrieval: Text-to-Video (T2V)
- Video Retrieval: Video-to-Text (V2T)
- Video Temporal Grounding
- Video Question-Answering
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
-
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
-
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
-
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
-
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
-
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
-
VideoPrism: A Foundational Visual Encoder for Video Understanding
-
OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning
We provide the state-of-the-art foundation model weights in MODEL_ZOO.md.
- 🌸 Linear Probing Evaluation
  - Few-shot-data Linear Probing Evaluation
- 🌻 Full Finetuning Evaluation
  - Few-shot-data Full Finetuning Evaluation
- 🍁 Zero-shot Evaluation
  - Few-shot-data Zero-shot Evaluation
- 🌵 Attentive Probing Evaluation
  - Few-shot-data Attentive Probing Evaluation
- This image is sourced from the paper Unifying Video Self-Supervised Learning across Families of Tasks: A Survey.
- The Attentive Probing Evaluation is sourced from the paper Context Autoencoder for Self-Supervised Representation Learning.

We employ the Linear Probing and Attentive Probing methods for rapid evaluation of Action Recognition.
Abbreviations used: pt (pretrain), ppt (post-pretrain), ft (finetune).
| Hyperparameter | Info |
|---|---|
| optimizer | LBFGS |
| regularization | L2 |
| max_iter | 1000 |
| scan learning rate | 1e-6, 1e-4, 1e-2, 1, 1e2, 1e4, 1e6 |
| spaced steps | binary search |
| num_step | 8 |
| batch size | 32 |
| train class few-shot nums | 10 |
| val class few-shot nums | half |
| input resolution | 224×224 |
| mean | 0.485, 0.456, 0.406 |
| std | 0.229, 0.224, 0.225 |
- This information is referenced from the paper Learning Transferable Visual Models From Natural Language Supervision.
- PyTorch LBFGS warning: "Right now all parameters have to be on a single device. This will be improved in the future."
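As a concrete reference, here is a minimal linear-probing sketch in PyTorch, assuming features from a frozen video encoder have already been extracted; the function name, the stand-in random tensors, and the fixed L2 weight are illustrative assumptions, not the repository's actual code. Per the table above, the coarse grid would then be refined around the best value with a binary-search-style sweep over 8 steps (omitted here).

```python
import torch
import torch.nn as nn

def train_linear_probe(features, labels, num_classes, lr, l2_lambda=1e-4, max_iter=1000):
    """Fit one linear layer on frozen features with LBFGS + L2 regularization."""
    probe = nn.Linear(features.shape[1], num_classes)
    optimizer = torch.optim.LBFGS(probe.parameters(), lr=lr, max_iter=max_iter)
    criterion = nn.CrossEntropyLoss()

    def closure():
        optimizer.zero_grad()
        loss = criterion(probe(features), labels)
        # L2 regularization over the probe parameters (the table's "L2" row).
        loss = loss + l2_lambda * sum((p ** 2).sum() for p in probe.parameters())
        loss.backward()
        return loss

    optimizer.step(closure)  # LBFGS runs up to max_iter inner iterations
    return probe

# Stand-in data: 10 few-shot clips per class, 10 classes, 768-d features.
feats = torch.randn(100, 768)
labels = torch.arange(10).repeat_interleave(10)
for lr in [1e-6, 1e-4, 1e-2, 1.0, 1e2, 1e4, 1e6]:  # coarse grid from the table
    probe = train_linear_probe(feats, labels, num_classes=10, lr=lr)
    acc = (probe(feats).argmax(dim=1) == labels).float().mean().item()
    print(f"lr={lr:g}  train acc={acc:.3f}")
```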
| Hyperparameter | Info |
|---|---|
| training epochs | 20 |
| scan learning rate | 1e-3, 3e-4, 1e-4 |
| min learning rate | 1e-7 |
| warmup epochs | 5 |
| start learning rate | 0.0 |
| batch size | 192 |
| short side size | 256 |
| input resolution | 224×224 |
| optimizer | AdamW |
| AdamW weight decay | 1e-4 |
| AdamW eps | 1e-8 |
| AdamW betas | 0.9, 0.999 |
| clip grad | 5.0 |
| label smoothing | 0.1 |
| attentive heads | 16 |
| attentive out_dim | 768 |
| mean | 0.485, 0.456, 0.406 |
| std | 0.229, 0.224, 0.225 |
| brightness | probability 0.8: delta in [-0.125, 0.125] |
| saturation | probability 0.8: factor in [0.6, 1.4] |
| contrast | probability 0.8: factor in [0.6, 1.4] |
| hue | probability 0.8: delta in [-0.2, 0.2] |
| color space conversion | probability 0.1: RGB to BGR |
- This information is referenced from the paper Scaling 4D Representations.
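To make the attentive-probing rows above concrete, below is a minimal sketch of such a head in the spirit of the Context Autoencoder design: a single learned query cross-attends over frozen encoder tokens, and a linear classifier sits on top. The class name, the input projection, and the stand-in token shape are illustrative assumptions; only the 16 heads and 768-d output follow the table.

```python
import torch
import torch.nn as nn

class AttentiveProbe(nn.Module):
    """Attentive-probing head: a learned query pools frozen tokens via
    cross-attention, then a linear classifier produces class logits."""

    def __init__(self, in_dim, num_classes, num_heads=16, out_dim=768):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)   # map encoder dim -> out_dim
        self.query = nn.Parameter(torch.zeros(1, 1, out_dim))
        self.attn = nn.MultiheadAttention(out_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(out_dim)
        self.head = nn.Linear(out_dim, num_classes)

    def forward(self, tokens):                   # tokens: (B, N, in_dim)
        kv = self.proj(tokens)
        q = self.query.expand(tokens.size(0), -1, -1)
        pooled, _ = self.attn(q, kv, kv)         # cross-attention pooling: (B, 1, out_dim)
        return self.head(self.norm(pooled.squeeze(1)))

# Usage with stand-in video tokens (e.g. 8x14x14 = 1568 tokens of width 1024):
probe = AttentiveProbe(in_dim=1024, num_classes=400)
print(probe(torch.randn(2, 1568, 1024)).shape)   # torch.Size([2, 400])
```

Unlike linear probing, the pooling itself is learned here, which is why this protocol trains for 20 epochs with AdamW rather than fitting a closed-form-style LBFGS probe.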
```bash
apt-get install -y ffmpeg libavcodec-dev libavfilter-dev libavformat-dev libavutil-dev
conda create --name videomae python=3.10 -y
conda activate videomae
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install decord==0.6.0
pip install nvidia-dali-cuda120==1.44.0
pip install timm==0.4.12
pip install tensorboardX==2.6.2.2
pip install scipy==1.11.4
pip install matplotlib
pip install scikit-image==0.24.0
pip install flash-attn==2.7.2
pip install psutil==6.0.0
pip install opencv-python
```
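Once the environment is set up, a quick sanity check (hypothetical, not part of the repository) can confirm that the key packages import and that the GPU is visible:

```python
# Hypothetical post-install check; package names match the pip installs above.
import torch
import decord
import timm

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("decord:", decord.__version__, "| timm:", timm.__version__)
```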
If you find this repository useful, please use the following BibTeX entry for citation.
```bibtex
@misc{deepglint_videobenchmark2025,
  title={Video Benchmark Suite: Rapid Evaluation of Video Foundation Models},
  url={https://github.com/deepglint/Video_Benchmark_Suite},
  author={Yang, Ninghua and Feng, Ziyong},
  year={2025}
}
```
This repository is built upon the V-SWIFT, VideoMAE, VideoMAEv2, jepa, unmasked_teacher, CLIP_benchmark, vitookit, BlackVIP, open_clip, and InternVideo repositories.