| Home Page | HuggingFace | Leaderboard | Paper | Code |
- Sep. 26, 2024: NaturalBench was accepted to NeurIPS!
- We have integrated NaturalBench into lmms-eval and VLMEvalKit.
- We have integrated the NaturalBench-Retrieval Dataset into t2v_metric.
- NaturalBench-Retrieval Dataset: download it from the Hugging Face homepage.
There are two ways to use and evaluate the NaturalBench benchmark:
- Review the simple example in naturalbench_vqa.py to learn how to use NaturalBench and evaluate a model on it.
- Run the benchmark through lmms-eval or VLMEvalKit with the commands below. Please refer to the official documentation of lmms-eval and VLMEvalKit for more details.
lmms-eval:

    python3 -m accelerate.commands.launch \
        --num_processes=1 \
        -m lmms_eval \
        --model llava_onevision \
        --model_args pretrained="lmms-lab/llava-onevision-qwen2-7b-ov" \
        --tasks naturalbench \
        --batch_size 1 \
        --log_samples \
        --log_samples_suffix llava_onevision_naturalbench \
        --output_path ./logs/

VLMEvalKit:

    python run.py --data NaturalBenchDataset --model llava-onevision-qwen2-7b-ov-hf --verbose
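
If you want to launch these evaluations from a script (for example, to swap in a different model), the two commands above can be assembled as argument lists. The following is only a minimal sketch: the helper names `build_lmms_eval_command` and `build_vlmevalkit_command` are hypothetical and are not part of lmms-eval, VLMEvalKit, or this repository. Every flag and default value is copied from the commands above, and the log suffix is assumed to follow the `<model>_naturalbench` pattern used there.

```python
import shlex


def build_lmms_eval_command(
    model="llava_onevision",
    pretrained="lmms-lab/llava-onevision-qwen2-7b-ov",
    output_path="./logs/",
):
    """Return the lmms-eval command shown above as an argument list.

    Hypothetical helper: it only rearranges flags that already appear
    in this README and does not call lmms-eval itself.
    """
    return [
        "python3", "-m", "accelerate.commands.launch",
        "--num_processes=1",
        "-m", "lmms_eval",
        "--model", model,
        "--model_args", f"pretrained={pretrained}",
        "--tasks", "naturalbench",
        "--batch_size", "1",
        "--log_samples",
        # Assumption: the suffix follows the "<model>_naturalbench" pattern above.
        "--log_samples_suffix", f"{model}_naturalbench",
        "--output_path", output_path,
    ]


def build_vlmevalkit_command(model="llava-onevision-qwen2-7b-ov-hf"):
    """Return the VLMEvalKit command shown above as an argument list."""
    return [
        "python", "run.py",
        "--data", "NaturalBenchDataset",
        "--model", model,
        "--verbose",
    ]


if __name__ == "__main__":
    # Print shell-ready command lines; nothing is executed here.
    print(shlex.join(build_lmms_eval_command()))
    print(shlex.join(build_vlmevalkit_command()))
```

To actually start a run, pass one of the lists to `subprocess.run(cmd, check=True)` from an environment where the corresponding toolkit is installed. The VLMEvalKit command assumes the working directory is the VLMEvalKit checkout that contains `run.py`.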
To run the retrieval task, install the t2v_metric package, then run the evaluation script:

    python naturalbench_retrieval.py

@inproceedings{naturalbench,
title={NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples},
author={Li, Baiqi and Lin, Zhiqiu and Peng, Wenxuan and Nyandwi, Jean de Dieu and Jiang, Daniel and Ma, Zixian and Khanuja, Simran and Krishna, Ranjay and Neubig, Graham and Ramanan, Deva},
booktitle={The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2024},
url={https://openreview.net/forum?id=Dx88A9Zgnv}
}