🚀 [NeurIPS24] Make Vision Matter in Visual-Question-Answering (VQA)! Introducing NaturalBench, a vision-centric VQA benchmark (NeurIPS'24) that challenges vision-language models with simple questions about natural imagery.

Baiqi-Li/NaturalBench

(NeurIPS24) NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples

Links:

🚩 News

Usage

  • VQA Task

    There are two ways to use and evaluate the NaturalBench benchmark:

    1. Evaluation based on the example code:

    See the simple example in naturalbench_vqa.py, which shows how to use and evaluate NaturalBench.

    2. Evaluation with lmms-eval or VLMEvalKit:

    Refer to the official documentation of lmms-eval and VLMEvalKit for more details.

    • lmms-eval:

      python3 -m accelerate.commands.launch \
          --num_processes=1 \
          -m lmms_eval \
          --model llava_onevision \
          --model_args pretrained="lmms-lab/llava-onevision-qwen2-7b-ov" \
          --tasks naturalbench \
          --batch_size 1 \
          --log_samples \
          --log_samples_suffix llava_onevision_naturalbench \
          --output_path ./logs/
    • VLMEvalKit:

      python run.py --data NaturalBenchDataset --model llava-onevision-qwen2-7b-ov-hf --verbose
  • Retrieval Task

    To run the retrieval task, install the t2v_metric package, then run the evaluation code:

    python naturalbench_retrieval.py
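
When evaluating several checkpoints on the VQA task, it can help to build the lmms-eval command shown above programmatically instead of editing it by hand. The following is a minimal sketch, not part of the repository: the helper name build_lmms_eval_command and its parameters are hypothetical, and the flags are copied from the lmms-eval command in this README. It only constructs and prints the command; it does not run the evaluation.

```python
import shlex


def build_lmms_eval_command(pretrained, model="llava_onevision",
                            task="naturalbench", output_path="./logs/",
                            num_processes=1, batch_size=1):
    """Return the README's lmms-eval launch command as an argument list.

    Hypothetical helper: the flags mirror the command in this README;
    the function name and parameters are illustrative assumptions.
    """
    return [
        "python3", "-m", "accelerate.commands.launch",
        f"--num_processes={num_processes}",
        "-m", "lmms_eval",
        "--model", model,
        "--model_args", f"pretrained={pretrained}",
        "--tasks", task,
        "--batch_size", str(batch_size),
        "--log_samples",
        "--log_samples_suffix", f"{model}_{task}",
        "--output_path", output_path,
    ]


if __name__ == "__main__":
    cmd = build_lmms_eval_command("lmms-lab/llava-onevision-qwen2-7b-ov")
    # Print a shell-safe version of the command for inspection.
    # To execute it, pass the list to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Returning a list rather than a single string avoids shell-quoting issues when the command is later passed to subprocess.run.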
    

Citation Information

@inproceedings{naturalbench,
  title={NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples},
  author={Li, Baiqi and Lin, Zhiqiu and Peng, Wenxuan and Nyandwi, Jean de Dieu and Jiang, Daniel and Ma, Zixian and Khanuja, Simran and Krishna, Ranjay and Neubig, Graham and Ramanan, Deva},
  booktitle={The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2024},
  url={https://openreview.net/forum?id=Dx88A9Zgnv}
}
