Any Methods to alleviate Curse of Multi-Modalities? #1

Open
VincentVanNF opened this issue Nov 25, 2024 · 1 comment

Comments

@VincentVanNF

Hello, and thank you very much for your research findings, particularly the two multimodal hallucination issues identified in the paper: spurious inter-modality correlations and overreliance on unimodal priors.

While running SFT on a specific classification task with the Qwen2-VL-7B model, I encountered both of these hallucination problems during inference on the test set. They significantly limit further gains, especially when trying to push performance from 90 to 95. Are there any methods or references for alleviating these problems? Thank you very much for your response.

@LengSicong
Collaborator

Hi, thanks for your interest!

You can try different decoding methods, such as VCD (Visual Contrastive Decoding).
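For readers unfamiliar with VCD, here is a minimal sketch of the contrastive step as I understand it: the model is run twice, once on the original image and once on a distorted copy (for example, with heavy noise added), and the two sets of next-token logits are contrasted so that tokens driven mainly by priors rather than by the image are penalized. The function name `contrastive_decode`, the parameter defaults, and the toy logits are all assumptions for illustration; producing the two logit vectors from Qwen2-VL is not shown and depends on your inference setup.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_decode(logits_clean, logits_distorted, alpha=1.0, beta=0.1):
    """Contrast logits from the original image against a distorted image.

    logits_clean:     next-token logits given the original image
    logits_distorted: next-token logits given the distorted image
    alpha:            contrast strength (hypothetical default)
    beta:             plausibility cutoff in [0, 1] (hypothetical default)
    """
    if len(logits_clean) != len(logits_distorted):
        raise ValueError("both logit vectors must cover the same vocabulary")

    # Plausibility constraint: only tokens the clean pass already finds
    # reasonably likely are eligible, so the contrast cannot promote junk.
    p_clean = softmax(logits_clean)
    cutoff = beta * max(p_clean)

    contrasted = []
    for l_clean, l_dist, p in zip(logits_clean, logits_distorted, p_clean):
        if p >= cutoff:
            # Amplify the clean evidence, subtract the prior-driven evidence.
            contrasted.append((1 + alpha) * l_clean - alpha * l_dist)
        else:
            contrasted.append(-math.inf)
    return softmax(contrasted)


# Toy example with three class tokens. Class 0 stays strong even when the
# image is distorted (prior-driven); class 1 depends on the image.
probs = contrastive_decode([2.0, 1.5, -3.0], [2.5, 0.5, -3.0])
predicted = probs.index(max(probs))
```

In this toy case plain greedy decoding on the clean logits would pick class 0, while the contrasted distribution prefers class 1. For a classification task fine-tuned with SFT, the same contrast could be applied only at the position where the label token is generated; whether it helps in the 90 to 95 range would need to be checked on a held-out set.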
