
[Fix] Address remaining issues of supporting MiniCPMV #2977

Open
wants to merge 4 commits into main from minicpmv

Conversation

mickqian
Contributor

Motivation

Address remaining issues of #2785

Modifications

  1. Update the documentation on implementing a new vision LLM
  2. Add tests comparing the logits output of SGLang and HF
  3. Code cleanup

Checklist

@mickqian mentioned this pull request Jan 19, 2025
@mickqian force-pushed the minicpmv branch 9 times, most recently from d4a7a2d to d32da67 on January 19, 2025 17:04
@zhaochenyang20
Collaborator

Great work! Once ready, please ask us to review.

@merrymercy
Copy link
Contributor

Also remove these vLLM dependencies:

from vllm.distributed import parallel_state
from vllm.distributed import utils as dist_utils
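
A minimal, illustrative sketch of what dropping those imports could look like, relying only on torch.distributed; the function names below are placeholders, not SGLang's actual API:

import torch.distributed as dist

def divide(numerator: int, denominator: int) -> int:
    # Same contract as vllm.distributed.utils.divide: exact division or fail loudly.
    assert numerator % denominator == 0, f"{numerator} is not divisible by {denominator}"
    return numerator // denominator

def get_tp_world_size() -> int:
    # Placeholder for parallel_state.get_tensor_model_parallel_world_size();
    # assumes the tensor-parallel group is the default process group.
    return dist.get_world_size() if dist.is_initialized() else 1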

@mickqian force-pushed the minicpmv branch 5 times, most recently from 3453ce7 to 323aaf7 on January 21, 2025 11:35
@mickqian force-pushed the minicpmv branch 3 times, most recently from a04062f to 6f78efe on January 21, 2025 14:09
@mickqian marked this pull request as ready for review January 22, 2025 03:56
max_seqlen,
is_causal=False,
)
if self.use_context_forward:
@mickqian
Contributor (Author)

context_attention_fwd generates different results from the HF implementation (SiglipAttention), probably because:

  1. SiglipAttention performs the softmax in float32:

attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query_states.dtype)

  2. SiglipAttention attends over the full padded sequence with a mask, whereas context_attention_fwd skips padding tokens, leaving 0s in the attention weights.
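
A small self-contained illustration of point 1: running the softmax in float32 and casting back, versus keeping it in float16 end to end, yields slightly different attention weights, which compounds into visible logit drift downstream. Purely illustrative; shapes and values are arbitrary.

import torch

torch.manual_seed(0)

# Attention scores in half precision, as a fused kernel would hold them.
scores = torch.randn(1, 8, 16, 16, dtype=torch.float16)

# HF SiglipAttention path: softmax in float32, then cast back to the query dtype.
probs_fp32 = torch.softmax(scores.float(), dim=-1).to(torch.float16)

# A kernel that keeps the softmax in float16.
probs_fp16 = torch.softmax(scores, dim=-1)

# The two disagree by a few ULPs per element.
print((probs_fp32 - probs_fp16).abs().max())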

Collaborator

Does the qwen2vl model have the same issue? If so, maybe we can set use_context_forward to False for qwen2vl, or even remove the use_context_forward branch entirely.

@zhaochenyang20
Collaborator

Cool. I will ask @yizhang2077 to help~

"tgt_sizes": [inputs["tgt_sizes"]],
"im_start_id": [self.tokenizer.im_start_id],
"im_end_id": [self.tokenizer.im_end_id],
"slice_start_id": [self.tokenizer.slice_start_id],
@yizhang2077
Collaborator
Jan 23, 2025

Cool! I think this test could be made more general, so that it can also be used for qwen2vl or other VLMs?
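
One way to make the check model-agnostic is to factor the comparison into a helper that only looks at the final logits tensors, so MiniCPM-V, qwen2vl, or any other VLM can share it. A rough sketch; the name and tolerances are illustrative, not an existing SGLang test utility:

import torch

def assert_logits_close(hf_logits: torch.Tensor, sgl_logits: torch.Tensor,
                        rtol: float = 1e-2, atol: float = 4e-2) -> None:
    # Compare in float32 so the check is independent of each runtime's dtype.
    hf32 = hf_logits.float()
    sgl32 = sgl_logits.float()
    max_diff = (hf32 - sgl32).abs().max().item()
    if not torch.allclose(hf32, sgl32, rtol=rtol, atol=atol):
        raise AssertionError(f"logits diverge, max abs diff = {max_diff:.4f}")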

@@ -457,6 +457,8 @@ def setUpClass(cls):
"--trust-remote-code",
"--chat-template",
"minicpmv",
"--max-total-tokens",
Collaborator

Why do we need to add these args here?
