[Fix] Address remaining issues of supporting MiniCPMV #2977
base: main
Conversation
Force-pushed from d4a7a2d to d32da67
Great work! Once ready, please ask us to review.
Also, please remove the vLLM dependency here: sglang/python/sglang/srt/layers/attention/vision.py, lines 8 to 9 in 2584f6d.
Force-pushed from 3453ce7 to 323aaf7
Force-pushed from a04062f to 6f78efe
max_seqlen,
is_causal=False,
)
if self.use_context_forward:
context_attention_fwd generates different results from the HF implementation (SiglipAttention), probably because:
1. SiglipAttention performs the softmax in float32:
attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query_states.dtype)
2. SiglipAttention attends over the full padded sequence with a mask, whereas context_attention_fwd skips padding tokens, leaving 0s in the attention weights.
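As a minimal, illustrative sketch (not the actual SiglipAttention or kernel code, and with made-up tensor shapes), the softmax dtype alone can shift half-precision attention weights:

```python
import torch
import torch.nn.functional as F

# Illustrative only: hypothetical fp16 attention scores.
torch.manual_seed(0)
scores = torch.randn(1, 8, 64, 64, dtype=torch.float16) * 10

# Softmax computed directly in fp16 (roughly what a fused kernel may do).
weights_fp16 = F.softmax(scores, dim=-1)

# Softmax upcast to fp32 and cast back, matching the HF SiglipAttention line quoted above.
weights_fp32 = F.softmax(scores, dim=-1, dtype=torch.float32).to(scores.dtype)

# Small per-element differences that can accumulate across heads and layers.
print((weights_fp16 - weights_fp32).abs().max())
```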
Does the qwen2vl model have the same issue? If so, I think maybe we can set use_context_forward to False for qwen2vl, or even remove the use_context_forward branch.
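For reference, here is a rough sketch of what the toggle amounts to; the class name, signatures, and defaults are assumptions, not the actual code in vision.py:

```python
import torch
import torch.nn.functional as F

class VisionAttentionSketch(torch.nn.Module):
    """Hypothetical sketch: switch between a fused varlen kernel and full SDPA."""

    def __init__(self, use_context_forward: bool = False):
        super().__init__()
        self.use_context_forward = use_context_forward

    def forward(self, q, k, v, attention_mask=None):
        # q, k, v: (batch, heads, seq_len, head_dim)
        if self.use_context_forward:
            # In the real layer this path would call context_attention_fwd, a Triton
            # kernel that packs variable-length sequences and skips padding tokens.
            raise NotImplementedError("stand-in for context_attention_fwd")
        # Fallback path: full-sequence attention with an explicit mask,
        # which follows the HF SiglipAttention semantics more closely.
        return F.scaled_dot_product_attention(
            q, k, v, attn_mask=attention_mask, is_causal=False
        )
```

Defaulting use_context_forward to False for models where the numerical mismatch matters would keep the SDPA path as the reference behavior.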
Cool. I will ask @yizhang2077 to help~
"tgt_sizes": [inputs["tgt_sizes"]], | ||
"im_start_id": [self.tokenizer.im_start_id], | ||
"im_end_id": [self.tokenizer.im_end_id], | ||
"slice_start_id": [self.tokenizer.slice_start_id], |
Cool! I think this test could be made more general so that it can also be used by qwen2vl or other VLMs, along the lines of the sketch below.
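One possible shape for that, sketched with hypothetical names (the qwen2vl attribute names below are assumptions and may not match the real tokenizer):

```python
# Hypothetical: map each VLM to the tokenizer attributes its processor test needs.
VLM_SPECIAL_TOKEN_ATTRS = {
    "minicpmv": ["im_start_id", "im_end_id", "slice_start_id"],
    "qwen2vl": ["vision_start_token_id", "vision_end_token_id"],
}

def collect_special_token_ids(model_name, tokenizer):
    """Gather model-specific special-token ids so one test body can cover several VLMs."""
    return {
        attr: [getattr(tokenizer, attr)]
        for attr in VLM_SPECIAL_TOKEN_ATTRS[model_name]
        if hasattr(tokenizer, attr)
    }
```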
@@ -457,6 +457,8 @@ def setUpClass(cls):
"--trust-remote-code",
"--chat-template",
"minicpmv",
"--max-total-tokens",
Why do we need to add these args here?
Motivation
Address remaining issues of #2785
Modifications
Checklist