
[Feature] Support minicpmv v2.6 #2785

Merged (4 commits merged into main, Jan 18, 2025)

Conversation

mickqian (Contributor) commented Jan 8, 2025:

Motivation

Addressing #2461

Modifications

  1. Add new model MiniCPMV and corresponding processor MiniCPMVImageProcessor
  2. Register new chat and conversation templates

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

mickqian (Contributor, Author) commented Jan 9, 2025:

Questions

  1. From my understanding of the current implementation of sglang, it does not support two kinds of attention at the same time, so the attention module in Resampler can't be replaced with RadixAttention. Is that correct?

@mickqian mickqian changed the title [WIP] [Feature] Support minicpmv v2.6 [Feature] Support minicpmv v2.6 Jan 9, 2025
@mickqian mickqian marked this pull request as ready for review January 9, 2025 04:42
zhaochenyang20 (Collaborator) commented:

Questions

  1. From my understanding of the current implementation of sglang, it does not support two kinds of attention at the same time, so the attention modules in Resampler and Idefics2VisionAttention can't be replaced with RadixAttention. Is that correct?
  2. The current implementation of Idefics2VisionAttention in this PR relies on vllm.attention.MultiheadAttention, which was introduced in vllm==0.6.5. If that is not acceptable, I will rewrite the attention module.

@Ying1123 @merrymercy
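For context on question 1: the Resampler's attention runs once over a fixed set of image features rather than autoregressively over generated tokens, so it does not benefit from RadixAttention's KV-cache reuse; a plain multi-head attention is enough. Below is a minimal NumPy sketch of such a non-cached multi-head attention, purely illustrative (the PR itself uses vllm.attention.MultiheadAttention; the function name here is made up):

```python
import numpy as np

def naive_mha(q, k, v, num_heads):
    """Plain multi-head scaled-dot-product attention with no KV cache.

    q: (q_len, hidden_dim); k, v: (kv_len, hidden_dim);
    hidden_dim must be divisible by num_heads.
    """
    def split(x):  # -> (num_heads, seq_len, head_dim)
        s, d = x.shape
        return x.reshape(s, num_heads, d // num_heads).transpose(1, 0, 2)

    qh, kh, vh = split(q), split(k), split(v)
    head_dim = qh.shape[-1]
    scores = qh @ kh.transpose(0, 2, 1) / np.sqrt(head_dim)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    out = weights @ vh                              # (num_heads, q_len, head_dim)
    return out.transpose(1, 0, 2).reshape(q.shape)

x = np.random.default_rng(0).standard_normal((6, 8))
print(naive_mha(x, x, x, num_heads=2).shape)  # (6, 8)
```

Because nothing autoregressive happens here, there is no cache to share with RadixAttention, which is consistent with keeping the vision attention as a separate module.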

@mickqian force-pushed the minicpmv branch 14 times, most recently from 42f09c0 to c2abb42 (January 10, 2025 03:01)
zhaochenyang20 (Collaborator) commented:

@mickqian Looks nice! Is there anything left to do?

mickqian (Contributor, Author) commented Jan 10, 2025:

@mickqian Looks nice! Is there anything left to do?

  1. The number of frames processed from the video is hard-coded for the moment; it should be calculated. I added a TODO on it.
  2. A CI test failed because of this:

     [2025-01-10 03:24:51 TP0] Multimodal prompt is too long after expanding multimodal tokens. After expanding len(req.origin_input_ids_unpadded)=29 => 308 >= 294.

     from an assertion checking the input_ids length. The token limit (from my understanding) is derived from the GPU:

     if len(req.origin_input_ids) >= self.max_req_input_len:
         logger.error(
             "Multimodal prompt is too long after expanding multimodal tokens. "
             f"After expanding {len(req.origin_input_ids_unpadded)=} => {len(req.origin_input_ids)} >= {self.max_req_input_len}. "
         )
  3. Some mp4 files failed the decode process in decode_video_base64.
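For context on the CI failure in item 2: before the length guard runs, each multimodal placeholder in the prompt is expanded into many per-image tokens, which is how a 29-token prompt can grow to 308. A simplified, hypothetical sketch of that expansion (not sglang's actual code; the token ids and the 280-tokens-per-image figure are invented here just to reproduce the numbers in the log):

```python
def expand_mm_tokens(input_ids, image_token_id, pad_token_id, tokens_per_image):
    """Replace each image placeholder token with tokens_per_image pad tokens."""
    expanded = []
    for tok in input_ids:
        if tok == image_token_id:
            expanded.extend([pad_token_id] * tokens_per_image)
        else:
            expanded.append(tok)
    return expanded

IMAGE_TOKEN, PAD_TOKEN = -1, 0
prompt = list(range(1, 29)) + [IMAGE_TOKEN]   # 29 unpadded input ids, one image slot
expanded = expand_mm_tokens(prompt, IMAGE_TOKEN, PAD_TOKEN, tokens_per_image=280)
print(len(expanded))  # 308 -- would trip a max_req_input_len of 294
```

Under this reading, the fix is either a larger max_req_input_len (more GPU memory / a larger context) or fewer expanded tokens per image, not a change to the guard itself.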

@mickqian force-pushed the minicpmv branch 3 times, most recently from aef5497 to fcf2ddd (January 11, 2025 13:50)
zhaochenyang20 (Collaborator) commented:

@hnyls2002 Could you help review? Thanks!

@mickqian mickqian requested a review from HaiShaw as a code owner January 17, 2025 01:21
@mickqian force-pushed the minicpmv branch 10 times, most recently from 424c100 to 001d017 (January 17, 2025 10:08)
zhaochenyang20 (Collaborator) commented:

@mickqian Thanks. We will update the API key as soon as possible. Before that, could you fix the conflicts in qwen2_vl.py?

@mickqian force-pushed the minicpmv branch 3 times, most recently from 36d213e to a382656 (January 17, 2025 16:39)
yizhang2077 (Collaborator) commented:

LGTM cc @merrymercy

@yizhang2077 yizhang2077 self-requested a review January 18, 2025 06:04
@zhaochenyang20 zhaochenyang20 merged commit 3d93f84 into sgl-project:main Jan 18, 2025
16 checks passed
merrymercy (Contributor) left a comment:

  1. Can you add some unit tests like this one to compare the logits against the HF implementation? https://github.com/sgl-project/sglang/pull/2365/files#diff-6e52783df34170e0b3d9aadbd7c338d9c16f0303c18846afc1d298c32d4a4eb2R1
  2. Can you help update the docs for VLMs here: "How to Support a New Model"?

python/sglang/srt/managers/schedule_batch.py (comment resolved)
Comment on lines +42 to +46
from vllm.distributed import divide, get_tensor_model_parallel_world_size
from vllm.model_executor.layers.resampler import get_2d_sincos_pos_embed
from vllm.model_executor.layers.sampler import SamplerOutput, get_sampler
from vllm.model_executor.models.module_mapping import MultiModelKeys
from vllm.model_executor.sampling_metadata import SamplingMetadata

Do not import anything from vllm.

  • SamplingMetadata, SamplerOutput are not used.
  • Use sglang.srt.distributed
  • Copy over small utility functions.
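As one example of copying over a small utility, get_2d_sincos_pos_embed (imported above from vllm.model_executor.layers.resampler) follows the standard 2D sine-cosine positional-embedding recipe and can be inlined in a few lines of NumPy. This is a sketch under the assumption of a square grid and no extra cls token; the exact vllm signature may differ:

```python
import numpy as np

def get_1d_sincos_pos_embed_from_grid(embed_dim, pos):
    """1D sin-cos embedding: embed_dim must be even; pos is any array of positions."""
    omega = np.arange(embed_dim // 2, dtype=np.float64)
    omega /= embed_dim / 2.0
    omega = 1.0 / 10000**omega                       # (embed_dim/2,)
    pos = pos.reshape(-1)                            # (M,)
    out = np.einsum("m,d->md", pos, omega)           # (M, embed_dim/2)
    return np.concatenate([np.sin(out), np.cos(out)], axis=1)  # (M, embed_dim)

def get_2d_sincos_pos_embed(embed_dim, grid_size):
    """2D sin-cos embedding for a square grid_size x grid_size patch grid."""
    grid_h = np.arange(grid_size, dtype=np.float32)
    grid_w = np.arange(grid_size, dtype=np.float32)
    grid = np.meshgrid(grid_w, grid_h)               # w varies fastest
    grid = np.stack(grid, axis=0).reshape(2, 1, grid_size, grid_size)
    emb_h = get_1d_sincos_pos_embed_from_grid(embed_dim // 2, grid[0])
    emb_w = get_1d_sincos_pos_embed_from_grid(embed_dim // 2, grid[1])
    return np.concatenate([emb_h, emb_w], axis=1)    # (grid_size**2, embed_dim)

print(get_2d_sincos_pos_embed(64, 4).shape)  # (16, 64)
```

Inlining utilities like this removes the vllm import without pulling in the whole dependency.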

python/sglang/srt/utils.py (comment resolved)
zhaochenyang20 (Collaborator) commented:

@mickqian Also, remove the Chinese comments in the PR. Thanks so much. Could we try not to import any vllm dependency, and rather rewrite it ourselves?

zhaochenyang20 (Collaborator) commented:

Under the model files, we prefer not to import anything from vllm. We will remove all such imports later.

mickqian (Contributor, Author) commented:

Trying to address these in #2977

@zhaochenyang20 mentioned this pull request on Jan 24, 2025.