[Feature] Beam Search #3066

Open
wants to merge 5 commits into
base: main


@laixinn laixinn commented Jan 23, 2025

Motivation & Modifications

Add beam search support. See the RFC in #3032.

TODO:

  • enable beam search via server args (currently enabled through environment variables)
  • support overlap mode
  • support jump forward
  • use a Triton kernel to speed up overwriting KV cache indices

Result & Reproduce

  1. Accuracy: test/srt/test_beam_search.py: line 259, test_beam_search_accuracy_by_offline_mmlu
  2. MMLU Score: test/srt/test_beam_search.py: line 336, test_beam_search_memory_leak_via_mmlu
  3. Efficiency (not run in unit tests by default): line 301, bench_beam_search_overhead_by_offline_mmlu

Checklist

laixinn and others added 5 commits January 21, 2025 00:14
Duplicate each beam search sequence into a request during scheduling, because prefix caching depends on the request tokens.

NOTE: A request finishes once enough of its beam search sequences have finished, which might affect the output of normal requests.
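
For readers unfamiliar with the stopping rule in the note above, here is a minimal, standalone sketch of beam search that stops as soon as `beam_width` sequences have finished. It is a toy illustration and not the code in this PR: `Beam`, `beam_search`, `toy_model`, and the `EOS` token id are all hypothetical names, and the scheduler integration, request duplication, and KV cache handling are left out entirely.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

EOS = 0  # hypothetical end-of-sequence token id


@dataclass
class Beam:
    tokens: Tuple[int, ...]  # prompt tokens followed by generated tokens
    score: float = 0.0       # cumulative log-probability of generated tokens


def beam_search(
    prompt: Sequence[int],
    next_logprobs: Callable[[Tuple[int, ...]], Dict[int, float]],
    beam_width: int,
    max_new_tokens: int,
) -> List[Beam]:
    live = [Beam(tuple(prompt))]
    finished: List[Beam] = []
    for _ in range(max_new_tokens):
        # Expand every live beam by every candidate next token.
        candidates = [
            Beam(beam.tokens + (tok,), beam.score + lp)
            for beam in live
            for tok, lp in next_logprobs(beam.tokens).items()
        ]
        candidates.sort(key=lambda b: b.score, reverse=True)
        # Keep the best beam_width candidates; those ending in EOS are finished.
        live = []
        for cand in candidates[:beam_width]:
            (finished if cand.tokens[-1] == EOS else live).append(cand)
        # Stop once enough sequences have finished (or nothing is left to expand).
        if len(finished) >= beam_width or not live:
            break
    # If the token budget ran out first, fall back to unfinished beams.
    if len(finished) < beam_width:
        finished.extend(live)
    return sorted(finished, key=lambda b: b.score, reverse=True)[:beam_width]


def toy_model(tokens: Tuple[int, ...]) -> Dict[int, float]:
    # Hypothetical stand-in for a language model: a fixed distribution that
    # strongly prefers EOS once the sequence holds three or more tokens.
    if len(tokens) >= 3:
        return {EOS: math.log(0.9), 1: math.log(0.1)}
    return {1: math.log(0.6), 2: math.log(0.3), EOS: math.log(0.1)}


results = beam_search([5], toy_model, beam_width=2, max_new_tokens=4)
for beam in results:
    print(beam.tokens, round(beam.score, 4))
```

Because the search stops as soon as `beam_width` sequences have finished, a beam that is still live at that point is dropped even though it has not reached `EOS`. Early termination of this kind is one way a finish criterion can change which outputs are returned.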
2 participants