
Qwen2.5-72B-Instruct YaRN BUG #2630

Open
1 of 4 tasks
PaulX1029 opened this issue Dec 26, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@PaulX1029

System Info

The current config.json is set for context length up to 32,768 tokens. To handle extensive inputs exceeding 32,768 tokens, we utilize YaRN, a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts.

For supported frameworks, you could add the following to config.json to enable YaRN:
{
  ...,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
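As a sketch, the snippet below shows one way to add the `rope_scaling` block above to a model's config.json programmatically; the file path and helper name are assumptions, not part of TensorRT-LLM's API.

```python
import json


def enable_yarn(config: dict,
                factor: float = 4.0,
                original_max_position_embeddings: int = 32768) -> dict:
    """Return a copy of a model config with the YaRN rope_scaling block added."""
    patched = dict(config)
    patched["rope_scaling"] = {
        "factor": factor,
        "original_max_position_embeddings": original_max_position_embeddings,
        "type": "yarn",
    }
    return patched


# Hypothetical usage: patch the model's config.json in place.
# with open("config.json") as f:
#     cfg = json.load(f)
# with open("config.json", "w") as f:
#     json.dump(enable_yarn(cfg), f, indent=2)
```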
[Screenshot of a warning message attached.]

The warning above appears during inference. Does it affect the results?

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/qwen

Expected behavior

Correct inference.

actual behavior

A warning message appears.

additional notes


@PaulX1029 PaulX1029 added the bug Something isn't working label Dec 26, 2024