System Info
The current config.json is set for context length up to 32,768 tokens. To handle extensive inputs exceeding 32,768 tokens, we utilize YaRN, a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts.
For supported frameworks, you could add the following to config.json to enable YaRN:
{
  ...,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
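As a quick sanity check that the edited config.json is actually picked up, here is a minimal sketch assuming the Hugging Face transformers library is installed; the checkpoint path is a placeholder, not taken from this issue:

# Minimal sketch, assuming transformers is installed and
# "/path/to/qwen-checkpoint" is a placeholder for a local Qwen model
# directory whose config.json was edited as shown above.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("/path/to/qwen-checkpoint")

# If the edit was parsed, the config exposes the YaRN settings;
# a factor of 4.0 scales the 32,768-token window toward ~131,072 tokens.
print(config.rope_scaling)
# Expected output (roughly):
# {'factor': 4.0, 'original_max_position_embeddings': 32768, 'type': 'yarn'}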
The following message appears. Does it affect inference?
Who can help?
No response
Information
The official example scripts
My own modified scripts
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/qwen
Expected behavior
Inference runs correctly.
Actual behavior
The message appears.
Additional notes