The current tokenizer is different from the previous one (ids 3-13 are missing from the vocab, and many added_tokens were introduced). Is there a particular reason for this?
For example: https://huggingface.co/01-ai/Yi-1.5-34B-Chat/blob/main/tokenizer.json https://huggingface.co/01-ai/Yi-1.5-34B-32K/blob/main/tokenizer.json
Could the missing tokens be added back to the vocab?
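A minimal sketch of how the gap can be observed, assuming the standard `transformers` API; the repo name comes from the links above, everything else is illustrative:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("01-ai/Yi-1.5-34B-Chat")

# Ids that some token actually maps to (added_tokens included)
present = set(tok.get_vocab().values())

# Ids below the largest assigned id that no token maps to
missing = [i for i in range(max(present) + 1) if i not in present]
print("missing ids:", missing)          # e.g. ids 3-13 in the updated tokenizer.json
print("added tokens:", tok.get_added_vocab())
```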
Hi, we found that the fast tokenizer has some issues. For example, the 32K base model could not output spaces with the fast tokenizer, while the slow tokenizer did not have this problem, so we updated tokenizer.json.
Could you share an example? In my tests, both the fast and the slow tokenizer output spaces (token_id) correctly.
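This is only a sketch of the kind of fast-vs-slow comparison being asked about, assuming the standard `transformers` API; the sample text is illustrative, not taken from the original report:

```python
from transformers import AutoTokenizer

repo = "01-ai/Yi-1.5-34B-32K"
fast = AutoTokenizer.from_pretrained(repo, use_fast=True)
slow = AutoTokenizer.from_pretrained(repo, use_fast=False)

text = "hello world"
ids = slow(text, add_special_tokens=False)["input_ids"]

# If the fast tokenizer drops the space on decode, the two strings will differ.
print("fast:", repr(fast.decode(ids)))
print("slow:", repr(slow.decode(ids)))
```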