Question about tokenizer.tokenize #86

Open
lsx0930 opened this issue Jun 26, 2022 · 1 comment

Comments


lsx0930 commented Jun 26, 2022

I've looked at the TF tokenizer code: given a sentence or a single char as input, it returns a single sentence or a single char. The torch tokenizer also takes a sentence or a single char as input, but it returns a list of tokens for the sentence, or a list for the single char.

The critical problem is this: if the input single char is itself an unknown (UNK-type) character, PyTorch's tokenizer.tokenize(char) actually returns an empty list instead of [UNK]. I'm curious why the PyTorch version behaves this way, because it makes it impossible to align the training data x with the labels...
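A minimal sketch of a per-character workaround, assuming the HuggingFace transformers BertTokenizer (the function name tokenize_aligned and the bert-base-chinese checkpoint are only illustrative, not from this repo): tokenize each character separately and substitute [UNK] whenever tokenizer.tokenize() returns an empty list, so the token sequence stays the same length as the label sequence.

```python
from transformers import BertTokenizer

# Checkpoint name is an assumption for illustration.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

def tokenize_aligned(chars):
    """Tokenize character by character, keeping a 1:1 mapping to labels."""
    tokens = []
    for ch in chars:
        pieces = tokenizer.tokenize(ch)
        if not pieces:
            # Characters the tokenizer silently drops would otherwise shift
            # x out of alignment with the label sequence, so map them to [UNK].
            tokens.append("[UNK]")
        else:
            # A single character normally yields one piece; keep the first
            # to preserve the 1:1 alignment with the labels.
            tokens.append(pieces[0])
    return tokens
```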


jenfung commented Sep 27, 2022

Did you ever solve this? I can't get them aligned either...
