SamuelCahyawijaya released this 22 Jun 01:59
- Fix spacing between subwords when decoding with IndoNLGTokenizer
- Remove unused additional special tokens '[java]', '[sunda]', '[indonesia]' from IndoNLGTokenizer (language tokens are included in `special_tokens_to_ids` instead)
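
To illustrate the kind of spacing issue the first item refers to, here is a minimal sketch of joining subword pieces during decoding. It is not the IndoNLGTokenizer implementation: the `join_subwords` helper, the sample pieces, and the assumption that word starts are marked with a SentencePiece-style `▁` character are all hypothetical.

```python
# Hypothetical illustration only; not taken from IndoNLGTokenizer.
WORD_START = "\u2581"  # assumed SentencePiece-style word-boundary marker "▁"


def join_subwords(pieces):
    """Concatenate subword pieces, emitting a space only at word boundaries."""
    # Joining with "" keeps pieces of the same word together; each boundary
    # marker then becomes exactly one space.
    text = "".join(pieces).replace(WORD_START, " ")
    return text.strip()


# Made-up sample pieces for demonstration.
pieces = [WORD_START + "sel", "amat", WORD_START + "pagi"]
decoded = join_subwords(pieces)

# A naive join on spaces would instead split "selamat" into "sel amat".
naive = " ".join(p.replace(WORD_START, "") for p in pieces)
```

With these sample pieces, `decoded` keeps "selamat" as one word, while `naive` inserts a space inside it.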