How to set Sudachipy split_mode in the JapaneseTokenizer #12871
Replies: 2 comments 4 replies
-
OK, so I did some debugging, and the option is definitely being received by the tokenizer, but I noticed that if I edit the spaCy code in Is this a bug? Or if not, how would I set split_mode to "B" without editing the spaCy code directly?
-
You don't want to change the If you have a new blank pipeline, you can pick whichever
nlp = spacy.blank("ja", config={"nlp.tokenizer.split_mode": "B"})
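For anyone who wants to check the effect of the setting on a blank pipeline, here is a minimal sketch. The helper names `tokenizer_config` and `tokenize_with_mode` are made up for illustration and are not part of spaCy. It assumes spaCy v3 with `sudachipy` and `sudachidict_core` installed, and it uses the nested-dict form of the config, which should be equivalent to the dot-notation key in the reply above.

```python
from typing import Dict, List

# SudachiPy split modes: "A" gives the shortest units, "C" the longest.
VALID_SPLIT_MODES = {"A", "B", "C"}


def tokenizer_config(split_mode: str) -> Dict[str, dict]:
    """Build a nested config override for the Japanese tokenizer (hypothetical helper)."""
    if split_mode not in VALID_SPLIT_MODES:
        raise ValueError(f"split_mode must be one of {sorted(VALID_SPLIT_MODES)}")
    return {"nlp": {"tokenizer": {"split_mode": split_mode}}}


def tokenize_with_mode(text: str, split_mode: str) -> List[str]:
    """Tokenize text with a blank Japanese pipeline in the given split mode (hypothetical helper)."""
    # Imported here so the config helper above works without spaCy installed.
    import spacy

    nlp = spacy.blank("ja", config=tokenizer_config(split_mode))
    return [token.text for token in nlp(text)]
```

Calling `tokenize_with_mode("選挙管理委員会", "A")` and then the same text with `"B"` or `"C"` should show whether the mode is actually taking effect, since the longer modes tend to keep compounds together. This only covers blank pipelines; it says nothing about overriding the mode on a trained pipeline.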
-
I think the answer in #8027 (comment) is now out of date.
I notice there is a more recent config option called split_mode, so I tried this:
However, it had no effect and still behaved as if split mode "A" were in effect.
I think I have the correct config key here, but is anyone able to point out what I'm doing wrong?
Also, @polm in #8027 mentioned that:
Hopefully this is also out of date now that the config option is exposed with the intention of allowing you to choose among different split modes?