No proper data to initialize the Model of component 'tok2vec' #11428
I am encountering the following error when running spacy train: "AssertionError: [E923] It looks like there is no proper sample data to initialize the Model of component 'tok2vec'. To check your input data paths and annotation, run: python -m spacy debug data config.cfg and include the same ..." I created two JSON files (train.json and dev.json) and converted them into binary .spacy files. I do not understand what I am doing wrong. Thank you.
Replies: 2 comments 7 replies
Hi @lorenzo82,

My guess is that something went wrong during the conversion of the JSON file to the spaCy file. Where did the JSON file come from? (Is it from the v2.x version of spaCy?) Assuming you converted it using convert, perhaps the next step is to manually inspect the spaCy files and check for empty docs:

    import spacy
    from spacy.tokens import DocBin

    nlp = spacy.blank("xx")  # or a language code, e.g. `en`
    # Load the serialized docs from the converted file
    doc_bin = DocBin().from_disk("path/to/file.spacy")
    docs = list(doc_bin.get_docs(nlp.vocab))

There could be several reasons why this happens. Perhaps it's the formatting of the JSON file or an error during the conversion process.
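Once the docs are loaded, the "check for empty docs" step could look roughly like the sketch below. It is only an illustration: the helper names `summarize_docs` and `summarize_spacy_file` are hypothetical (not part of spaCy), and the demo uses two in-memory docs instead of a real .spacy file so that it runs without any data on disk.

```python
import spacy
from spacy.tokens import DocBin


def summarize_docs(docs):
    """Count total docs, empty docs (zero tokens), and tokens overall."""
    total = 0
    empty = 0
    tokens = 0
    for doc in docs:
        total += 1
        if len(doc) == 0:
            empty += 1
        tokens += len(doc)
    return {"total": total, "empty": empty, "tokens": tokens}


def summarize_spacy_file(path, lang="xx"):
    """Hypothetical helper: load a .spacy file and summarize its docs."""
    nlp = spacy.blank(lang)
    doc_bin = DocBin().from_disk(path)
    return summarize_docs(doc_bin.get_docs(nlp.vocab))


# Demo on in-memory docs: one normal sentence and one empty text.
nlp = spacy.blank("en")
demo_docs = [nlp("Apple is a company."), nlp("")]
summary = summarize_docs(demo_docs)
print(summary)
```

For a real check, `summarize_spacy_file("path/to/file.spacy")` could be called on both the training and the dev file. If `total` is 0, or `empty` equals `total`, the conversion most likely produced no usable docs, which would be consistent with the E923 error.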
To give you more detail, here is the format of the JSON file.