No proper data to initialize the Model of component 'tok2vec' #11428

lorenzo82 · 2022-09-02T10:35:09Z

I am encountering the following error when running spacy train:

"AssertionError: [E923] It looks like there is no proper sample data to initialize the Model of component 'tok2vec'. To check your input data paths and annotation, run: python -m spacy debug data config.cfg and include the same config override values you would specify for the 'spacy train' command."

I created two JSON files (train.json and dev.json) and converted them into binary .spacy files.
I downloaded base_config.cfg and updated it into a config.cfg file.
Running "python -m spacy debug data config.cfg" produced the same error:
"AssertionError: [E923] It looks like there is no proper sample data to initialize the Model of component 'tok2vec'. To check your input data paths and annotation, run: python -m spacy debug data config.cfg and include the same config override values you would specify for the 'spacy train' command."

I do not understand what I am doing wrong.

Thank you.

Answered by ljvmiranda921


ljvmiranda921 · 2022-09-05T04:07:05Z

Hi @lorenzo82 ,

My guess is that something went wrong during the conversion of the JSON file to the spaCy file. Where did the JSON file come from? (Is it from spaCy v2.x?) Assuming you converted it using convert, perhaps the next step is to manually inspect the .spacy files and check for empty docs.

import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("xx")  # or a language code, e.g. `en`
doc_bin = DocBin().from_disk("path/to/file.spacy")
docs = list(doc_bin.get_docs(nlp.vocab))

There could be several reasons why this happens: perhaps the formatting of the JSON file, or an error during the conversion process.
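To make the "check for empty docs" step more concrete, here is a minimal sketch. `summarize_docs` is a hypothetical helper, not part of spaCy, and the two in-memory docs stand in for whatever `doc_bin.get_docs(nlp.vocab)` would return from a real file. The `en` language code is an assumption.

```python
import spacy
from spacy.tokens import Doc


def summarize_docs(docs):
    """Count docs that have no tokens and docs that have no entities.

    Hypothetical helper for illustration; not part of spaCy.
    """
    docs = list(docs)
    empty = sum(1 for doc in docs if len(doc) == 0)
    no_ents = sum(1 for doc in docs if len(doc) > 0 and len(doc.ents) == 0)
    return {"total": len(docs), "empty": empty, "without_entities": no_ents}


nlp = spacy.blank("en")  # assumption: English text

# Stand-ins for docs loaded via doc_bin.get_docs(nlp.vocab)
annotated = nlp.make_doc("Global analysis and simulation of land-use change.")
annotated.ents = [annotated.char_span(34, 42, label="LC")]
empty_doc = Doc(nlp.vocab)  # a doc with zero tokens

summary = summarize_docs([annotated, empty_doc])
print(summary)
```

If `total` is 0, or every doc is empty or lacks entities, the problem most likely lies in the conversion step rather than in the training config.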

0 replies

lorenzo82 · 2022-09-05T14:10:12Z

Author

To give you more detail, here is the format of the JSON file:

[
  [
    "Global analysis and simulation of land-use change associated with urbanization.",
    { "entities": [ [ 34, 42, "LC" ] ] }
  ],
  [
    "Exploring subpixel learning algorithms for estimating global land cover frac- tions from satellite data using high performance computing.",
    { "entities": [ [ 61, 71, "LC" ] ] }
  ]
]
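Data in this layout (a text plus character-offset entities) is commonly turned into a .spacy file by hand with `DocBin`. As far as I know, `spacy convert` targets the v2 JSON training format and CoNLL/IOB files rather than this list-of-pairs layout, so it may not pick up any documents. The following is a minimal sketch, not the thread author's script: `to_docbin` is a hypothetical helper, and the `en` language code and the output file name are assumptions.

```python
import spacy
from spacy.tokens import DocBin
from spacy.util import filter_spans

# Same structure as the JSON above: [text, {"entities": [[start, end, label], ...]}]
TRAIN_DATA = [
    [
        "Global analysis and simulation of land-use change associated with urbanization.",
        {"entities": [[34, 42, "LC"]]},
    ],
]


def to_docbin(examples, nlp):
    """Build a DocBin from [text, {"entities": [...]}] pairs.

    Hypothetical helper for illustration. Returns the DocBin and the number
    of entity offsets that did not align with token boundaries.
    """
    doc_bin = DocBin()
    skipped = 0
    for text, annotations in examples:
        doc = nlp.make_doc(text)
        spans = []
        for start, end, label in annotations["entities"]:
            span = doc.char_span(start, end, label=label)
            if span is None:  # offsets do not match token boundaries
                skipped += 1
            else:
                spans.append(span)
        doc.ents = filter_spans(spans)  # drop overlapping spans
        doc_bin.add(doc)
    return doc_bin, skipped


nlp = spacy.blank("en")  # assumption: English text
doc_bin, skipped = to_docbin(TRAIN_DATA, nlp)
print(len(doc_bin), "docs,", skipped, "entities skipped")
# doc_bin.to_disk("train.spacy")  # assumed file name; use the path your config expects
```

Counting skipped entities is a deliberate choice here: offsets that do not line up with token boundaries are silently lost otherwise, which can leave docs without any annotation.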

7 replies

achal648 Oct 12, 2022

Hi @lorenzo82 ,

Could you please share how you solved it? I ran the check suggested by @ljvmiranda921 and cross-checked my conversion script; nothing seems wrong, but I am still getting this error.

ljvmiranda921 Oct 12, 2022

Hi @achal648 , can you give an example of your dataset before passing it to the conversion script?

achal648 Oct 13, 2022

Hi @ljvmiranda921, the issue was with the training and dev data paths that I was passing through the CLI command. It is solved now, thanks!
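Since a wrong data path can trigger the same E923 error, it may help to confirm that each path passed to the CLI (for example via the `--paths.train` and `--paths.dev` overrides) points at a file that actually contains docs. Below is a minimal sketch; `count_docs` is a hypothetical helper, it assumes each path is a single .spacy file rather than a directory, and the temporary files only stand in for real corpora.

```python
import tempfile
from pathlib import Path

import spacy
from spacy.tokens import DocBin


def count_docs(path):
    """Return the number of docs in a .spacy file, or None if it is missing.

    Hypothetical helper for illustration; assumes a single file, not a directory.
    """
    path = Path(path)
    if not path.is_file():
        return None
    return len(DocBin().from_disk(path))


nlp = spacy.blank("en")  # assumption: English text
with tempfile.TemporaryDirectory() as tmp:
    train_path = Path(tmp) / "train.spacy"
    DocBin(docs=[nlp.make_doc("land cover change")]).to_disk(train_path)
    found = count_docs(train_path)  # file exists and holds one doc
    missing = count_docs(Path(tmp) / "dev.spacy")  # file was never written
print(found, missing)
```

A result of `None` or `0` for either path suggests fixing the path or the conversion before looking at the model config.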

sunilksamanta Oct 31, 2023

@ljvmiranda921
Hi, this is my format:

[
  [
    "Requesting Hiring for the Radar System featuring Viking Line equipment",
    { "entities": [ [ 26, 38, "EQUIPMENT" ], [ 49, 60, "BRAND" ], [ 11, 17, "SERVICE" ] ] }
  ],
  [
    "Desire Repair for the Refrigeration Units utilizing Hamburg equipment",
    { "entities": [ [ 22, 41, "EQUIPMENT" ], [ 52, 63, "BRAND" ], [ 7, 13, "SERVICE" ] ] }
  ]
]

When I convert it, the output is: ✔ Generated output file (0 documents): spacy_training_data.spacy

rmitsch Nov 2, 2023
Maintainer

Hi @sunilksamanta, please open a new discussion and provide more information on your use case. Thanks!
