How to convert jsonl format files into spacy format files? #13074
Hi!
I know this converter is for JSON files, but I couldn't find a converter for JSONL files. I then ran the debug command to check whether there were problems with my data set, and I got errors like this:
Something goes wrong while converting the JSONL file into a .spacy file, but I don't know how to solve it. I would really appreciate any help!
Replies: 1 comment
The JSON converter isn't a generic JSON converter. It's only intended for converting a specific spaCy v2 JSON format, so here it unfortunately ends up producing an empty .spacy file, which is what leads to this initialization error (there is no data).

For this JSONL format, there isn't a built-in converter, and your best option may be to write your own. In a quick search I found a number of examples for converting doccano to spaCy, but I haven't tried any of them myself, and they may produce the spaCy v2 JSON format. The v2 JSON format can still be used, but then you'd need an additional conversion step to convert it for spaCy v3 with the current spacy convert.

To help you get started if you can't find an existing converter, here's an example of a custom converter for a JSON format: https://github.com/explosion/projects/blob/v3/pipelines/ner_demo/scripts/convert.py

It was written for this data: https://github.com/explosion/projects/blob/v3/pipelines/ner_demo/assets/train.json. Since that data is pretty similar in terms of texts and offsets, you would primarily need to adjust how it reads the text and the offsets/labels for your format.
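For illustration only, here is a minimal sketch of what such a custom converter could look like for JSONL input. The record layout is an assumption, since the actual file isn't shown in this thread: each line is taken to be a JSON object with a "text" string and a "label" list of [start, end, label] character offsets (a doccano-style export). The function names read_jsonl and convert are hypothetical; DocBin, spacy.blank, Doc.char_span and filter_spans are existing spaCy v3 APIs. Adjust the field names to match your data.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield (text, spans) pairs from a JSONL file.

    ASSUMPTION: every non-empty line is a JSON object shaped like
    {"text": "...", "label": [[start, end, "LABEL"], ...]}.
    """
    with Path(path).open(encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            spans = [
                (int(start), int(end), str(label))
                for start, end, label in record.get("label", [])
            ]
            yield record["text"], spans


def convert(input_path, output_path, lang="en"):
    """Write the annotations to a .spacy file (hypothetical helper).

    Returns (number of docs written, number of spans skipped).
    """
    # Imported here so read_jsonl can be used without spaCy installed.
    import spacy
    from spacy.tokens import DocBin
    from spacy.util import filter_spans

    nlp = spacy.blank(lang)  # tokenizer only, no trained components
    doc_bin = DocBin()
    skipped = 0
    for text, spans in read_jsonl(input_path):
        doc = nlp.make_doc(text)
        ents = []
        for start, end, label in spans:
            # char_span returns None if the offsets don't align with
            # token boundaries; count those instead of failing.
            span = doc.char_span(start, end, label=label)
            if span is None:
                skipped += 1
            else:
                ents.append(span)
        # doc.ents may not overlap, so keep the longest spans.
        doc.ents = filter_spans(ents)
        doc_bin.add(doc)
    doc_bin.to_disk(output_path)
    return len(doc_bin), skipped
```

Calling convert("train.jsonl", "train.spacy") (file names are placeholders) would then give you a file you can point the training config at. A non-zero skipped count usually means some offsets don't line up with the tokenizer's boundaries and are worth inspecting before training.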