I could use some help and I'm hoping this project is still maintained. I like the Salt model and Pepper's aim of providing conversions between different formats. To facilitate interoperability, I am writing a converter from FoLiA to Salt (see proycon/folia#85). FoLiA is a rich XML-based format for linguistic annotation, popular in the Netherlands and Flanders and developed in the scope of CLARIN/CLARIAH. I decided to implement the converter outside of Pepper, as part of foliatools, because I'm not a fan of Java and wanted to leverage the existing Python-based FoLiA library. So I'm outputting Salt XML myself.
The Salt model is documented, but the Salt XML isn't really, so I relied mostly on existing examples that Pepper produced, such as the conversion of a TCF example.
Now that my converter is implemented, I'm trying to get its output to validate and process with Pepper, and hopefully to resolve anything I got wrong in my converter. This is proving much more difficult than I had anticipated, as I can't even get Pepper to import Salt XML properly. I'm a bit stuck at this point and hoping to get some help here.
I tried building a conversion/validation workflow with three steps: a SaltXMLImporter, a SaltValidator and a DoNothingExporter. It doesn't look like any documents get processed: it says 0 of 4, and how it gets the number 4 is a mystery to me, as there is only one document in my test corpus. To me this looks like nothing got processed or validated:
--------------------------- pepper job status ---------------------------
id: 'la7st384
active documents: 0 of 4
status: initializing
- no documents found to display progress -
-------------------------------------------------------------------------
+----------------------------------- step 1 -----------------------------------+
|importer: SaltXMLImporter |
|path: file:/home/proycon/exp/pepper/saltin/ |
|corpus index: 0 |
|properties: |
| pepper.after.reportCorpusGraph:false |
| pepper.after.tokenize: false |
| |
+----------------------------------- step 2 -----------------------------------+
|manipulator: SaltValidator |
|path: null |
|properties: |
| pepper.after.reportCorpusGraph:false |
| pepper.after.tokenize: false |
| |
+----------------------------------- step 3 -----------------------------------+
|exporter: DoNothingExporter |
|path: file:/home/proycon/exp/pepper/saltout/ |
|properties: |
| pepper.after.reportCorpusGraph:false |
| pepper.after.tokenize: false |
| |
+------------------------------------------------------------------------------+
--------------------------- pepper job status ---------------------------
id: 'la7st384
active documents: 0 of 4
status: ended
- no documents found to display progress -
-------------------------------------------------------------------------
Unfortunately, there's not really any validation information to go by yet, so I set out to test a similar Pepper pipeline by reimporting Salt XML that Pepper itself had output (converted from a TCF source). That way I can test Pepper as-is and rule out that my converter did anything wrong. I get almost exactly the same output (0 of 4 documents) in that case.
Pepper's output is a bit confusing, and the validation option I enabled did not explicitly report anything afaik, but after fixing some problems in my converter, the input does seem to get parsed and I can get some conversions.
My initial test corpus (one document) outputted by the new converter: https://download.anaproy.nl/foliasalt.tar.gz
Salt output after conversion from TCF using Pepper (my reference example): https://download.anaproy.nl/tcfsalt.tar.gz
I'm hoping someone can point out what's going wrong. How can I properly validate my Salt XML output?
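For reference, a sanity check that doesn't depend on Pepper could look roughly like the sketch below. It only confirms that every file in a corpus directory is well-formed XML; it says nothing about whether the content is valid Salt. The `.salt` file suffix and the function name `check_wellformed` are assumptions made for illustration, not part of Pepper or foliatools.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def check_wellformed(corpus_dir, suffix=".salt"):
    """Parse every file with the given suffix below corpus_dir.

    Returns a list of (path, error) tuples, where error is None if the
    file parsed as well-formed XML and the parser's message otherwise.
    The suffix default is an assumption about how the files are named.
    """
    results = []
    for path in sorted(Path(corpus_dir).rglob(f"*{suffix}")):
        try:
            ET.parse(str(path))  # well-formedness only, no schema validation
            results.append((path, None))
        except ET.ParseError as exc:
            results.append((path, str(exc)))
    return results


def report(results):
    """Print one line per checked file."""
    for path, error in results:
        status = "ok" if error is None else f"NOT well-formed: {error}"
        print(f"{path}: {status}")
```

Calling `report(check_wellformed("saltin"))` on the import directory would list each file found, which also shows how many files the directory really contains.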