
tei2folia: autodeclare should be enabled? #52

Open
pirolen opened this issue Feb 8, 2023 · 3 comments
Labels
bug Something isn't working

@pirolen

pirolen commented Feb 8, 2023

I installed foliatools with pip.
When I ran tei2folia on the attached small test file, I got the error below.

In foliapy's main.py I see that autodeclare is enabled (the default for FoLiA v2).

$ tei2folia  --traceback  /home/pirol/quanti/devel/diagn/collate1.tei.xml -o /home/pirol/quanti/devel/diagn/
Instantiating XML parser
Converting /home/pirol/quanti/devel/diagn/collate1.tei.xml
VALIDATION ERROR on full parse by library in /home/pirol/quanti/devel/diagn/collate1.tei.xml
DeclarationError: Encountered an instance without proper declaration: Comment <comment>!
-- Full traceback follows -->
Traceback (most recent call last):
  File "/home/pirol/quanti/devel/lama/lama/lib/python3.8/site-packages/foliatools/tei2folia.py", line 86, in convert
    doc = folia.Document(tree=transformed, debug=kwargs.get('debug',0))
  File "/home/pirol/quanti/devel/lama/lama/lib/python3.8/site-packages/folia/main.py", line 7427, in __init__
    self.parsexml(kwargs['tree'])
  File "/home/pirol/quanti/devel/lama/lama/lib/python3.8/site-packages/folia/main.py", line 8646, in parsexml
    return Class.parsexml(node,self)
  File "/home/pirol/quanti/devel/lama/lama/lib/python3.8/site-packages/folia/main.py", line 3575, in parsexml
    return super(Comment,Class).parsexml(node, doc, **kwargs)
  File "/home/pirol/quanti/devel/lama/lama/lib/python3.8/site-packages/folia/main.py", line 3416, in parsexml
    instance = Class(doc, *args, **kwargs)
  File "/home/pirol/quanti/devel/lama/lama/lib/python3.8/site-packages/folia/main.py", line 3546, in __init__
    super(Comment,self).__init__(doc, *args, **kwargs)
  File "/home/pirol/quanti/devel/lama/lama/lib/python3.8/site-packages/folia/main.py", line 659, in __init__
    kwargs = self.parsecommonarguments(doc, **kwargs)
  File "/home/pirol/quanti/devel/lama/lama/lib/python3.8/site-packages/folia/main.py", line 787, in parsecommonarguments
    self.checkdeclaration()
  File "/home/pirol/quanti/devel/lama/lama/lib/python3.8/site-packages/folia/main.py", line 1190, in checkdeclaration
    raise DeclarationError("Encountered an instance without proper declaration: " + self.__class__.__name__ + " <" + self.__class__.XMLTAG + ">!")
folia.main.DeclarationError: Encountered an instance without proper declaration: Comment <comment>!
Unable to convert  /home/pirol/quanti/devel/diagn/collate1.tei.xml

Attachment: collate1.tei.xml.txt

@proycon
Owner

proycon commented Feb 8, 2023 via email

@proycon proycon self-assigned this Feb 8, 2023
@proycon proycon added the bug Something isn't working label Feb 8, 2023
@proycon
Owner

proycon commented Feb 8, 2023

OK, it's not a missing-declaration bug after all; it breaks because the input is unexpected. I already couldn't imagine that the comment declaration was missing (it's always there by default). Something goes completely wrong when parsing this TEI input. The initial output from the XSLT processor is:

$ xsltproc ~W/foliatools/foliatools/tei2folia.xsl collate1.tei.xml
WARNING: Unknown tag: cx:apparatus (in )
<?xml version="1.0"?>
<comment xmlns="http://ilk.uvt.nl/folia" xmlns:folia="http://ilk.uvt.nl/folia">[tei2folia WARNING] Unhandled tag: cx:apparatus (in )</comment>
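Judging from the XSLT output above, the transformed tree is only a FoLiA `<comment>` element rather than a `<FoLiA>` document, which is presumably why the full library parse then trips over an undeclared comment. A minimal sketch for detecting this case before handing the tree to the FoLiA library, assuming Python's standard `xml.etree.ElementTree` and the FoLiA namespace shown in the output; `is_warning_only_output` is a hypothetical helper, not part of foliatools:

```python
import xml.etree.ElementTree as ET

# FoLiA namespace, as it appears in the xsltproc output in this thread.
FOLIA_NS = "http://ilk.uvt.nl/folia"


def is_warning_only_output(xml_text):
    """Hypothetical helper (not part of foliatools): return True if the XSLT
    result's root is a bare FoLiA <comment>, i.e. the stylesheet emitted a
    warning instead of a full <FoLiA> document."""
    root = ET.fromstring(xml_text)
    return root.tag == "{%s}comment" % FOLIA_NS


if __name__ == "__main__":
    # The transformed output reported above for collate1.tei.xml.
    output = (
        '<?xml version="1.0"?>\n'
        '<comment xmlns="http://ilk.uvt.nl/folia" '
        'xmlns:folia="http://ilk.uvt.nl/folia">'
        "[tei2folia WARNING] Unhandled tag: cx:apparatus (in )</comment>"
    )
    print(is_warning_only_output(output))
```

A check like this only looks at the root element; it says nothing about why the stylesheet failed, only that it did.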

cx:apparatus is your root element there, and tei2folia has no idea what that is; it expects a <TEI> node at the root. In fact, cx is an entirely different namespace (http://interedition.eu/collatex/ns/1.0), probably some extension to TEI? There doesn't seem to be a similarly named element in the TEI P5 guidelines.
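A minimal pre-flight check along these lines could catch a non-TEI root such as cx:apparatus before conversion. It assumes Python's standard `xml.etree.ElementTree` and the standard TEI P5 namespace `http://www.tei-c.org/ns/1.0`; accepting an un-namespaced `<TEI>` root is an additional assumption. The helpers `split_tag` and `check_tei_root` are hypothetical, not part of foliatools, and they do not reproduce tei2folia's own logic:

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace URI.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def split_tag(tag):
    """Split an ElementTree '{namespace}local' tag into (namespace, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag


def check_tei_root(xml_text):
    """Hypothetical pre-flight check: return None if the root element looks
    like <TEI>, otherwise a human-readable explanation of what was found."""
    root = ET.fromstring(xml_text)
    ns, local = split_tag(root.tag)
    # Assumption: accept <TEI> in the TEI namespace or with no namespace.
    if local == "TEI" and ns in (TEI_NS, ""):
        return None
    return f"root element is {local!r} in namespace {ns!r}, expected <TEI>"


if __name__ == "__main__":
    # A CollateX-style root like the one in the attached file.
    sample = '<cx:apparatus xmlns:cx="http://interedition.eu/collatex/ns/1.0"/>'
    print(check_tei_root(sample))
```

For the CollateX sample this reports the `apparatus` root and its namespace, while a regular `<TEI xmlns="http://www.tei-c.org/ns/1.0">` document passes.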

I also see <rdg> and <app> elements in your document, which the converter doesn't know yet either (though those do seem to be valid TEI). If you want support for such documents, I'll have to investigate how best to map these elements to FoLiA. I see this documentation covers it nicely.
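To gauge how much critical-apparatus markup a file contains before deciding on a FoLiA mapping, a small scan of the tree could help. This is a sketch assuming Python's standard library and the TEI P5 namespace; the `APPARATUS_TAGS` set (TEI's `app`, `lem`, `rdg`, `rdgGrp`) and the function `count_apparatus_elements` are illustrative choices, not the converter's own list of unsupported elements:

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Illustrative set of TEI critical-apparatus elements to look for
# (an assumption for this sketch, not tei2folia's list).
APPARATUS_TAGS = {"app", "lem", "rdg", "rdgGrp"}


def count_apparatus_elements(xml_text):
    """Count TEI-namespaced apparatus elements anywhere in the document,
    regardless of what the root element is."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # skip comments / processing instructions, if present
        if elem.tag.startswith("{" + TEI_NS + "}"):
            local = elem.tag.split("}", 1)[1]
            if local in APPARATUS_TAGS:
                counts[local] += 1
    return counts


if __name__ == "__main__":
    doc = (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>'
        '<app><rdg wit="#A">colour</rdg><rdg wit="#B">color</rdg></app>'
        "</p></body></text></TEI>"
    )
    print(count_apparatus_elements(doc))  # Counter({'rdg': 2, 'app': 1})
```

Because it walks the whole tree, the same scan also works on a file whose root is cx:apparatus.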

@pirolen
Author

pirolen commented Feb 9, 2023

Ah, my bad for not spotting that. I simply copied the invalid TEI file from a demo GUI without looking closer. In fact, I was trying to see how FoLiA could represent (span?) annotations for variation among different versions of an edited text. Of course this is very specific, and I cannot expect the converter to cover it.
I wonder whether FLAT could visualize such spans well, since (as in my usual use case) the goal is to let end users correct errors, in this case false alignments and HTR errors.

For example, alignments such as:
[Screenshot 2023-02-08 at 17:13:02]

2 participants