[foliatextcontent] propagate markup information to higher/lower levels #19
(Also relates to #16.)
Technical note: This does introduce another challenge. FoLiA already has text consistency and text validation, which ensure that text specified on multiple levels is consistent with each other. Propagating markup would introduce a similar concept of markup consistency and markup validation.
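To make the idea of markup consistency concrete, here is a minimal sketch in Python. It is not FoLiA or libfolia code: the span representation `(start, end, class)` and the function names are assumptions made purely for illustration. It checks that every character of a word carries the same style classes at word level as at paragraph level.

```python
def char_styles(length, spans):
    """Expand (start, end, cls) spans into one set of classes per character."""
    styles = [set() for _ in range(length)]
    for start, end, cls in spans:
        for i in range(start, end):
            styles[i].add(cls)
    return styles


def markup_consistent(para_text, para_spans, words):
    """Hypothetical check. words is a list of (word_text, word_spans),
    with word_spans given relative to the start of the word."""
    para_styles = char_styles(len(para_text), para_spans)
    cursor = 0
    for word_text, word_spans in words:
        offset = para_text.find(word_text, cursor)
        if offset < 0:
            return False  # the text itself is already inconsistent
        word_styles = char_styles(len(word_text), word_spans)
        if para_styles[offset:offset + len(word_text)] != word_styles:
            return False  # same characters, different markup
        cursor = offset + len(word_text)
    return True


para = "This is a good example"
para_spans = [(10, 14, "bold")]
styled_words = [("This", []), ("is", []), ("a", []),
                ("good", [(0, 4, "bold")]), ("example", [])]
plain_words = [(w, []) for w, _ in styled_words]

consistent = markup_consistent(para, para_spans, styled_words)
inconsistent = markup_consistent(para, para_spans, plain_words)
```

With these inputs the styled word list agrees with the paragraph and the unstyled one does not, which is exactly the situation a validator would have to decide how to treat.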
This seems like something you would want on some occasions, but not always. For a tokenizer, for example, you would want the unstyled strings. So maybe we must introduce a 'formatted' attribute or similar on the <t> nodes? <t>This is a good example</t> vs. <t formatted="1">This is a <t-style class="bold">good</t-style> example</t> On second thought: this is not really a good idea, it would break too many things. Still, we need both worlds.
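The "both worlds" point can be illustrated without any extra attribute: the unstyled string and the style spans can both be derived from the same <t> element. The sketch below uses only Python's standard `xml.etree.ElementTree`, ignores the FoLiA namespace for brevity, and the function name `plain_text_and_spans` is hypothetical, not part of any FoLiA library.

```python
import xml.etree.ElementTree as ET


def plain_text_and_spans(t_xml):
    """Return (unstyled text, [(start, end, class), ...]) for one <t> element."""
    root = ET.fromstring(t_xml)
    parts = []
    spans = []
    pos = 0

    def walk(node):
        nonlocal pos
        start = pos
        if node.text:
            parts.append(node.text)
            pos += len(node.text)
        for child in node:
            walk(child)
            if child.tail:  # text following the child, still inside node
                parts.append(child.tail)
                pos += len(child.tail)
        if node.tag == "t-style":
            spans.append((start, pos, node.get("class")))

    walk(root)
    return "".join(parts), spans


text, spans = plain_text_and_spans(
    '<t>This is a <t-style class="bold">good</t-style> example</t>'
)
```

Here `text` is the string a tokenizer would want, while `spans` keeps the markup as character offsets into that string, so neither consumer has to lose information.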
I don't think there's any need for such an attribute and don't really see what problem it would solve. Calling Getting all the markup requires calling |
Yes, in fact that was my conclusion too. The new function I mentioned should do the deeper diving and return a TextContent which holds the (combined) styles of the deeper elements.
[Not sure if this is the right place to ask] My FLAT use case is to tokenize the text first in order to enable the (fully manual) word-level spelling error corrections. |
[Related]
Specifying the class as ... .textcontent(cls="OCR") does not seem to make a difference. |
Just wondering if one can add some dummy placeholder style-markup annotation to ucto-tokenized folia.xml in FLAT (regardless of the propagation not yet being in place). |
It would be a separate post-processing step that you need to run after ucto.
FLAT doesn't really render the style-markup at all currently. |
That would be cool in that way too.
I should rather have asked: would it also be possible to do some post-processing after annotating in FLAT, so that the style information can be re-assigned and other tools, such as folia2html, can process it again?
If there is markup information in a higher text layer, say on paragraph level, we want to be able to replicate that markup information on lower levels (say sentences or words), if it is not yet available there. We also want the reverse: if there is markup information on lower levels, we want to express it on higher levels as well.
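As a rough illustration of both directions, the following Python sketch works on plain strings and character offsets. It is not how foliatextcontent implements or will implement this: the `(start, end, class)` span format and the names `propagate_down` and `propagate_up` are assumptions chosen for the example.

```python
def propagate_down(para_text, para_spans, word_texts):
    """Clip paragraph-level spans to each word.

    Returns [(word, spans)], with spans relative to the start of the word.
    """
    result = []
    cursor = 0
    for word in word_texts:
        offset = para_text.index(word, cursor)
        end = offset + len(word)
        clipped = [
            (max(s, offset) - offset, min(e, end) - offset, cls)
            for s, e, cls in para_spans
            if s < end and e > offset  # span overlaps this word
        ]
        result.append((word, clipped))
        cursor = end
    return result


def propagate_up(para_text, words):
    """Shift word-level spans to paragraph offsets (adjacent spans are not merged)."""
    spans = []
    cursor = 0
    for word, word_spans in words:
        offset = para_text.index(word, cursor)
        spans.extend((offset + s, offset + e, cls) for s, e, cls in word_spans)
        cursor = offset + len(word)
    return spans


para = "This is a good example"
down = propagate_down(para, [(10, 14, "bold")],
                      ["This", "is", "a", "good", "example"])
up = propagate_up(para, down)
```

Going down and back up returns the original paragraph span in this simple case. A real implementation would also have to merge spans that cover several adjacent words and decide what to do when a level already carries its own markup.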