some questions regarding the new <t-hspace> tag #95

Open
kosloot opened this issue Apr 12, 2021 · 7 comments

Comments

@kosloot
Collaborator

kosloot commented Apr 12, 2021

Recently a <t-hspace> tag was introduced, but when I started using it, some questions arose:

  1. It is possible to add some text to a <t-hspace> like this:
    <t-hspace>extra text</t-hspace>
    This is acceptable to foliavalidator and folialint, but doesn't show up in text() output. That is probably OK.
    In libfolia, it DOES show up, which is a bug, I assume?
    But shouldn't we disallow this construct, to avoid strange effects and misunderstandings?
  2. There are NO predefined class values for <t-hspace>. I understand the rationale, but that places a big burden on all tools that would like to make use of it. They all have to create their own text() extraction functions and would be greatly helped by a predefined set that the libraries support, like "tab", "space", "wide-space", or such.
    I realize that defining such a set might be a challenge, but still.
    The text() function is very complex and replicating it is cumbersome (as the handling of the 'tag' feature already showed us).
    Another possibility might be a way of providing a translation table for those class values:
    tab ==> '\t'
    space ==> ' _'
    wide-space ==> ' __'
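
A minimal sketch of what such an opt-in translation table could look like on the library side. Everything here is hypothetical: the names `HSpaceTable`, `default_hspace_table` and `hspace_text` do not exist in libfolia, and the replacement strings (one space for "space", two for "wide-space") are assumptions chosen only for illustration.

```cpp
#include <map>
#include <string>

// Hypothetical table: <t-hspace> class value -> text emitted by text().
using HSpaceTable = std::map<std::string, std::string>;

// Illustrative defaults only; the class names and widths are assumptions,
// not a predefined FoLiA set.
inline HSpaceTable default_hspace_table() {
  return { {"tab", "\t"}, {"space", " "}, {"wide-space", "  "} };
}

// Look up the replacement for a class. Unknown classes fall back to a
// caller-supplied default, so using the table stays an opt-in choice.
inline std::string hspace_text(const HSpaceTable& table,
                               const std::string& cls,
                               const std::string& fallback = " ") {
  const auto it = table.find(cls);
  return it == table.end() ? fallback : it->second;
}
```

A tool could then extend or override the defaults, for example `table["em-space"] = "\u2003";`, without having to reimplement text() itself.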
@proycon
Owner

proycon commented Apr 12, 2021

  1. Good point, this is indeed not intentional and should be disallowed.
  2. We could define a set, implement some support for it in the libraries, and recommend its usage. It's then simply up to users whether they decide to use that set or not (i.e. it'll be an opt-in choice).

@kosloot
Collaborator Author

kosloot commented Apr 12, 2021

> Good point, this is indeed not intentional and should be disallowed.

Maybe the same holds for a few of the other text markup tags too?

> We could define a set, implement some support for it in the libraries, and recommend its usage. It's then simply up to users whether they decide to use that set or not (i.e. it'll be an opt-in choice).

That would be great. That leaves us with the challenge of creating a reasonable set.

@kosloot
Collaborator Author

kosloot commented Apr 12, 2021

We can simply forbid text in a TextMarkupHSpace by adding one line in folia_properties.cxx:

//------ TextMarkupHSpace -------
    TextMarkupHSpace::PROPS = AbstractTextMarkup::PROPS;
    TextMarkupHSpace::PROPS.ACCEPTED_DATA.erase( XmlText_t );           // <=== 1 extra line
    TextMarkupHSpace::PROPS.ELEMENT_ID = TextMarkupHSpace_t;

But maybe this is not generic enough?

Otherwise, XmlText_t could be removed from AbstractTextMarkup::PROPS and explicitly added for the subclasses it applies to?

@proycon
Owner

proycon commented Apr 12, 2021

Generally we have the TEXTCONTAINER property for this. ACCEPTED_DATA only carries FoLiA elements in my implementations.
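
To illustrate the distinction, here is a simplified model in which a boolean text-container flag gates raw text while the accepted-data set lists child elements only. This is a sketch under stated assumptions, not libfolia's actual code: `Properties`, `hspace_properties` and `check_text` are hypothetical names, and the error wording is merely modeled on folialint's message.

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Simplified stand-ins; these are not libfolia's real types.
enum class ElementType { TextMarkupString, TextMarkupHSpace };

struct Properties {
  std::string xmltag;
  bool textcontainer = true;            // may the element hold raw text?
  std::set<ElementType> accepted_data;  // child *elements* only, never text
};

// Hypothetical properties for <t-hspace>: raw text is switched off.
inline Properties hspace_properties() {
  Properties p;
  p.xmltag = "t-hspace";
  p.textcontainer = false;
  return p;
}

// Throw when non-whitespace text appears in a non-text-container element.
inline void check_text(const Properties& p, const std::string& text) {
  const bool blank = text.find_first_not_of(" \t\r\n") == std::string::npos;
  if (!p.textcontainer && !blank) {
    throw std::invalid_argument("found extra text '" + text +
                                "' inside element <" + p.xmltag +
                                ">, NOT allowed there.");
  }
}
```

Keeping the flag separate from the accepted-data set means a single property decides whether raw text is allowed, independently of which child elements are permitted.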

@kosloot
Collaborator Author

kosloot commented Apr 12, 2021

Ah, right. That is a better solution, and it works:

folialint tests/bug59.xml
tests/bug59.xml failed: XML error: found extra text 'test' inside element <t-hspace>, NOT allowed there.

the input contained:

    <div xml:id="example.div.4" class="section" n="4">
      <t>Space,<t-hspace>test</t-hspace>the<t-hspace/>final<t-hspace/><t-hspace/>frontier</t>
    </div>

@kosloot
Collaborator Author

kosloot commented Apr 12, 2021

Ok, but still there is room for rather suspicious constructions like:

      <t>Space,<t-hspace><t-str>test</t-str><t-hbr>what</t-hbr></t-hspace>the<t-hspace/>final<t-hspace/><t-hspace/>frontier</t>

This passes folialint and foliavalidator, and both folia2txt and FoLiA-2text ignore everything inside the <t-hspace>, but this is still confusing and should be rejected, imho.
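
One way to express the stricter rule is to require that a <t-hspace> has neither text nor child elements. The sketch below checks this on a toy element tree. `Node` and `hspace_is_empty_everywhere` are hypothetical names, and the tree is a simplification that ignores mixed-content ordering; a real validator would walk the parsed FoLiA document instead.

```cpp
#include <string>
#include <vector>

// Toy element tree used only for illustration.
struct Node {
  std::string tag;
  std::string text;            // raw text directly inside this element
  std::vector<Node> children;  // nested elements
};

// Returns true when no <t-hspace> in the tree carries text or children.
inline bool hspace_is_empty_everywhere(const Node& node) {
  if (node.tag == "t-hspace" &&
      (!node.text.empty() || !node.children.empty())) {
    return false;
  }
  for (const Node& child : node.children) {
    if (!hspace_is_empty_everywhere(child)) {
      return false;
    }
  }
  return true;
}
```

With this check, the nested <t-str> and <t-hbr> in the example above would be rejected, while a bare <t-hspace/> would still pass.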

@proycon
Owner

proycon commented Apr 13, 2021

Agreed
