added documentation about token hyphenation in the page schema #18

mromanello · 2019-10-08T09:27:02Z

closes issue #17


aflueckiger

I added some suggestions and questions.

aflueckiger · 2019-10-08T14:24:47Z

docs/page.schema.json

+                                                        },
+                                                        "hy": {
+                                                            "type": "boolean",
+                                                            "description": "Indicates whether the token constitutes the first part of a hyphenated word. When not specified it is assumed to be `false`."


Do we record the hyphen itself as well, or is it implicit, similar to the whitespace? We have some words with multiple hyphenations due to limited space in a table cell. How do I deal with this? Should I record all of them? Accordingly, I suggest writing "the former part before the hyphen (incl. / excl. hyphen)" instead of "first".

That's an interesting case that we haven't had so far. I would mark the first part with hy=True, and all the remaining parts with nf=....

What do you think @simon-clematide and @e-maud ?

Anyway, we will have to carefully test the behavior of the rebuilt script in such cases.
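To make the multi-hyphenation proposal above concrete, here is a minimal sketch of how a consumer could collapse such a chain. It is not the project's rebuild script. It assumes a hypothetical `tx` key for the token's surface text, assumes that every continuation part carries the full normalized form in `nf`, and assumes the hyphen is kept in the surface text (the open question raised above).

```python
def rebuild_words(tokens):
    """Collapse hyphenated token chains into their normalized forms.

    Assumptions (not confirmed by the schema): "tx" holds the surface text,
    only the first part of a chain has hy=True, and every following part
    of the chain carries the full normalized word in "nf".
    """
    words = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.get("hy", False):  # `hy` defaults to false when absent
            # Walk over all continuation parts and keep the last `nf` seen.
            j = i + 1
            normalized = None
            while j < len(tokens) and "nf" in tokens[j]:
                normalized = tokens[j]["nf"]
                j += 1
            if normalized is None:
                # No continuation found: fall back to the surface text.
                words.append(token["tx"])
                i += 1
            else:
                words.append(normalized)
                i = j
        else:
            words.append(token["tx"])
            i += 1
    return words


# A word split twice, e.g. inside a narrow table cell.
example = [
    {"tx": "The"},
    {"tx": "docu-", "hy": True},
    {"tx": "men-", "nf": "documentation"},
    {"tx": "tation", "nf": "documentation"},
    {"tx": "follows"},
]
print(rebuild_words(example))
```

Under these assumptions a single hyphenation is just a chain of length two, so the same loop covers both cases.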

aflueckiger · 2019-10-08T14:31:01Z

docs/page.schema.json

+                                                        },
+                                                        "nf": {
+                                                            "type": "string",
+                                                            "description": "It is specified on the second part of a hyphenated word, and contains its normalized (reconstructed) form."


Maybe this is more straightforward:
"normalized (dehyphenated)" instead of "normalized (reconstructed)".

If we allow for multi-hyphenation, change "second" to "latter".
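For reference, the two properties under review can be checked with a few lines of plain Python. This is only a sketch of the types stated in the diff (`hy` is a boolean that defaults to `false`, `nf` is a string); the function name is hypothetical and it does not replace validation against the actual JSON schema.

```python
def check_hyphenation_fields(token):
    """Return a list of problems found in a token's `hy` / `nf` properties."""
    problems = []
    # `hy` is optional; when not specified it is assumed to be false.
    hy = token.get("hy", False)
    if not isinstance(hy, bool):
        problems.append("hy must be a boolean")
    # `nf` is optional; when present it must be a string.
    if "nf" in token and not isinstance(token["nf"], str):
        problems.append("nf must be a string")
    return problems
```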

added documentation about token hyphenation in the page schema (close…

7864709

…s issue #4)

mromanello requested review from simon-clematide, e-maud and aflueckiger October 8, 2019 09:27

aflueckiger reviewed Oct 8, 2019


did changes suggested by @aflueckiger

f54126b

mromanello merged commit 1074ca8 into master Oct 11, 2019

mromanello deleted the issue-4/hyphenation branch October 11, 2019 15:27
