Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(text): Render list items correctly #1137

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

kesara
Copy link
Member

@kesara kesara commented Jul 9, 2024

Fixes #1135
Fixes #1138

See changes made in 3712523 for context.

This change can affect if ol is used inside a li as the first element and the li element is inside another ol element.

I could not find any published RFCs with XML that match the above condition.

These are the RFCs that I found that use ol as the first element in li:

  • RFC8679
  • RFC8687

I didn't notice any changes in text output due to this change. But there are a few minor changes (with spacing) due to changes over the years.

xml2rfc/writers/text.py Outdated Show resolved Hide resolved
@rjsparks
Copy link
Member

rjsparks commented Jul 9, 2024

I think it's worth trying to change things such that generatiing

   4.  4.1  Sublist with '%p' parent counter element in the list
            counter, item one

       4.2  Item two.

       4.3  Item three.

we generate

   4. 

       4.1  Sublist with '%p' parent counter element in the list
            counter, item one

       4.2  Item two.

       4.3  Item three.

Unless that breaks the intended fix for the rfc-to-be.

@cabo
Copy link
Contributor

cabo commented Jul 9, 2024

What is better about that superfluous empty line?

(You can probably have one if you want with <br>)

@rjsparks
Copy link
Member

rjsparks commented Jul 9, 2024

Look at it in context with the rest of the nested <ol> above it in the test document:

4. Ordered list with sub-elements
1. Ordered list, first bullet. Indent="6".
2. Ordered list, second bullet. Indent="6". Text only. Octavio
fuit, cum illam severitatem in eo non arbitrantur. erunt etiam,
et ii quidem eruditi Graecis litteris, contemnentes Latinas,
qui se plane Graecum dici velit, ut a Scaevola est praetore
salutatus Athenis Albucius. In physicis plurimum posuit. ea
scientia et verborum vis et natura orationis et consequentium
repugnantiumve ratio potest perspici.
3. Ordered list, third bullet, element t. Quid? si nos non
interpretum fungimur munere, sed tuemur ea, quae audiebamus,
conferebamus, neque erat umquam controversia, quid ego
intellegerem, sed quid probarem. Extremum autem esse bonorum eum
voluptate vivere. Sic faciam igitur, inquit: unam rem explicabo,
eamque maximam, de physicis alias, et quidem locis pluribus.
Author, et al. Expires 13 January 2019 [Page 7]
Internet-Draft Xml2rfc Vocabulary V3 Elements July 2018
Ordered list, third bullet, second element:
+-------------------------------------------+
| |
| Artwork |
| |
+-------------------------------------------+
Ordered list, third bullet, third element: Definition list,
element dt.
Element dd. Nisi mihi Phaedrum, inquam, tu mentitum aut
Zenonem misisti.
Ordered list, third bullet, fourth element:
Figure
Figure
Figure
Figure 2: Some figure name
1. Ordered list, third bullet, fifth element: Ordered list,
element li. Text only.
2. Ordered list, third bullet, fifth element: Ordered list,
element t. Nam cum ad me de virtute misisti. Quae est enim
contra Cyrenaicos satis acute, nihil ad iucunde vivendum
reperiri posse, quod coniunctione tali sit aptius. Quae etsi
mihi nullo modo nec divelli nec distrahi possint, sic de
iustitia iudicandum est, quae non modo non impediri rationem
amicitiae, si summum bonum diceret, primum in eo essent.
# Ordered list, third bullet, sixth element
A = 1;
* Ordered list, third bullet, seventh element: Unordered list,
element li. Text only.
* Element t. Atque ut odia, invidiae, despicationes adversantur
voluptatibus, sic amicitiae non posse iucunde vivi, nisi
sapienter, honeste iusteque vivatur, nec sapienter, honeste,
iuste, nisi iucunde. Democritea dicit perpauca mutans, sed
ita, ut ea, quae audiebamus, conferebamus, neque erat umquam
controversia, quid ego intellegerem, sed quid probarem. See
Appendix A, Section 3.1 of
[I-D.levkowetz-xml2rfc-v3-implementation-notes] and also
Appendix A.1 of Key Words Case Ambiguity [RFC8174] / Key Words
Case Ambiguity [RFC8174], Appendix A.1 / Key Words Case
Ambiguity [RFC8174] (Appendix A.1).
Author, et al. Expires 13 January 2019 [Page 8]
Internet-Draft Xml2rfc Vocabulary V3 Elements July 2018
4. 4.1 Sublist with '%p' parent counter element in the list
counter, item one
4.2 Item two.
4.3 Item three.
and consider what would happen if there were any text between the <li> and <ol> at
<li>
<ol type="%p%d">

@kesara kesara marked this pull request as draft July 10, 2024 00:56
@kesara
Copy link
Member Author

kesara commented Jul 10, 2024

I have converted PR to a draft.
The changes from e889f93 add a new line when with the list style to both ol and ul.
This was skipped previously.
Specifically, this adds list style + new line when li contains an artwork, figure, or sourcecode as the first child.
See how this changed test outputs in ca8286a
Note that the test output has date changes and multiple content changes propagated because of the addition of new lines.

e889f93 fixes #1138

@kesara kesara changed the title fix: Render ol items correctly fix(text): Render ol items correctly Jul 10, 2024
@kesara kesara changed the title fix(text): Render ol items correctly fix(text): Render list items correctly Sep 12, 2024
@kesara kesara added the rpat Issues needing attention from RFC Production Advisory Team label Sep 12, 2024
@kesara kesara force-pushed the fix/lists branch 2 times, most recently from b64f6f1 to ef83be5 Compare December 9, 2024 00:01
@kesara kesara marked this pull request as ready for review December 9, 2024 00:25
@kesara kesara requested a review from rjsparks December 9, 2024 00:25
@kesara
Copy link
Member Author

kesara commented Dec 17, 2024

@rjsparks
Here's diffs for RFCs that has <ul> or <ol> as first element inside a <li>1.
When empty="true" is used, this change will not affect the output of that RFC.

Footnotes

  1. https://t4.fq.nz/xml2rfc-1137/output.txt

@rjsparks
Copy link
Member

The above list is a bit confusing? I only see impact from this PRs change on rfc9599.

@kesara
Copy link
Member Author

kesara commented Dec 18, 2024

The above list is a bit confusing? I only see impact from this PRs change on rfc9599.

In most case there's empty="true" attribute set on the element or the parent level, hence no changes.

@kesara
Copy link
Member Author

kesara commented Jan 8, 2025

List of active I-Ds that match the crieria1:

Footnotes

  1. https://t4.fq.nz/xml2rfc-1137-ids/i-d-output.txt

@kesara
Copy link
Member Author

kesara commented Jan 8, 2025

@rjsparks
Copy link
Member

rjsparks commented Jan 9, 2025

Is it clear what the authors of that draft could do to get the output they originally wanted given this change?
(The source for that draft appears to be at https://github.com/loghyr/flexfilesv2)

@kesara
Copy link
Member Author

kesara commented Jan 10, 2025

Is it clear what the authors of that draft could do to get the output they originally wanted given this change? (The source for that draft appears to be at https://github.com/loghyr/flexfilesv2)

@rjsparks The original document with the issue RFC9599 has already published with the suggested workaround using <t>. Note this is RFC9599's text output will change slightly with this PR1.

This is the changes for latest draft from https://github.com/loghyr/flexfilesv2:
Screenshot 2025-01-10 at 18 38 31

Footnotes

  1. https://author-tools.ietf.org/diff?url_1=https://github.com/kesara/t4/raw/refs/heads/main/xml2rfc-1137/rfc9599.txt

@rjsparks
Copy link
Member

rjsparks commented Jan 10, 2025

so this would leave the generated .txt file for flexfilesv2 unchanged with this PR applied then:

        <li>
          <t>Delegations are assigned by the metadata server that initiates
          recalls when conflicting OPENs are processed.  Because I/O
          operations are allowed to present delegation stateids, the
          metadata server requires the ability:</t>
          <ol>
            <li>
              to make the storage device aware of the association between
              the metadata-server-chosen stateid and the filehandle and
              delegation type it represents
            </li>

            <li>
              to break such an association.
            </li>
          </ol>
        </li>

@kesara
Copy link
Member Author

kesara commented Jan 12, 2025

@rjsparks yes, that will have unchanged output.

@ajeanmahoney
Copy link
Collaborator

RFC 9599 kicked off this issue. This proposed update causes the text output not to match the HTML and PDF outputs as well as the original workaround did: https://author-tools.ietf.org/diff?url_1=https://github.com/kesara/t4/raw/refs/heads/main/xml2rfc-1137/rfc9599.txt

There were several diffs listed above that show changes to the list structure that I think the authors would not want:

IIUC, lists with empty="true" don't have changes in the list structure. However, the RPC evaluates the use of empty="true", and we may want to apply a more appropriate tag (see https://www.rfc-editor.org/pubprocess/how-we-update/). Would need to keep empty="true" to avoid some of these unwanted list changes?

Copy link
Collaborator

@ajeanmahoney ajeanmahoney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comments on the PR:
#1137 (comment)

@cabo
Copy link
Contributor

cabo commented Jan 13, 2025

There were several diffs listed above that show changes to the list structure that I think the authors would not want:

I agree.
The output with these changes no longer reflects the list structure; a new item is created in the outer list just because one list item in there has an inner list.

@kesara
Copy link
Member Author

kesara commented Jan 14, 2025

RFC 9599 kicked off this issue. This proposed update causes the text output not to match the HTML and PDF outputs as well as the original workaround did: https://author-tools.ietf.org/diff?url_1=https://github.com/kesara/t4/raw/refs/heads/main/xml2rfc-1137/rfc9599.txt

There were several diffs listed above that show changes to the list structure that I think the authors would not want:

* https://author-tools.ietf.org/diff?url_1=https://www.ietf.org/archive/id/draft-haynes-nfsv4-flexfiles-v2-00.txt&url_2=https://github.com/kesara/t4/raw/main/xml2rfc-1137-ids/draft-haynes-nfsv4-flexfiles-v2-00.txt

* https://author-tools.ietf.org/diff?url_1=https://www.ietf.org/archive/id/draft-ietf-dnssd-srp-25.txt&url_2=https://github.com/kesara/t4/raw/main/xml2rfc-1137-ids/draft-ietf-dnssd-srp-25.txt

* https://author-tools.ietf.org/diff?url_1=https://www.ietf.org/archive/id/draft-ietf-rift-rift-24.txt&url_2=https://github.com/kesara/t4/raw/main/xml2rfc-1137-ids/draft-ietf-rift-rift-24.txt

* https://author-tools.ietf.org/diff?url_1=https://www.ietf.org/archive/id/draft-ietf-snac-simple-06.txt&url_2=https://github.com/kesara/t4/raw/main/xml2rfc-1137-ids/draft-ietf-snac-simple-06.txt

* https://author-tools.ietf.org/diff?url_1=https://www.ietf.org/archive/id/draft-schanzen-r5n-06.txt&url_2=https://github.com/kesara/t4/raw/main/xml2rfc-1137-ids/draft-schanzen-r5n-06.txt

IIUC, lists with empty="true" don't have changes in the list structure. However, the RPC evaluates the use of empty="true", and we may want to apply a more appropriate tag (see https://www.rfc-editor.org/pubprocess/how-we-update/). Would need to keep empty="true" to avoid some of these unwanted list changes?

@ajeanmahoney emptry="true" will only work for ul and what it does is not setting the li bullets empty. I don't see how that can avoid any changes from this PR because it's set a parent level.

@kesara
Copy link
Member Author

kesara commented Jan 14, 2025

@rjsparks, @ajeanmahoney, @cabo, I think we have 3 options here:

  1. Keep the empty line. (Current PR does this)
  2. Don't keep an empty line but add the missing list style (bullet/number/char).
  3. Don't do anything and leave the bug as it-is.

BTW xml2rfc will probably have to add an empty line with the list style, when li contains an artwork, figure, sourcecode etc. as the first child to avoid confusion.

@cabo
Copy link
Contributor

cabo commented Jan 14, 2025 via email

@rjsparks
Copy link
Member

rjsparks commented Jan 14, 2025

@cabo: from the beginning of the PR:
Fixes #1135
Fixes #1138

Note in the original report that the txt output is currently fundamentally different from the html/pdf output. If there's something to address here, it's that.

The problem in the document mentioned in #1135 was addressed the same way I suggested the flexfilesv2 doc could be repaired above, and maybe the outcome of this whole thread is "If it hurts, don't do it" with advice to put the lead in a <t> (or an empty <t> if a nested list is really a new item in the outer list. But that would leave the current issue where the txt output has a significant variance from the html unaddressed.

@alicerusso
Copy link
Collaborator

@rjsparks wrote:

maybe the outcome of this whole thread is "If it hurts, don't do it" with advice to put the lead in a (or an empty if a nested list is really a new item in the outer list.

Agree.

But that would leave the current issue where the txt output has a significant variance from the html unaddressed.

I wonder about giving a warning so that the user is aware that the input yields this variance in the output files.

@cabo
Copy link
Contributor

cabo commented Jan 15, 2025

The examples in the referenced issues all show entirely reasonable text structures that for some reason aren't shown correctly in the plaintext form.
Nothing hurts except for the bug.
Please fix the bug.

@cabo
Copy link
Contributor

cabo commented Jan 15, 2025

This is the changes for latest draft from https://github.com/loghyr/flexfilesv2:

Well, that example is bad because the text structure doesn't reflect the text content.
Indeed, the author here should remove the superfluous <li.
The renderer cannot get this right, GIGO.

@ajeanmahoney
Copy link
Collaborator

@rjsparks wrote:

maybe the outcome of this whole thread is "If it hurts, don't do it" with advice to put the lead in a (or an empty if a nested list is really a new item in the outer list.

@alicerusso Agree.

Also agree. I would have suggested a lead-in sentence to the authors as a workaround, but the workaround suggested by @kesara in #1135 also works. This is an obscure corner case for nested lists that isn't covered by the Chicago Manual of Style, which says to use your word processor's outline list functionality to format lists correctly. FWIW I tried to get MS Word and Google Docs to recreate the following text from RFC 9599 and couldn't:

   1.  approximate preservation of the presence (and therefore timing)
       of congestion marks on the L2 frames used to construct an IP
       packet;

   2.  a.  at high frequency of congestion marking, approximate
           preservation of the proportion of congestion marks arriving
           and departing;

       b.  at low frequency of congestion marking, approximate
           preservation of the timing of congestion marks arriving and
           departing.

I'm good with the workaround here.

@cabo
Copy link
Contributor

cabo commented Jan 24, 2025 via email

@ajeanmahoney
Copy link
Collaborator

@cabo Authors can create such a list with the formatting described here:
#1135 (comment)

I've called it a workaround, but it's not really. Using <t/> within <li/> isn't a hack.

@cabo
Copy link
Contributor

cabo commented Jan 24, 2025

Using <t/> within <li/> isn't a hack.

Well, allowing both mixed content (span-level) and t (block-level) sure is, avoiding the h word, interesting.
Giving them different semantics makes my mind boggle.

One problem here is that in markdown there is no difference, which is pretty much the reason why kramdown-rfc is still generating v2 for these because the v2v3 converter seems to do something that generally works.
(But not always.)

One other problem is that the content models of <t and <li are different, which creates serious headaches.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rpat Issues needing attention from RFC Production Advisory Team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing style in <ul> missing number in text output of <ol>
5 participants