Introduce an f-specification argument in ltcmd #1525

josephwright · 2024-11-01T12:56:11Z

See latex3/latex3#591.

Internal housekeeping

Status of pull request

Feedback wanted
Under development
Ready to merge

Checklist of required changes before merge will be approved

Test file(s) added
Version and date string updated in changed source files
Relevant \changes entries in source included
Relevant changes.txt updated
Rollback provided (if necessary)?
ltnewsX.tex (and/or latexchanges.tex) updated

josephwright · 2024-11-01T12:59:06Z

Currently this is a draft: there are some clear open questions to answer before it is ready to merge

Do we want to cover chars 0-32 as filecontents does?
Do we want a more complex approach to a preceding optional argument. For example, fancyvrb sets the catcode of ^^M prior to the \@ifnextchar to ensure it is tokenizes in a recognisable way
Do we want to auto-apply ! to any preceding optional arg
Is the letter f the best choice? I can see an argument for c ('collected'), x (charcode of b + v), w (v-squaredv), ...

base/doc/usrguide.tex

Skillmon · 2024-11-02T12:40:11Z

I'd maybe drop the warning on +f and make f just behave correctly. There is no good reason not to, imho, it's also not +b for multiline bodies of environments.

Skillmon · 2024-11-02T13:11:16Z

Is it possible in the architecture of ltcmd to change the behaviour of an argument grabber based on the next argument type? If so the optional argument should be searched with the full category code regime of f in place, and if it's found the regime should be reverted, the optional argument collected, the verbatim catcode regime reactivated and the body collected. And an optional argument should have otherwise the same observable rules as it would have with a b body (so should be found even with a new line between \begin{foo} and the optional argument).

Cf. the handling of the optional argument in enverb (which as the only strange point also finds the opening bracket of the optional argument after multiple blank lines -- maybe I should revise this :)).

Skillmon · 2024-11-02T13:15:40Z

As for the argument name, what about an upper case V?

josephwright · 2024-11-02T17:04:13Z

I'd maybe drop the warning on +f and make f just behave correctly. There is no good reason not to, imho, it's also not +b for multiline bodies of environments.

Thinking here is that reading the arg. spec. should be clear: collecting an environment will allow newlines, so should be +f not just f. That's why I've currently made it a warning - at a technical level, it's not an issue.

josephwright · 2024-11-02T17:08:13Z

As for the argument name, what about an upper case V?

I thought about that - but the pattern we have at the moment is all uppercase specs relate to the matching lowercase in that they take an extra {<default>} argument. That's not the situation here, which is why at the moment I went with something else.

josephwright · 2024-11-02T17:16:31Z

Is it possible in the architecture of ltcmd to change the behaviour of an argument grabber based on the next argument type? If so the optional argument should be searched with the full category code regime of f in place, and if it's found the regime should be reverted, the optional argument collected, the verbatim catcode regime reactivated and the body collected. And an optional argument should have otherwise the same observable rules as it would have with a b body (so should be found even with a new line between \begin{foo} and the optional argument).

Whilst we don't currently do this, it would be possible to do a 'pre-scan' to establish if special settings are needed. The issue I think is that unlike say fancyvrb or enverb, we don't know the nature of the optional argument(s) - they don't have to be delimited by [..]. So for example if someone has +d^$ +f, we need to be able to cope with that in a (somewhat) predictable way. We also have the case of multiple optional arguments, so it's all a bit more 'interesting' - in the end this could all be done, it's a question of what we want to provide and how well that fits more widely into the LaTeX landscape.

FrankMittelbach · 2024-11-02T17:53:46Z

As for the argument name, what about an upper case V?

I thought about that - but the pattern we have at the moment is all uppercase specs relate to the matching lowercase in that they take an extra {<default>} argument. That's not the situation here, which is why at the moment I went with something else.

that pattern is strong and easy to understand so I don't think we should break it, thus that would rule V out in my opinion

FrankMittelbach · 2024-11-02T18:10:53Z

Maybe a dumb Saturday evening idea, but what about using a special sort of processor? Right now >{...}is a processor that is applied after grabbing, but we could have <{...} for before, or less general c{named-catcode-regiment}. The you would have c{verbatim}b+ for proces the body with "verbatim" catcodes. And fancyvrb could define its own named set. The whole thing would then work also with other arg specifiers. Handwaving ...

FrankMittelbach · 2024-11-02T18:20:14Z

Whilst we don't currently do this, it would be possible to do a 'pre-scan' to establish if special settings are needed. The issue I think is that unlike say fancyvrb or enverb, we don't know the nature of the optional argument(s) - they don't have to be delimited by [..]. So for example if someone has +d^$ +f, we need to be able to cope with that in a (somewhat) predictable way. We also have the case of multiple optional arguments, so it's all a bit more 'interesting' - in the end this could all be done, it's a question of what we want to provide and how well that fits more widely into the LaTeX landscape.

I would think this is bound to produce problems forever. Besides, what you do with respect to chars being %? Could be done, as you say but would be slow and probably riddled with errors in weird cases that we hadn't thought about.

josephwright · 2024-11-03T14:20:10Z

Maybe a dumb Saturday evening idea, but what about using a special sort of processor? Right now >{...}is a processor that is applied after grabbing, but we could have <{...} for before, or less general c{named-catcode-regiment}. The you would have c{verbatim}b+ for proces the body with "verbatim" catcodes. And fancyvrb could define its own named set. The whole thing would then work also with other arg specifiers. Handwaving ...

I see the idea, but we have a few things that mean I suspect it's not quite right

We need to fiddle with \endlinechar
We may want to 'set and reset' the catcodes (see other replies)
It leaves finding the end point a bit awkward
There's the entire question of nesting

However, it does suggest something we could do - see parallel reply.

josephwright · 2024-11-03T14:22:28Z

Whilst we don't currently do this, it would be possible to do a 'pre-scan' to establish if special settings are needed. The issue I think is that unlike say fancyvrb or enverb, we don't know the nature of the optional argument(s) - they don't have to be delimited by [..]. So for example if someone has +d^$ +f, we need to be able to cope with that in a (somewhat) predictable way. We also have the case of multiple optional arguments, so it's all a bit more 'interesting' - in the end this could all be done, it's a question of what we want to provide and how well that fits more widely into the LaTeX landscape.

I would think this is bound to produce problems forever. Besides, what you do with respect to chars being %? Could be done, as you say but would be slow and probably riddled with errors in weird cases that we hadn't thought about.

I think @Skillmon's approach could work, with a logic for seeking an optional arg:

Set all specials and ^^M verbatim
Reset the catcode of the peek token to normal
Peek ahead - loop to remove spaces/newlines if needed
Reset catcodes and grab the optional arg if found

This could be done by adding an additional specifier (lets say *) which means 'use a verbatim-safe approach to grabbing an optional arg'. That would allow us to avoid needed to unpick everything to decide if to use this slower approach.

Skillmon · 2024-11-03T19:01:05Z

Performance is not that bad, it's just \begingroup\expanded{\noexpand\dospecials\catcode\peekchar=\the\catcode\peekchar}\@ifnextchar\peekchar (sort of) and then, if the argument is found \endgroup and collect it, else grab the verbatim argument. And: To match current behaviour of ltcmd you should only strip the first newline (because two consecutives are normally a \par hence don't match the optional argument's start-token). And depending on the stripping of the first newline in the f-argument you need to conditionally add the ignored newline to the verbatim argument if no optional argument is found.

josephwright · 2024-11-03T21:56:17Z

Performance is not that bad, it's just \begingroup\expanded{\noexpand\dospecials\catcode\peekchar=\the\catcode\peekchar}\@ifnextchar\peekchar (sort of) and then, if the argument is found \endgroup and collect it, else grab the verbatim argument. And: To match current behaviour of ltcmd you should only strip the first newline (because two consecutives are normally a \par hence don't match the optional argument's start-token). And depending on the stripping of the first newline in the f-argument you need to conditionally add the ignored newline to the verbatim argument if no optional argument is found.

I don't think we can use a group, and we have to deal with looping for spaces, but yes, I guess you are broadly right about performance.

josephwright · 2024-11-04T08:48:21Z

And: To match current behaviour of ltcmd you should only strip the first newline (because two consecutives are normally a \par hence don't match the optional argument's start-token). And depending on the stripping of the first newline in the f-argument you need to conditionally add the ignored newline to the verbatim argument if no optional argument is found.

Good point about multiple ^^M. I suspect space skipping here is not desirable anyway. Perhaps then best to go with my * suggestion, which could also be applied to before v?

josephwright · 2024-11-04T08:49:14Z

I'd maybe drop the warning on +f and make f just behave correctly. There is no good reason not to, imho, it's also not +b for multiline bodies of environments.

Thinking here is that reading the arg. spec. should be clear: collecting an environment will allow newlines, so should be +f not just f. That's why I've currently made it a warning - at a technical level, it's not an issue.

One might well argue that both b and f should act the same way, + or not - an easy adjustment if that's the overall agreement. If we do that, I'd normalise to one agreed form.

Skillmon · 2024-11-04T18:27:59Z

I don't think we can use a group, and we have to deal with looping for spaces, but yes, I guess you are broadly right about performance.

Of course you can use a group! You're doing right now for the f-argument as well. Again see enverb (use of \enverb@body@setup and \enverb@search@oarg). And yes, you need to loop, obviously, but it's a tight loop that only needs to check three tokens (active space, ^^M, and the optional argument's begin-token, and as I said it needs to keep track of the tokens it ignored and whether it already gobbled up a ^^M and re-add those if it doesn't find the optional argument).

Good point about multiple ^^M. I suspect space skipping here is not desirable anyway. Perhaps then best to go with my * suggestion, which could also be applied to before v?

Yes, applying * to v would then also be desirable. Though ltcmd could check for that itself I guess it's much less complicated to let the user decide and not let ltcmd peek ahead for the structure of (possibly all) following arguments.

One might well argue that both b and f should act the same way, + or not - an easy adjustment if that's the overall agreement. If we do that, I'd normalise to one agreed form.

My argumentation is that +v means multiple lines are allowed, that meaning doesn't make sense for f, as restricting an environment to a single line sounds rather odd. And +b means that b might take a \par token. That's not applicable to f as ^^M is set to other. So the only valid meaning of f vs. +f would be f not allowing empty/blank lines in the body and +f doing so. Still odd for verbatim material, but at least understandable. The current behaviour is just odd (if it still is the current behaviour, haven't taken another look at the changes).

josephwright · 2024-11-06T10:24:31Z

Good point about multiple ^^M. I suspect space skipping here is not desirable anyway. Perhaps then best to go with my * suggestion, which could also be applied to before v?

Yes, applying * to v would then also be desirable. Though ltcmd could check for that itself I guess it's much less complicated to let the user decide and not let ltcmd peek ahead for the structure of (possibly all) following arguments.

I think I'll implement the suggestion, we can look at it then decide if we need automation - that would mainly sit in a different part of the code so would still require the underlying * code.

base/doc/usrguide.tex

muzimuzhi

Some typos. Only read the non-code part.

base/doc/ltnews41.tex

base/ltcmd.dtx

josephwright · 2024-12-12T09:11:08Z

I've made two adjustments here:

f-type are always 'long'
Use @Skillmon's suggestion to grab optional arguments 'verb safe'

The second idea is partly-done. At present, you have to turn this on using a * specifier. If the mechanism looks OK, I can automate this - but probably only for an immediately-preceeding D-type argument. I'll do that, and document, once we have agreement this looks OK. See the test file for demos.

josephwright · 2024-12-12T09:55:55Z

We could make life a little easier e.g. by saying that optional args before f-type have to use 'other' chars, or using a charcode peek test (more risky). But on balance, I'd rather go with an automated version of what I already have -

josephwright · 2025-01-22T10:49:59Z

Other than the basic question 'is this the right letter', I think we are good-to-go here now.

FrankMittelbach

only a very shallow review. But I'm still not happy about using "f" as I wrote. Otherwise only minor comments

base/doc/usrguide.tex

base/changes.txt

base/ltcmd.dtx

FrankMittelbach · 2025-01-22T16:23:02Z

base/ltcmd.dtx

+% \begin{macro}{\@@_grab_f_obey_spaces:w}
+% \begin{macro}{\@@_grab_f_start:n}
+% \begin{macro}{\@@_grab_f_first:w}
+% \begin{macro}{\@@_grab_f_loop:w}
+% \begin{macro}{\@@_grab_f_auxi:w}
+% \begin{macro}{\@@_grab_f_auxii:w}
+% \begin{macro}{\@@_grab_f_auxiii:N}
+% \begin{macro}{\@@_grab_f_auxiv:}
+% \begin{macro}{\@@_grab_f_auxv:}
+% \begin{macro}{\@@_grab_f_auxvi:N}
+% \begin{macro}{\@@_grab_f_auxvii:}
+% \begin{macro}{\@@_grab_f_auxviii:}
+% \begin{macro}{\@@_grab_f_end:w}
+% \begin{macro}{\@@_grab_f_end:n}
+% \begin{macro}{\@@_grab_f_end_auxi:w}
+% \begin{macro}{\@@_grab_f_end_auxii:w}
+% \begin{macro}{\@@_grab_f_end_auxiii:w}


might actually be helpful in the documented version of the envs are put closer to the actual definitions (and small groups perhaps in some cases but not like this big lumb

I guess I favour the approach of 'keep all the aux functions together'. We can of course change - suggestions on a split?

I don't mind much either way, but such long lists might actually create issues at page breaks and, you may end up with the margin note on one page but the actual text way later, but up to you.

base/ltcmd.dtx

Skillmon · 2025-01-22T20:58:36Z

base/doc/ltnews41.tex

+
+In this release, a new specifier~\texttt{f} is introduced, which collects the
+body of an environment in a verbatim-like way. Like the existing
+\texttt{v}~specification, each separate line is marked by the special


Technically this is only the case for +v, without the + line endings throw an error.

Skillmon · 2025-01-22T21:24:05Z

base/doc/usrguide.tex

+\texttt{v} specification, newlines are stored as \cs{obeyedline}. In a similar
+fashion to the \texttt{b}~specification, by default \emph{newlines} are trimmed
+at both ends of the body. Putting the prefix |!| before \texttt{f} suppresses
+space-trimming.


Maybe

Suggested change

space-trimming.

this trimming.

It's no space-trimming in the literal sense.

Skillmon · 2025-01-23T20:18:35Z

I'll continue collecting typos here (because here I conveniently see what was added in this PR).

Skillmon

Only some more minor typos.

Skillmon · 2025-01-23T20:19:43Z

base/doc/usrguide.tex

+\texttt{D}~specification argument immediately before an
+\texttt{c}~specification. This means that when the optional argument is absent,
+the first character of the next line will be read properly. Issues may arise if
+\emph{multiple} optional arguments are used before an \texttt{c}~specification,


Suggested change

\emph{multiple} optional arguments are used before an \texttt{c}~specification,

\emph{multiple} optional arguments are used before a \texttt{c}~specification,

Skillmon · 2025-01-23T20:20:28Z

base/doc/usrguide.tex

+and are therefore strongly discouraged.
+
+For technical reasons, we recommend that spaces are \emph{not} ignored when
+searching for an optional argument before an \texttt{c} specification: this can


Suggested change

searching for an optional argument before an \texttt{c} specification: this can

searching for an optional argument before a \texttt{c} specification: this can

Skillmon · 2025-01-23T20:25:48Z

base/ltcmd.dtx

+%   The total number of arguments found during normalization: this is needed
+%   where special action is needed for the penultimate argument.


Not really bad, but "needed ... needed", maybe "necessary/required ... needed" instead.

josephwright marked this pull request as draft November 1, 2024 12:56

Skillmon reviewed Nov 2, 2024

View reviewed changes

base/doc/usrguide.tex Outdated Show resolved Hide resolved

josephwright force-pushed the verb-env branch from 88e37f9 to d55f3ba Compare November 2, 2024 16:42

josephwright force-pushed the verb-env branch from d55f3ba to 4193145 Compare December 11, 2024 12:54

lvjr reviewed Dec 11, 2024

View reviewed changes

base/doc/usrguide.tex Outdated Show resolved Hide resolved

josephwright force-pushed the verb-env branch 2 times, most recently from 79ff942 to 4600c98 Compare December 11, 2024 16:32

muzimuzhi reviewed Dec 11, 2024

View reviewed changes

base/doc/ltnews41.tex Outdated Show resolved Hide resolved

base/ltcmd.dtx Outdated Show resolved Hide resolved

base/ltcmd.dtx Show resolved Hide resolved

josephwright force-pushed the verb-env branch from 4600c98 to c152931 Compare December 12, 2024 09:09

josephwright requested a review from Skillmon December 12, 2024 10:06

josephwright force-pushed the verb-env branch from ca0d87c to 537a86f Compare January 22, 2025 10:40

josephwright requested review from Skillmon and FrankMittelbach January 22, 2025 10:49

josephwright requested review from car222222, Skillmon and muzimuzhi and removed request for muzimuzhi and Skillmon January 22, 2025 10:50

FrankMittelbach reviewed Jan 22, 2025

View reviewed changes

lvjr reviewed Jan 23, 2025

View reviewed changes

base/ltcmd.dtx Outdated Show resolved Hide resolved

lvjr reviewed Jan 23, 2025

View reviewed changes

base/ltcmd.dtx Outdated Show resolved Hide resolved

josephwright force-pushed the verb-env branch from 31009a3 to 84c6669 Compare January 23, 2025 12:28

josephwright added 10 commits January 23, 2025 16:18

Auto-generate D-type grabbers

169b0fc

Introduce an f-specification argument in ltcmd

c1dfe0f

Allow text before but not after \end{foo} for f-type env

6dc65e2

Allow for spaces in "\end{foo}"

801c106

Warn for text on the first line for f-type env

a5ee670

Automate verb-protection for e.g. "o f" env. spec.

0234a0a

Re-phrase a paragraph

89485c7

Correct usrguide text

9220df6

Switch from "f" to "c" for verb-env-collection

549b748

Split macro markup

8e1bb89

josephwright force-pushed the verb-env branch from 3b6bc66 to 8e1bb89 Compare January 23, 2025 16:19

josephwright merged commit 9e6f3d0 into develop Jan 23, 2025
2 checks passed

josephwright deleted the verb-env branch January 23, 2025 16:19

josephwright mentioned this pull request Jan 23, 2025

[xparse] Provide verbatim environment latex3/latex3#591

Closed

Skillmon reviewed Jan 23, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce an f-specification argument in ltcmd #1525

Introduce an f-specification argument in ltcmd #1525

josephwright commented Nov 1, 2024

josephwright commented Nov 1, 2024

Skillmon commented Nov 2, 2024

Skillmon commented Nov 2, 2024

Skillmon commented Nov 2, 2024

josephwright commented Nov 2, 2024

josephwright commented Nov 2, 2024

josephwright commented Nov 2, 2024

FrankMittelbach commented Nov 2, 2024

FrankMittelbach commented Nov 2, 2024

FrankMittelbach commented Nov 2, 2024

josephwright commented Nov 3, 2024

josephwright commented Nov 3, 2024

Skillmon commented Nov 3, 2024

josephwright commented Nov 3, 2024

josephwright commented Nov 4, 2024

josephwright commented Nov 4, 2024

Skillmon commented Nov 4, 2024

josephwright commented Nov 6, 2024

muzimuzhi left a comment

josephwright commented Dec 12, 2024

josephwright commented Dec 12, 2024

josephwright commented Jan 22, 2025

FrankMittelbach left a comment

FrankMittelbach Jan 22, 2025

josephwright Jan 23, 2025

FrankMittelbach Jan 23, 2025 •

edited

Loading

Skillmon Jan 22, 2025

Skillmon Jan 22, 2025

Skillmon commented Jan 23, 2025

Skillmon left a comment

Skillmon Jan 23, 2025

Skillmon Jan 23, 2025

Skillmon Jan 23, 2025

	\emph{multiple} optional arguments are used before an \texttt{c}~specification,
	\emph{multiple} optional arguments are used before a \texttt{c}~specification,

	searching for an optional argument before an \texttt{c} specification: this can
	searching for an optional argument before a \texttt{c} specification: this can

		% The total number of arguments found during normalization: this is needed
		% where special action is needed for the penultimate argument.

Introduce an f-specification argument in ltcmd #1525

Introduce an f-specification argument in ltcmd #1525

Conversation

josephwright commented Nov 1, 2024

Internal housekeeping

Status of pull request

Checklist of required changes before merge will be approved

josephwright commented Nov 1, 2024

Skillmon commented Nov 2, 2024

Skillmon commented Nov 2, 2024

Skillmon commented Nov 2, 2024

josephwright commented Nov 2, 2024

josephwright commented Nov 2, 2024

josephwright commented Nov 2, 2024

FrankMittelbach commented Nov 2, 2024

FrankMittelbach commented Nov 2, 2024

FrankMittelbach commented Nov 2, 2024

josephwright commented Nov 3, 2024

josephwright commented Nov 3, 2024

Skillmon commented Nov 3, 2024

josephwright commented Nov 3, 2024

josephwright commented Nov 4, 2024

josephwright commented Nov 4, 2024

Skillmon commented Nov 4, 2024

josephwright commented Nov 6, 2024

muzimuzhi left a comment

Choose a reason for hiding this comment

josephwright commented Dec 12, 2024

josephwright commented Dec 12, 2024

josephwright commented Jan 22, 2025

FrankMittelbach left a comment

Choose a reason for hiding this comment

FrankMittelbach Jan 22, 2025

Choose a reason for hiding this comment

josephwright Jan 23, 2025

Choose a reason for hiding this comment

FrankMittelbach Jan 23, 2025 • edited Loading

Choose a reason for hiding this comment

Skillmon Jan 22, 2025

Choose a reason for hiding this comment

Skillmon Jan 22, 2025

Choose a reason for hiding this comment

Skillmon commented Jan 23, 2025

Skillmon left a comment

Choose a reason for hiding this comment

Skillmon Jan 23, 2025

Choose a reason for hiding this comment

Skillmon Jan 23, 2025

Choose a reason for hiding this comment

Skillmon Jan 23, 2025

Choose a reason for hiding this comment

FrankMittelbach Jan 23, 2025 •

edited

Loading