Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce an f-specification argument in ltcmd #1525

Merged
merged 10 commits into from
Jan 23, 2025
Merged

Introduce an f-specification argument in ltcmd #1525

merged 10 commits into from
Jan 23, 2025

Conversation

josephwright
Copy link
Member

See latex3/latex3#591.

Internal housekeeping

Status of pull request

  • Feedback wanted
  • Under development
  • Ready to merge

Checklist of required changes before merge will be approved

  • Test file(s) added
  • Version and date string updated in changed source files
  • Relevant \changes entries in source included
  • Relevant changes.txt updated
  • Rollback provided (if necessary)?
  • ltnewsX.tex (and/or latexchanges.tex) updated

@josephwright josephwright marked this pull request as draft November 1, 2024 12:56
@josephwright
Copy link
Member Author

Currently this is a draft: there are some clear open questions to answer before it is ready to merge

  • Do we want to cover chars 0-32 as filecontents does?
  • Do we want a more complex approach to a preceding optional argument. For example, fancyvrb sets the catcode of ^^M prior to the \@ifnextchar to ensure it is tokenizes in a recognisable way
  • Do we want to auto-apply ! to any preceding optional arg
  • Is the letter f the best choice? I can see an argument for c ('collected'), x (charcode of b + v), w (v-squaredv), ...

base/doc/usrguide.tex Outdated Show resolved Hide resolved
@Skillmon
Copy link
Contributor

Skillmon commented Nov 2, 2024

I'd maybe drop the warning on +f and make f just behave correctly. There is no good reason not to, imho, it's also not +b for multiline bodies of environments.

@Skillmon
Copy link
Contributor

Skillmon commented Nov 2, 2024

Is it possible in the architecture of ltcmd to change the behaviour of an argument grabber based on the next argument type? If so the optional argument should be searched with the full category code regime of f in place, and if it's found the regime should be reverted, the optional argument collected, the verbatim catcode regime reactivated and the body collected. And an optional argument should have otherwise the same observable rules as it would have with a b body (so should be found even with a new line between \begin{foo} and the optional argument).

Cf. the handling of the optional argument in enverb (which as the only strange point also finds the opening bracket of the optional argument after multiple blank lines -- maybe I should revise this :)).

@Skillmon
Copy link
Contributor

Skillmon commented Nov 2, 2024

As for the argument name, what about an upper case V?

@josephwright
Copy link
Member Author

I'd maybe drop the warning on +f and make f just behave correctly. There is no good reason not to, imho, it's also not +b for multiline bodies of environments.

Thinking here is that reading the arg. spec. should be clear: collecting an environment will allow newlines, so should be +f not just f. That's why I've currently made it a warning - at a technical level, it's not an issue.

@josephwright
Copy link
Member Author

As for the argument name, what about an upper case V?

I thought about that - but the pattern we have at the moment is all uppercase specs relate to the matching lowercase in that they take an extra {<default>} argument. That's not the situation here, which is why at the moment I went with something else.

@josephwright
Copy link
Member Author

Is it possible in the architecture of ltcmd to change the behaviour of an argument grabber based on the next argument type? If so the optional argument should be searched with the full category code regime of f in place, and if it's found the regime should be reverted, the optional argument collected, the verbatim catcode regime reactivated and the body collected. And an optional argument should have otherwise the same observable rules as it would have with a b body (so should be found even with a new line between \begin{foo} and the optional argument).

Whilst we don't currently do this, it would be possible to do a 'pre-scan' to establish if special settings are needed. The issue I think is that unlike say fancyvrb or enverb, we don't know the nature of the optional argument(s) - they don't have to be delimited by [..]. So for example if someone has +d^$ +f, we need to be able to cope with that in a (somewhat) predictable way. We also have the case of multiple optional arguments, so it's all a bit more 'interesting' - in the end this could all be done, it's a question of what we want to provide and how well that fits more widely into the LaTeX landscape.

@FrankMittelbach
Copy link
Member

As for the argument name, what about an upper case V?

I thought about that - but the pattern we have at the moment is all uppercase specs relate to the matching lowercase in that they take an extra {<default>} argument. That's not the situation here, which is why at the moment I went with something else.

that pattern is strong and easy to understand so I don't think we should break it, thus that would rule V out in my opinion

@FrankMittelbach
Copy link
Member

Maybe a dumb Saturday evening idea, but what about using a special sort of processor? Right now >{...}is a processor that is applied after grabbing, but we could have <{...} for before, or less general c{named-catcode-regiment}. The you would have c{verbatim}b+ for proces the body with "verbatim" catcodes. And fancyvrb could define its own named set. The whole thing would then work also with other arg specifiers. Handwaving ...

@FrankMittelbach
Copy link
Member

Whilst we don't currently do this, it would be possible to do a 'pre-scan' to establish if special settings are needed. The issue I think is that unlike say fancyvrb or enverb, we don't know the nature of the optional argument(s) - they don't have to be delimited by [..]. So for example if someone has +d^$ +f, we need to be able to cope with that in a (somewhat) predictable way. We also have the case of multiple optional arguments, so it's all a bit more 'interesting' - in the end this could all be done, it's a question of what we want to provide and how well that fits more widely into the LaTeX landscape.

I would think this is bound to produce problems forever. Besides, what you do with respect to chars being %? Could be done, as you say but would be slow and probably riddled with errors in weird cases that we hadn't thought about.

@josephwright
Copy link
Member Author

Maybe a dumb Saturday evening idea, but what about using a special sort of processor? Right now >{...}is a processor that is applied after grabbing, but we could have <{...} for before, or less general c{named-catcode-regiment}. The you would have c{verbatim}b+ for proces the body with "verbatim" catcodes. And fancyvrb could define its own named set. The whole thing would then work also with other arg specifiers. Handwaving ...

I see the idea, but we have a few things that mean I suspect it's not quite right

  • We need to fiddle with \endlinechar
  • We may want to 'set and reset' the catcodes (see other replies)
  • It leaves finding the end point a bit awkward
  • There's the entire question of nesting

However, it does suggest something we could do - see parallel reply.

@josephwright
Copy link
Member Author

Whilst we don't currently do this, it would be possible to do a 'pre-scan' to establish if special settings are needed. The issue I think is that unlike say fancyvrb or enverb, we don't know the nature of the optional argument(s) - they don't have to be delimited by [..]. So for example if someone has +d^$ +f, we need to be able to cope with that in a (somewhat) predictable way. We also have the case of multiple optional arguments, so it's all a bit more 'interesting' - in the end this could all be done, it's a question of what we want to provide and how well that fits more widely into the LaTeX landscape.

I would think this is bound to produce problems forever. Besides, what you do with respect to chars being %? Could be done, as you say but would be slow and probably riddled with errors in weird cases that we hadn't thought about.

I think @Skillmon's approach could work, with a logic for seeking an optional arg:

  1. Set all specials and ^^M verbatim
  2. Reset the catcode of the peek token to normal
  3. Peek ahead - loop to remove spaces/newlines if needed
  4. Reset catcodes and grab the optional arg if found

This could be done by adding an additional specifier (lets say *) which means 'use a verbatim-safe approach to grabbing an optional arg'. That would allow us to avoid needed to unpick everything to decide if to use this slower approach.

@Skillmon
Copy link
Contributor

Skillmon commented Nov 3, 2024

Performance is not that bad, it's just \begingroup\expanded{\noexpand\dospecials\catcode\peekchar=\the\catcode\peekchar}\@ifnextchar\peekchar (sort of) and then, if the argument is found \endgroup and collect it, else grab the verbatim argument. And: To match current behaviour of ltcmd you should only strip the first newline (because two consecutives are normally a \par hence don't match the optional argument's start-token). And depending on the stripping of the first newline in the f-argument you need to conditionally add the ignored newline to the verbatim argument if no optional argument is found.

@josephwright
Copy link
Member Author

Performance is not that bad, it's just \begingroup\expanded{\noexpand\dospecials\catcode\peekchar=\the\catcode\peekchar}\@ifnextchar\peekchar (sort of) and then, if the argument is found \endgroup and collect it, else grab the verbatim argument. And: To match current behaviour of ltcmd you should only strip the first newline (because two consecutives are normally a \par hence don't match the optional argument's start-token). And depending on the stripping of the first newline in the f-argument you need to conditionally add the ignored newline to the verbatim argument if no optional argument is found.

I don't think we can use a group, and we have to deal with looping for spaces, but yes, I guess you are broadly right about performance.

@josephwright
Copy link
Member Author

And: To match current behaviour of ltcmd you should only strip the first newline (because two consecutives are normally a \par hence don't match the optional argument's start-token). And depending on the stripping of the first newline in the f-argument you need to conditionally add the ignored newline to the verbatim argument if no optional argument is found.

Good point about multiple ^^M. I suspect space skipping here is not desirable anyway. Perhaps then best to go with my * suggestion, which could also be applied to before v?

@josephwright
Copy link
Member Author

I'd maybe drop the warning on +f and make f just behave correctly. There is no good reason not to, imho, it's also not +b for multiline bodies of environments.

Thinking here is that reading the arg. spec. should be clear: collecting an environment will allow newlines, so should be +f not just f. That's why I've currently made it a warning - at a technical level, it's not an issue.

One might well argue that both b and f should act the same way, + or not - an easy adjustment if that's the overall agreement. If we do that, I'd normalise to one agreed form.

@Skillmon
Copy link
Contributor

Skillmon commented Nov 4, 2024

I don't think we can use a group, and we have to deal with looping for spaces, but yes, I guess you are broadly right about performance.

Of course you can use a group! You're doing right now for the f-argument as well. Again see enverb (use of \enverb@body@setup and \enverb@search@oarg). And yes, you need to loop, obviously, but it's a tight loop that only needs to check three tokens (active space, ^^M, and the optional argument's begin-token, and as I said it needs to keep track of the tokens it ignored and whether it already gobbled up a ^^M and re-add those if it doesn't find the optional argument).

Good point about multiple ^^M. I suspect space skipping here is not desirable anyway. Perhaps then best to go with my * suggestion, which could also be applied to before v?

Yes, applying * to v would then also be desirable. Though ltcmd could check for that itself I guess it's much less complicated to let the user decide and not let ltcmd peek ahead for the structure of (possibly all) following arguments.

One might well argue that both b and f should act the same way, + or not - an easy adjustment if that's the overall agreement. If we do that, I'd normalise to one agreed form.

My argumentation is that +v means multiple lines are allowed, that meaning doesn't make sense for f, as restricting an environment to a single line sounds rather odd. And +b means that b might take a \par token. That's not applicable to f as ^^M is set to other. So the only valid meaning of f vs. +f would be f not allowing empty/blank lines in the body and +f doing so. Still odd for verbatim material, but at least understandable. The current behaviour is just odd (if it still is the current behaviour, haven't taken another look at the changes).

@josephwright
Copy link
Member Author

Good point about multiple ^^M. I suspect space skipping here is not desirable anyway. Perhaps then best to go with my * suggestion, which could also be applied to before v?

Yes, applying * to v would then also be desirable. Though ltcmd could check for that itself I guess it's much less complicated to let the user decide and not let ltcmd peek ahead for the structure of (possibly all) following arguments.

I think I'll implement the suggestion, we can look at it then decide if we need automation - that would mainly sit in a different part of the code so would still require the underlying * code.

base/doc/usrguide.tex Outdated Show resolved Hide resolved
@josephwright josephwright force-pushed the verb-env branch 2 times, most recently from 79ff942 to 4600c98 Compare December 11, 2024 16:32
Copy link
Contributor

@muzimuzhi muzimuzhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some typos. Only read the non-code part.

base/doc/ltnews41.tex Outdated Show resolved Hide resolved
base/ltcmd.dtx Outdated Show resolved Hide resolved
base/ltcmd.dtx Show resolved Hide resolved
@josephwright
Copy link
Member Author

I've made two adjustments here:

  • f-type are always 'long'
  • Use @Skillmon's suggestion to grab optional arguments 'verb safe'

The second idea is partly-done. At present, you have to turn this on using a * specifier. If the mechanism looks OK, I can automate this - but probably only for an immediately-preceeding D-type argument. I'll do that, and document, once we have agreement this looks OK. See the test file for demos.

@josephwright
Copy link
Member Author

We could make life a little easier e.g. by saying that optional args before f-type have to use 'other' chars, or using a charcode peek test (more risky). But on balance, I'd rather go with an automated version of what I already have -

@josephwright
Copy link
Member Author

Other than the basic question 'is this the right letter', I think we are good-to-go here now.

@josephwright josephwright requested review from car222222, Skillmon and muzimuzhi and removed request for muzimuzhi and Skillmon January 22, 2025 10:50
Copy link
Member

@FrankMittelbach FrankMittelbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only a very shallow review. But I'm still not happy about using "f" as I wrote. Otherwise only minor comments

base/doc/usrguide.tex Outdated Show resolved Hide resolved
base/doc/usrguide.tex Outdated Show resolved Hide resolved
base/changes.txt Outdated Show resolved Hide resolved
base/ltcmd.dtx Show resolved Hide resolved
base/ltcmd.dtx Outdated
Comment on lines 3410 to 3426
% \begin{macro}{\@@_grab_f_obey_spaces:w}
% \begin{macro}{\@@_grab_f_start:n}
% \begin{macro}{\@@_grab_f_first:w}
% \begin{macro}{\@@_grab_f_loop:w}
% \begin{macro}{\@@_grab_f_auxi:w}
% \begin{macro}{\@@_grab_f_auxii:w}
% \begin{macro}{\@@_grab_f_auxiii:N}
% \begin{macro}{\@@_grab_f_auxiv:}
% \begin{macro}{\@@_grab_f_auxv:}
% \begin{macro}{\@@_grab_f_auxvi:N}
% \begin{macro}{\@@_grab_f_auxvii:}
% \begin{macro}{\@@_grab_f_auxviii:}
% \begin{macro}{\@@_grab_f_end:w}
% \begin{macro}{\@@_grab_f_end:n}
% \begin{macro}{\@@_grab_f_end_auxi:w}
% \begin{macro}{\@@_grab_f_end_auxii:w}
% \begin{macro}{\@@_grab_f_end_auxiii:w}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might actually be helpful in the documented version of the envs are put closer to the actual definitions (and small groups perhaps in some cases but not like this big lumb

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I favour the approach of 'keep all the aux functions together'. We can of course change - suggestions on a split?

Copy link
Member

@FrankMittelbach FrankMittelbach Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind much either way, but such long lists might actually create issues at page breaks and, you may end up with the margin note on one page but the actual text way later, but up to you.

base/ltcmd.dtx Outdated Show resolved Hide resolved
base/ltcmd.dtx Outdated Show resolved Hide resolved
@josephwright josephwright merged commit 9e6f3d0 into develop Jan 23, 2025
2 checks passed
@josephwright josephwright deleted the verb-env branch January 23, 2025 16:19

In this release, a new specifier~\texttt{f} is introduced, which collects the
body of an environment in a verbatim-like way. Like the existing
\texttt{v}~specification, each separate line is marked by the special
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this is only the case for +v, without the + line endings throw an error.

\texttt{v} specification, newlines are stored as \cs{obeyedline}. In a similar
fashion to the \texttt{b}~specification, by default \emph{newlines} are trimmed
at both ends of the body. Putting the prefix |!| before \texttt{f} suppresses
space-trimming.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe

Suggested change
space-trimming.
this trimming.

It's no space-trimming in the literal sense.

@Skillmon
Copy link
Contributor

I'll continue collecting typos here (because here I conveniently see what was added in this PR).

Copy link
Contributor

@Skillmon Skillmon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only some more minor typos.

\texttt{D}~specification argument immediately before an
\texttt{c}~specification. This means that when the optional argument is absent,
the first character of the next line will be read properly. Issues may arise if
\emph{multiple} optional arguments are used before an \texttt{c}~specification,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
\emph{multiple} optional arguments are used before an \texttt{c}~specification,
\emph{multiple} optional arguments are used before a \texttt{c}~specification,

and are therefore strongly discouraged.

For technical reasons, we recommend that spaces are \emph{not} ignored when
searching for an optional argument before an \texttt{c} specification: this can
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
searching for an optional argument before an \texttt{c} specification: this can
searching for an optional argument before a \texttt{c} specification: this can

Comment on lines +141 to +142
% The total number of arguments found during normalization: this is needed
% where special action is needed for the penultimate argument.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really bad, but "needed ... needed", maybe "necessary/required ... needed" instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

7 participants