-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should line breaks be width 0 or 1? #60
Comments
The case for treating as width 1 is that, in a context where for whatever reason the text is forced to be displayed on a single line, most software will replace line breaks with spaces. |
For reference, I noticed this behaviour change when investigating test failures caused by the v0.1.12 → v0.1.13 update in the For example, this test is now failing: As far as I can tell, the test tries line-wrapping this string into 3 columns:
The expected result seems to be that there is no change when line-wrapping this to width 3 (since all lines already are width 3), but since newline characters now contribute an additional +1 to the length, the result is completely mangled. EDIT: Maybe this logic for line wrapping was fishy all along, but it worked with how unicode-width determined the displayed widths. When counting |
Coming here with the same issue, the update from 0.1.12 -> 0.1.13 broke code that relied on newline characters having width = 0. It should at least have been a minor version update, since the API contract was broken.
Then surely it's a different string with different width, no? |
That's exceedingly rare, though. Personally I think the case for the math getting easier especially near existing line break opportunities is compelling enough for it to be 0.
No, this is about rendering. When an emoji gets represented as a bunch of tofu that is not a "different string", that is just a rendering choice, and this crate cannot predict all rendering choices. It just makes a best effort.
Yes, I think this logic is not using this crate correctly and just happened to work. However I think that this kind of pattern should mostly work pleasantly, ideally. |
Not really, no? Most any sort of search box will behave like this, for example. That being said, if the breakage really is widespread, it may be worth reverting to width 0 for 0.1.x and only doing it for 0.2.x. |
My point is about code structure and not rendering, if you have Concretely, I have to change code from
to
As a result of the change. And this is in line wrapping logic, like you mention. |
Not a terminal, though? And that's an actual text stream transform that happens: it is turned into a space, which makes it the "different string" thing that @m-hilgendorf talks about. |
@m-hilgendorf I assumed @Jules-Bertholet was alluding to the cases where some software will render an unwrapped linebreak as a space or a control character. Rare but happens, that is about rendering. However he's actually talking about cases where the text stream is replaced, where I agree that that should be treated as a replacement. |
Is this library intended to be terminal-specific? If it is, there are many other things we should be doing differently. (And in any case, a TUI could perfectly well have such an interface)
That's one possible implementation, but not the only one. Merely changing how the line-breaking character is rendered would also be perfectly reasonable. (Rendering it as zero-width would not be reasonable, however.) As for line-wrapping, AIUI the extra logic needed to handle newlines is just an extension of the work needed to handle end-of-line spaces. For example, wrapping |
Not exactly, but it's more used that way.
Yes, but this behavior is most commonly seen in the wild as a text stream replacement. The choice of implementation has other implications, it's not purely internal. I have seen the "treat linebreak as a 1-width character" behavior before without it being a text stream replacement but it's incredibly rare IME. |
In my testing, GTK and QT both seem to do it this way, so not that rare I would think? |
Are they actually doing that or performing a text stream replacement on pasting? |
The text stream is not replaced, no—copying back out of the field preserves the newline. (GTK shows a visible arrow glyph, QT a blank space) |
Hmm, interesting. That does present a legitimate dilemma here. |
This replacement only happens in single-line input fields though, right? So the line break needs to be replaced with something, or things will look broken. That's not really a fair comparison to the line-wrapping use case IMO. |
One more point in favor of line breaks being width 1, is that nearly all applications will display them as having advance width when they are selected. It's true that, for the purpose of determining line widths for line wrapping, line break characters should be treated as having width 0, not 1, and therefore line-wrapping logic needs to special-case them if we keep the present assignments. But, as I mention above, they are not the only kind of character that line-wrapping logic might need to special case. I mention word-separating whitespace above; the soft hyphen is another example. Programs performing line-wrapping will differ in how exactly they handle these characters, just as they will differ in which line-breaks they recognize (out of |
That's relatively convincing. Perhaps we should document this explicitly; that applications shouldn't feed this crate strings with newlines unless they are ok with them being treated as being on the same line with width 1. |
Ok, so what should crates that do line-wrapping do instead? Replace newlines with spaces before wrapping at a specified width, and treat input string as a single-line string? What about line-wrapping methods that preserve existing line breaks? Both of these will likely require more complicated logic and / or data copying than before. |
@decathorpe process text one line at a time, splitting into lines first. I wouldn't actually replace anything, the idea would be to never ask this crate for computing the width of anything with a newline. The statefullness of this crate does not extend past newlines so if you still need to add them up afterwards you can. I don't think this would be an extra allocation/perf cost/copying, it will complicate the logic a little bit. |
But as far as I can tell for line wrapping you need to pay attention to the newlines anyway |
See #55 (comment)
From Unicode:
Newline control characters (and other break characters) are typically "displayable by normal rendering", however they do not have a rendering that has anything that could be termed as a "width", so there's no single answer as to what this crate should provide here according to the spec.
Applications using
unicode-width
ought to be using a higher level protocol to split text on newlines (and other similar things) before feeding it to this crate (and perhaps we should document that). It isn't really possible for this crate to provide such a protocol itself since there are nuances to how different contexts handle newlines (Windows vs Unix being the most obvious one, but wrapping behavior also participates).However, we should pick some consistent, sensible default for what newlines are treated as. My guess is that we should treat them as width 0 (for both CR and LF, and any other line breaks that exist).
h/t/ @decathorpe for noticing
cc @Jules-Bertholet
The text was updated successfully, but these errors were encountered: