Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse UTF-16 surrogates pairs for calculating pattern's position #11915

Merged

Conversation

comzyh
Copy link
Contributor

@comzyh comzyh commented Dec 9, 2021

Summary of the Pull Request

Properly handle UTF-16 surrogates when calculating the position of matched pattern.

Fix #8709

References

const auto charData = Utf16Parser::Parse(wstr);
std::vector<std::vector<wchar_t>> cells;
for (const auto chars : charData)
{
if (IsGlyphFullWidth(std::wstring_view{ chars.data(), chars.size() }))

PR Checklist

  • Closes Links/URLs get offset when certain unicode characters are printed #8709
  • CLA signed. If not, go over here and sign the CLA
  • Tests added/passed
  • Documentation updated. If checked, please file a pull request on our docs repo and link it here: #xxx
  • Schema updated.
  • I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

Detailed Description of the Pull Request / Additional comments

use Utf16Parser::Parse to handle code points from U+010000 to U+10FFFF in UTF-16.

Validation Steps Performed

image

also the case by @mas90 #8709 (comment):

image

@ghost ghost added Area-Rendering Text rendering, emoji, complex glyph & font-fallback issues Area-TerminalControl Issues pertaining to the terminal control (input, selection, keybindings, mouse interaction, etc.) Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-2 A description (P2) Product-Terminal The new Windows Terminal. labels Dec 9, 2021
@comzyh comzyh changed the title parse UTF-16 surrogates pair for calculating pattern position Parse UTF-16 surrogates pairs for calculating pattern's position Dec 9, 2021
Copy link
Member

@zadjii-msft zadjii-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks!

@zadjii-msft
Copy link
Member

@msftbot make sure @PankajBhojwani signs off on this one

@ghost ghost added the AutoMerge Marked for automatic merge by the bot when requirements are met label Dec 9, 2021
@ghost
Copy link

ghost commented Dec 9, 2021

Hello @zadjii-msft!

Because you've given me some instructions on how to help merge this pull request, I'll be modifying my merge approach. Here's how I understand your requirements for merging this pull request:

If this doesn't seem right to you, you can tell me to cancel these instructions and use the auto-merge policy that has been configured for this repository. Try telling me "forget everything I just told you".

@@ -2585,16 +2586,17 @@ PointTree TextBuffer::GetPatterns(const size_t firstRow, const size_t lastRow) c
// match and the previous match, so we use the size of the prefix
// along with the size of the match to determine the locations
size_t prefixSize = 0;

for (const auto ch : i->prefix().str())
for (const std::vector<wchar_t> parsedGlyph : Utf16Parser::Parse(i->prefix().str()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have left auto for the left side here and below... but I'm not going to block over it.

@zadjii-msft
Copy link
Member

@msftbot merge this in 1 minute

@ghost
Copy link

ghost commented Dec 9, 2021

Hello @zadjii-msft!

Because you've given me some instructions on how to help merge this pull request, I'll be modifying my merge approach. Here's how I understand your requirements for merging this pull request:

  • I'll only merge this pull request if it's approved by @PankajBhojwani
  • I won't merge this pull request until after the UTC date Thu, 09 Dec 2021 18:41:11 GMT, which is in 1 minute

If this doesn't seem right to you, you can tell me to cancel these instructions and use the auto-merge policy that has been configured for this repository. Try telling me "forget everything I just told you".

@ghost ghost merged commit a2d96d6 into microsoft:main Dec 9, 2021
DHowett pushed a commit that referenced this pull request Dec 13, 2021
)

<!-- Enter a brief description/summary of your PR here. What does it fix/what does it change/how was it tested (even manually, if necessary)? -->
## Summary of the Pull Request

Properly handle UTF-16 surrogates when calculating the position of matched pattern.

Fix #8709

<!-- Other than the issue solved, is this relevant to any other issues/existing PRs? -->
## References
https://github.com/microsoft/terminal/blob/b88ffb21b0725331877ba76bac5a79a4c21eaa03/src/buffer/out/search.cpp#L335-L339

<!-- Please review the items on the PR checklist before submitting-->
## PR Checklist
* [ ] Closes #8709
* [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA
* [ ] Tests added/passed
* [ ] Documentation updated. If checked, please file a pull request on [our docs repo](https://github.com/MicrosoftDocs/terminal) and link it here: #xxx
* [ ] Schema updated.
* [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

<!-- Provide a more detailed description of the PR, other things fixed or any additional comments/features here -->
## Detailed Description of the Pull Request / Additional comments
use `Utf16Parser::Parse` to handle code points from U+010000 to U+10FFFF in UTF-16.

<!-- Describe how you validated the behavior. Add automated tests wherever possible, but list manual validation steps taken as well -->
## Validation Steps Performed

![image](https://user-images.githubusercontent.com/1068203/145421736-c842c7d4-0136-42d0-ad72-f004f58d9e3b.png)

also the case by @mas90  #8709 (comment):

![image](https://user-images.githubusercontent.com/1068203/145420264-3fe220b4-42c5-44ac-aa94-4e604b164ed3.png)

(cherry picked from commit a2d96d6)
DHowett pushed a commit that referenced this pull request Dec 13, 2021
)

<!-- Enter a brief description/summary of your PR here. What does it fix/what does it change/how was it tested (even manually, if necessary)? -->
## Summary of the Pull Request

Properly handle UTF-16 surrogates when calculating the position of matched pattern.

Fix #8709

<!-- Other than the issue solved, is this relevant to any other issues/existing PRs? -->
## References
https://github.com/microsoft/terminal/blob/b88ffb21b0725331877ba76bac5a79a4c21eaa03/src/buffer/out/search.cpp#L335-L339

<!-- Please review the items on the PR checklist before submitting-->
## PR Checklist
* [ ] Closes #8709
* [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA
* [ ] Tests added/passed
* [ ] Documentation updated. If checked, please file a pull request on [our docs repo](https://github.com/MicrosoftDocs/terminal) and link it here: #xxx
* [ ] Schema updated.
* [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

<!-- Provide a more detailed description of the PR, other things fixed or any additional comments/features here -->
## Detailed Description of the Pull Request / Additional comments
use `Utf16Parser::Parse` to handle code points from U+010000 to U+10FFFF in UTF-16.

<!-- Describe how you validated the behavior. Add automated tests wherever possible, but list manual validation steps taken as well -->
## Validation Steps Performed

![image](https://user-images.githubusercontent.com/1068203/145421736-c842c7d4-0136-42d0-ad72-f004f58d9e3b.png)

also the case by @mas90  #8709 (comment):

![image](https://user-images.githubusercontent.com/1068203/145420264-3fe220b4-42c5-44ac-aa94-4e604b164ed3.png)

(cherry picked from commit a2d96d6)
@ghost
Copy link

ghost commented Dec 14, 2021

🎉Windows Terminal Preview v1.12.3472.0 has been released which incorporates this pull request.:tada:

Handy links:

@ghost
Copy link

ghost commented Dec 14, 2021

🎉Windows Terminal v1.11.3471.0 has been released which incorporates this pull request.:tada:

Handy links:

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Rendering Text rendering, emoji, complex glyph & font-fallback issues Area-TerminalControl Issues pertaining to the terminal control (input, selection, keybindings, mouse interaction, etc.) AutoMerge Marked for automatic merge by the bot when requirements are met Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-2 A description (P2) Product-Terminal The new Windows Terminal.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Links/URLs get offset when certain unicode characters are printed
4 participants