Handle non-UTF-8 in quoting style #6817

Closed
RenjiSann opened this issue Oct 24, 2024 · 3 comments

Comments

@RenjiSann
Collaborator

Right now, quoting_style uses String::from_utf8_lossy to quote its inputs, which replaces every invalid UTF-8 byte sequence with the replacement character \u{FFFD}.
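
To make the information loss concrete, here is a minimal sketch using only the standard library. The helper name `lossy_name` is hypothetical and is not taken from quoting_style.rs; it only wraps the `String::from_utf8_lossy` call mentioned above.

```rust
use std::borrow::Cow;

/// Hypothetical helper: convert raw file-name bytes the lossy way.
/// Every invalid UTF-8 sequence becomes U+FFFD.
fn lossy_name(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}

fn main() {
    let shown = lossy_name(b"funky\xffname");
    println!("{shown}");
    assert_eq!(shown, "funky\u{FFFD}name");

    // Two different file names collapse to the same displayed string,
    // so the original bytes cannot be recovered from the output.
    assert_eq!(lossy_name(b"funky\xfename"), shown);
}
```

Because distinct names can map to the same output, the printed name can no longer be pasted back into a shell to reach the file.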

It may be a problem for several utils:

ls

$ touch "`echo -ne 'funky\xffname'`"
$ ls
'funky'$'\377''name'

$ ../../target/debug/ls
funky�name

cksum and hashsum checking

Here, quoting is used to print to stderr when a file is a directory or is not found

$ echo -ne 'SHA256 (XXX\xffXXX) = e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855' > CHECKSUM

$ cksum -c CHECKSUM
cksum: 'XXX'$'\377''XXX': No such file or directory
XXXXXX: FAILED open or read
cksum: WARNING: 1 listed file could not be read

$ ../../target/debug/cksum -c CHECKSUM 
../../target/debug/cksum: XXX�XXX: No such file or directory # Here, the file should be escaped
XXXXXX: FAILED open or read
../../target/debug/cksum: WARNING: 1 listed file could not be read

There are probably other places where correctly quoting non-UTF-8 bytes would prove useful.

@BenWiederhake
Collaborator

The first part is a special case of #6639.

@RenjiSann
Collaborator Author

> The first part is a special case of #6639.

Indeed! I am working on a fix, but this requires a big change to quoting_style.rs, because we are no longer working with UTF-8 Strings.
My implementation is starting to work, though not yet for the Literal quoting style. That style basically does nothing, so it's the only case where the output is not a valid UTF-8 string, and I have to change the quoting_style.rs API for it.
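
To illustrate why the Literal style forces an API change, here is a rough sketch. The function `quote_literal` and its signature are assumptions made for illustration and are not the actual quoting_style.rs API. Since the bytes pass through unchanged, the result cannot be a `String`, so it has to be returned and printed as raw bytes.

```rust
use std::io::{self, Write};

/// Hypothetical literal style: the name passes through untouched,
/// so the return type has to be bytes rather than `String`.
fn quote_literal(name: &[u8]) -> Vec<u8> {
    name.to_vec()
}

fn main() -> io::Result<()> {
    let quoted = quote_literal(b"funky\xffname");

    // The result is not valid UTF-8, so it could not be stored in a `String`.
    assert!(String::from_utf8(quoted.clone()).is_err());

    // Raw bytes are written with `write_all` instead of `println!`.
    let mut stdout = io::stdout().lock();
    stdout.write_all(&quoted)?;
    stdout.write_all(b"\n")
}
```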

The only issue I have so far is that I am making use of slice::utf8_chunks for which I need MSRV >= 1.79, so I guess we'll have to wait.
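
For readers unfamiliar with `slice::utf8_chunks`, here is a minimal sketch of how it could drive shell-escape quoting. This is not the linked diff: `quote_shell_escape` is a hypothetical name, and the sketch makes simplifying assumptions. It always quotes, it does not escape single quotes or control characters in the valid parts, and it needs Rust 1.79 or later.

```rust
/// Hypothetical helper: quote `bytes` in a shell-escape-like style.
/// Valid UTF-8 runs go inside '...', invalid bytes inside $'\ooo'.
fn quote_shell_escape(bytes: &[u8]) -> String {
    let mut out = String::new();
    // Each chunk is a valid UTF-8 prefix followed by an invalid byte
    // sequence (empty for the last chunk if the input ends cleanly).
    for chunk in bytes.utf8_chunks() {
        let valid = chunk.valid();
        if !valid.is_empty() {
            out.push('\'');
            out.push_str(valid);
            out.push('\'');
        }
        let invalid = chunk.invalid();
        if !invalid.is_empty() {
            out.push_str("$'");
            for b in invalid {
                // Three-digit octal escape, e.g. 0xFF -> \377
                out.push_str(&format!("\\{b:03o}"));
            }
            out.push('\'');
        }
    }
    out
}

fn main() {
    // prints 'funky'$'\377''name'
    println!("{}", quote_shell_escape(b"funky\xffname"));
}
```

For the `ls` example from the issue description, this yields the same string as GNU, and the output is always valid UTF-8 even though the input is not.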

If you want to take a look: diff

@RenjiSann
Collaborator Author

This is fixed by #6882
