Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exporting data items with forward slashes results in wrong filename #591

Closed
8 tasks
cmeyer opened this issue Jan 16, 2021 · 9 comments · Fixed by #1050
Closed
8 tasks

Exporting data items with forward slashes results in wrong filename #591

cmeyer opened this issue Jan 16, 2021 · 9 comments · Fixed by #1050
Labels
audience - user affects users effort - day a day or less f - IO f - organization good first issue good for newcomers impact - medium has straightforward but non-obvious workaround priority - medium automated priority tag reach - small affects a few users weekly type - bug

Comments

@cmeyer
Copy link
Collaborator

cmeyer commented Jan 16, 2021

Forward slashes aren't allowed in Windows filenames.

Exporting a single file with forward slashes works on macOS. It is not allowed (some versions of Qt?) on Windows.

Exporting a batch with forward slashes in the prefix silently fails on both macOS and Windows.

Also, how are backslashes and other disallowed characters handled?

Is there any way to escape a filename and remove disallowed characters that works cross platform?


Edit 2024-05-14: This issue only applies to batch exports or possibly the default file name used when exporting a single file. In those places, we should ensure that if we generate a filename from a title, it is sanitized for use as a filename. See #874 for some initial, abandoned work. We can assume that a filename the user chooses a filename in a save file dialog is sanitized; it may still fail but will be reported to the user.


Tasks

Preview Give feedback
@cmeyer
Copy link
Collaborator Author

cmeyer commented Nov 2, 2021

My initial thought is that the path should be checked and cleaned up around here:

ImportExportManager.ImportExportManager().write_display_item_with_writer(selected_writer, display_item, pathlib.Path(path))

@gosselind1
Copy link
Contributor

I've dug a bit through this one. Addressing the big question of if there is an easy cross-platform way to sanitize paths, the answer is no.

Though there are some attempts at resolving this, ex: the first answer of https://stackoverflow.com/questions/9532499/check-whether-a-path-is-valid-in-python-without-creating-a-file-at-the-paths-ta, not only does it note it has issues, but it also misses issues such as restricted file names, ex: "NUL", along with not even considering UNC paths on the windows side. (As an additional note, most OSs will prevent or attempt to prevent invalid or disliked file names within their graphical and text shells for the most part. Windows for example will not let you use a file with "/" in it when utilizing the explorer selection window.)

That said, there are some mentioned libraries, along with the suggestion that one could just attempt to create the file and see what happens. I dislike the first option as I have no clue how well maintained another library might be going into the future. The second option seems reasonable enough though, especially since we're trying to write a file at that destination anyways. If that fails I could then have swift present the user with a notification saying the export process failed due to X lower-level python file error.

As for the slashes, python does have a way to resolve a path, and expand it to a full-length path, which would be how I solve the slashes in prefix problem (which also converts slashes to the required slash for the host OS). This would effectively make any path with a slash spawn an additional sub-directory (or fail to spawn one) depending on desired behavior.

@cmeyer
Copy link
Collaborator Author

cmeyer commented Jul 25, 2022

I think the best approach is to start with something restrictive and add exceptions/restrictions as necessary. And write a unit test (hopefully just one easy to understand test function) that passes on all platforms.

def simplify_filename(value: str) -> str:
    # remove characters that aren't alphanumerics, underscores, or hyphens
    # strip leading/trailing whitespace
    # similar to django/slugify:
    # https://github.com/django/django/blob/main/django/utils/text.py
    value = unicodedata.normalize("NFKC", value)
    # only allow alphanumerics, underscores, hyphens
    value = re.sub(r"[^\w\s-]", "", value)
    # replace one or multiple whitespace characters with a single space and strip from start/end
    return re.sub(r"[\s]+", " ", value).strip()

@jamesrussell216
Copy link

Quick question on this one @cmeyer - is there a use case where having a special character in a file name is needed? For instance MS Word just brings up a simple error message if you try to name a file with a special character and makes you redo it, could/should we just do the same?

image

@ndellby
Copy link

ndellby commented May 17, 2024 via email

@cmeyer
Copy link
Collaborator Author

cmeyer commented May 20, 2024

Quick question on this one @cmeyer - is there a use case where having a special character in a file name is needed? For instance MS Word just brings up a simple error message if you try to name a file with a special character and makes you redo it, could/should we just do the same?

@jamesrussell216 This already occurs when using the file dialog.

image

The batch export functionality does not use the file dialog and batch export presents complications because some files may succeed and some may fail. For instance, if batch export is using the title of the data item in its template and some data item titles contain a slash.

@cmeyer
Copy link
Collaborator Author

cmeyer commented May 20, 2024

Just replace illegal characters with _ when creating the default filename from the image title. It’s tedious to go through and fix little O/S proclivities.

What are "illegal characters"? That's the crux of the issue.

See task list at the top of this bug, other comments, and abandoned implementation in #874 for more info and suggestions.

@jamesrussell216
Copy link

@cmeyer illegal characters will be anything that causes a file save error - should be easy enough to define surely?

I see the complication - it's potentially more complicated to catch all the places that an illegal character could come in from, thus probably better to make the code hand illegal characters safely rather than trying to block all the places they could come in.

@cmeyer cmeyer reopened this May 20, 2024
@cmeyer
Copy link
Collaborator Author

cmeyer commented May 20, 2024

Hmm not sure why this got auto-closed. Opening it again. It's not fixed with #1050.

@cmeyer cmeyer added the priority - medium automated priority tag label Jun 20, 2024
@cmeyer cmeyer closed this as completed in c1860a2 Jun 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
audience - user affects users effort - day a day or less f - IO f - organization good first issue good for newcomers impact - medium has straightforward but non-obvious workaround priority - medium automated priority tag reach - small affects a few users weekly type - bug
Projects
None yet
4 participants