Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad argument escaping on Windows #3436

Open
spcfox opened this issue Dec 8, 2024 · 2 comments
Open

Bad argument escaping on Windows #3436

spcfox opened this issue Dec 8, 2024 · 2 comments

Comments

@spcfox
Copy link
Contributor

spcfox commented Dec 8, 2024

Steps to Reproduce

Compile two programs for Windows. I use Node.js for cross-platform compilation, but the backend shouldn't matter.

module PrintArgs

import Data.List
import System

main : IO ()
main = do
  args <- getArgs
  traverse_ putStrLn $ zipWith
    (\n, arg => "\{show n}: '\{arg}'") [0 .. length args] args
module Run

import System

main : IO ()
main = ignore $ system $ ["node", "printArgs.js"] ++ !getArgs

Run them in cmd:

C:\folder with spaces> node "C:\folder with spaces\printArgs.js" "hello \" world"
0: 'C:\folder with spaces\printArgs.js'
1: 'hello " world'

C:\folder with spaces> node "C:\folder with spaces\run.js" "hello \" world"
0: 'C:\folder with spaces\printArgs.js'
1: 'C:\folder'
2: 'with'
3: 'spaces\run.js'
4: 'hello'
5: ' world'

Expected Behavior

A system with a list argument should properly escape the arguments so that they are passed as they were.

Observed Behavior

Spaces and quotes are not properly escaped.

@spcfox
Copy link
Contributor Author

spcfox commented Dec 8, 2024

How does Windows parse arguments?

cmd does not tokenize arguments. The Windows API provides the GetCommandLine function to retrieve the complete command. Applications split it into arguments themselves, typically using CommandLineToArgvW from the Windows Runtime. The documentation describes the following escaping rules:

  • 2n backslashes followed by a quotation mark produce n backslashes followed by begin/end quote. This does not become part of the parsed argument but toggles the "in quotes" mode.
  • 2n + 1 backslashes followed by a quotation mark produce n backslashes followed by a literal quotation mark ("). This does not toggle the "in quotes" mode.
  • n backslashes not followed by a quotation mark simply produce n backslashes.

Note that ^ is not used for escaping. You can find an example of argument escaping in the Python subprocess module code. This escaping is sufficient if you create a process using CreateProcess from the Windows API (as done by subprocess.run in Python or child_process.spawn in Node.js). However, system invokes a shell like this:

int system(command) { return CreateProcess("cmd.exe /c " + command); }

Thus, it requires additional escaping for cmd-specific special characters.

Why does escapeArg fail?

It escapes cmd-specific special characters, allowing correct handling of symbols like > and |. However, arguments with spaces and quotes are still mishandled.

The issue lies in cmd escaping (which doesn't tokenize arguments). It ensures characters like > are not interpreted as redirection. However, using ^ to escape spaces and quotes results in them being consumed by cmd and not passed to the application. Even if passed, applications generally don't interpret ^ as escaping.

Additionally, not all special characters are escaped with ^. For example, if delayed expansion is disabled, ! doesn't require escaping; if enabled, it needs two ^ (details). I'm unsure if environment variable substitution via % can be escaped (details). Yeah, cmd is crazy at times.

Currently, escapeArg escapes characters only for cmd and not for argument processing. This behavior is often undesirable.

How to fix escapeArg?

If an argument contains spaces, it should be enclosed in double quotes. This also handles most cmd special characters. Quotes and backslashes should be escaped with backslashes per the documentation.

Proposed solution

  1. Update the documentation for escapeArg and escapeCmd.
    Clarify whether they escape arguments for the target application or just for the shell. As outlined above, these are different on Windows. Currently, only the second one is implemented.

  2. Fix escapeArg for Windows. I can implement this, but first, it's worth to decide on point one.

  3. Add a function to create processes using the OS API without a shell. Most languages have this feature. This simplifies launching executables without escaping complexities. For example, it would be helpful in code generators. A problematic snippet in the Node.js code generator (other generators likely share this issue, though tests don't catch it):

    quoted_node <- pure $ "\"" ++ node ++ "\"" -- Windows often have a space in the path.
    coreLift_ $ system (quoted_node ++ " " ++ outn)

Useful notes

While most applications follow the conventions, they are not required to. Specifically, .bat and .cmd scripts process arguments differently. The corresponding OpenJDK code attempts to account for this.

Windows libraries are closed source, but similar functions can be found in the ReactOS source code:

@spcfox
Copy link
Contributor Author

spcfox commented Dec 9, 2024

3. Add a function to create processes using the OS API without a shell

It looks like Chez still doesn't have that function

cisco/ChezScheme#604

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant