Supporting an "initial current directory" #24
Can we treat this as two separate issues:
I think just fixing (1) will have a lot of benefits on its own. TBH I'm not sure why we don't already have this concept as part of the pre-open code which emulates relative paths. We can consider (2) separately. We could just use |
We do have a very minimal concept of a PWD. If you have a relative path, the preopen code in wasi-libc will look for a preopen named "." and use the associated handle as the base. One simple (though not strictly POSIX-conforming) implementation of If we want |
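To make the lookup described above concrete, here is a minimal sketch in Python of how a path might be matched to a preopen. It is a model of the idea only, not wasi-libc's actual code: the function name `find_preopen`, the dictionary of preopens, and the longest-prefix rule for absolute paths are assumptions made for illustration. The only behaviour taken from the comment is that relative paths use the preopen named "." as their base.

```python
import posixpath


def find_preopen(preopens, path):
    """Return (handle, path relative to that preopen), or None if nothing matches.

    `preopens` is a hypothetical mapping of preopen name (e.g. "." or "/data")
    to an opaque handle such as a file descriptor number.
    """
    if not posixpath.isabs(path):
        # Relative paths use the preopen named "." as their base.
        handle = preopens.get(".")
        return None if handle is None else (handle, posixpath.normpath(path))

    path = posixpath.normpath(path)
    best = None  # (prefix, handle) of the longest matching absolute preopen
    for name, handle in preopens.items():
        if not posixpath.isabs(name):
            continue
        prefix = name.rstrip("/")  # "/" becomes "", "/data/" becomes "/data"
        if path == name or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, handle)
    if best is None:
        return None
    rel = path[len(best[0]):].lstrip("/")
    return best[1], rel or "."
```

With preopens `{".": 3, "/": 4, "/data": 5}`, `find_preopen(preopens, "foo/bar.txt")` yields `(3, "foo/bar.txt")`, and `"/data/x"` is served by the more specific `/data` preopen rather than `/`.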
AFAIK most usage in Rust of the current directory is used to make relative paths absolute, so I don't think we can return I mostly wanted to open this issue to see if others had different ideas for how to implement this. I agree that if we pick a strategy where it's "mostly emulated" then we can split this into a possible new API addition and a wasi-libc issue. I do think we want to put as much of this as we can into wasi-libc, though. |
WASI is attempting to hide host absolute paths from applications. People often do expose them today, because it's one of the most convenient ways to use preopens in practice today, though it isn't necessary -- you can use things like --mapdir= in some wasm engines to hide host paths. Often the reason programs canonicalize paths to absolute is to send those paths to other programs with different cwds. For those use cases, just using Eventually, we'll have ways for WASI instances to spawn new WASI instances and pass them files and directories and such, so the question is, can we get by without requiring programs to know the actual host cwd? |
Hm so my main point is that applications today rely on getcwd and such, and the lack of support for this today is a hurdle to overcome when porting programs to WASI. Without a standardized solution each application will end up making specific solutions that aren't necessarily compatible, so I think given the prevalence of this it would be worthwhile to standardize something. I don't think there's any need to expose absolute host paths or even the real host cwd to applications. I imagine that setting up a WASI execution you'll basically always set up a virtual filesystem with mappings from the host to the guest, and there's no need for any of them to actually match. |
maybe it would make sense to establish cwd as some sort of standardized preopen (not required to be present, of course) |
Ah, if you're doing WASI parent to WASI child, and sharing the same virtual mapping, then yes, something like this could make sense. And I agree with @devsnek and other comments above, that this does sound closely related to preopens, so that's a reasonable place to start. |
Isn't step 1 just making getcwd(), chdir() work as expected and having that be honored by open/stat/etc when passed relative directories. This can be done with the default starting PWD just being "/" (in the virtual filesystem). Then step 2 would be to decide how we might want to specify the default starting directory? It might be that most users are happy with just step 1. |
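A rough sketch of what "step 1" could look like, written in Python for brevity. This is an illustration of the idea, not the wasi-libc implementation: the class name `EmulatedCwd` and the `is_dir` callback are hypothetical, and path handling is purely lexical, so ".." is collapsed textually and symlinks are ignored.

```python
import posixpath


class EmulatedCwd:
    """Keeps the current directory as a string inside the virtual filesystem."""

    def __init__(self, initial="/"):
        # Default starting PWD is "/" in the virtual filesystem.
        self._cwd = posixpath.normpath(initial)

    def getcwd(self):
        return self._cwd

    def resolve(self, path):
        """Make `path` absolute; open/stat/etc. would call this first."""
        # posixpath.join discards the cwd when `path` is already absolute.
        return posixpath.normpath(posixpath.join(self._cwd, path))

    def chdir(self, path, is_dir=lambda p: True):
        # `is_dir` stands in for a real check against the preopened directories.
        target = self.resolve(path)
        if not is_dir(target):
            raise NotADirectoryError(target)
        self._cwd = target
```

Starting from "/", `chdir("usr/lib")` followed by `resolve("../bin/ls")` gives `/usr/bin/ls`. Choosing a different starting directory (step 2) would only change the `initial` argument.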
I agree. The algorithm I sketched out above would be a good place to start, if anyone's interested. Adding in an ability to have a starting current directory other than "." would be natural to add on top of that. |
I've taken an initial stab at emulation at WebAssembly/wasi-libc#214 for wasi-libc. |
One use-case for WASI is for shipping CLI tools. Imagine an existing cross-platform C++ codebase. We can ship native binary packages for various Linux-es, OS X, Windows and what else. Or we can build and ship a single WASM file. Looks like a clear win, doesn't it? For this to work, the behaviour of a WASI build should be as close as possible to the native build. The proposed emulation approach creates several behavioural differences:
Therefore I propose to extend WASI with explicit
Alternatively, an API to retrieve the current file path from a directory descriptor could work, but it is tricky to implement on some platforms. |
Yep! We have a bunch of work to do for it to be a clear win in practical terms, but we want WASI to support great CLI tools. WebAssembly/wasi-libc#214 is a first step to |
@sunfishcode It's great that
While an incremental approach is usually a good thing, I am concerned that it might result in hard-to-debug problems for the end users. CLI tool authors, especially when coming from the context of native development, are unlikely to be aware of the limitations in
I'm afraid that the difference between native filesystem semantics and WASI is going to cause pain for end users and developers, and will ultimately hurt adoption. Realising that WASI is a work in progress, I'd like to raise awareness early on that the current WASI API might need extensions in order to better match the native filesystem semantics. Below I highlight challenges in a 'pure' libc implementation on top of the current WASI.
Current working directory
A native program has a working directory regardless of PWD:
$ mkdir -p /tmp/a/b/c &&
ln -fs a/b/c /tmp/d &&
cd /tmp/d &&
mv /tmp/d /tmp/e &&
python -c 'import os;print(os.getenv("PWD"),os.getcwd())' &&
stat /tmp/d
('/tmp/d', '/tmp/a/b/c')
stat: cannot stat '/tmp/d': No such file or directory
The path stored in
I believe that WASI should provide a robust method to get the current working directory. It is necessary to define what happens if the current working directory is not within a mapped subtree.
Concurrent modifications to the filesystem
The filesystem is a shared resource, therefore unrelated programs could make changes to the filesystem concurrently, including renaming our current working directory. Native
A
Another option is to compute the current working directory path dynamically. The idea is to iteratively open the parent directory, iterate dentries, and match name by inode number. This is less efficient of course; Wasmer currently returns 0 for device id/inode number. I'm unsure if exposing host device id/inode number is acceptable privacy-wise. Scrambling these IDs in a WASM runtime is definitely non-trivial. To conclude, it looks like we can't currently emulate
Symlinks in current working directory path
Native
If we choose to cache the path and update it on
It will increase the binary size for virtually any CLI tool working with the filesystem. Hence WASI-level support is definitely desired. To summarise, I believe that WASI needs extensions to support the current working directory in a way compatible with native filesystem semantics. This is important for WASI to be a compelling target for building CLI tools. The filesystem is a basic service which most CLI tools need. |
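The "compute the path dynamically" option mentioned above can be sketched as follows. This is a Python illustration running on a POSIX host, not WASI code: the function names and the `root` parameter (standing in for the top of a mapped subtree) are hypothetical, and it assumes the host reports meaningful device and inode numbers, which the comment notes is not the case in every runtime.

```python
import os


def _ident(st):
    # A directory is identified by its (device, inode) pair.
    return (st.st_dev, st.st_ino)


def path_by_inode_walk(start, root):
    """Rebuild the path of `start` relative to `root` by walking up via '..'."""
    root_id = _ident(os.stat(root))
    parts = []
    cur = start
    while True:
        cur_id = _ident(os.stat(cur))
        if cur_id == root_id:
            return "/" + "/".join(reversed(parts))
        parent = os.path.join(cur, "..")
        if _ident(os.stat(parent)) == cur_id:
            # Reached the host root without ever meeting `root`.
            raise LookupError("start is not inside the mapped subtree")
        name = None
        with os.scandir(parent) as entries:
            for entry in entries:
                # lstat semantics, so a symlink never matches its target.
                if _ident(entry.stat(follow_symlinks=False)) == cur_id:
                    name = entry.name
                    break
        if name is None:
            raise LookupError("no entry for the directory in its parent")
        parts.append(name)
        cur = parent
```

Because the walk follows ".." from the directory itself, entering through a symlink still reports the real location, which mirrors the `os.getcwd()` result in the shell session above. The cost is one directory scan per path component.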
One underlying observation here is that any program depending on POSIX itself recognizes the limitations of APIs like this, and added the |
When using a shell, renaming the current directory should show the new path the next time the prompt is printed. The prompt may be temporarily outdated, but the next time it is drawn, the new name prevents potential confusion for the user. When deleting the current directory, this would also automatically append "(deleted)" to the shown path. tl;dr: While it doesn't help much with TOCTOU errors, it does help a lot with visual presentation to the user. |
@bjorn3 FWIW, I just tried |
@bjorn3 @sunfishcode Thank you for the comments. I've been checking other threads (ex. WebAssembly/WASI#109). My takeaway was that
The following proposal might be better aligned with the project goals:
Pros: in line with WASI design goals, makes implementing Thoughts? |
If some embedders choose to offer baked-in current-working-directory features, and every program that calls It would be very helpful if you could describe specific functionality that depends on |
I want to share a C compiler with custom language extensions with the widest audience possible (https://github.com/rapidlua/barebone-c). In order to have any adoption, I need to ship prebuilt packages. WASI makes it significantly more convenient for me. Ideally, the WASI build of the tool should behave exactly the same as the native one does. Current working directory must be defined, relative and absolute paths must work. People don't typically rename directories from under a compiler running, therefore this particular difference might be insignificant after all. I'd like to take a step back and generalise a bit. Supposedly, WASI is a compelling target for parties shipping CLI tools and unwilling to build multiple packages for a plethora of OS-es and hardware architectures. The major reason for Emscripten's success was that little to no changes were required to the source code. People shouldn't need to rewrite their code in order to benefit from WASI. In case of complex projects like LLVM, it's hard to judge what effects the slightly different semantics in the platform API will cause. Therefore I feel that it is important to have WASI libc being as POSIX-ly compatible as possible. Personally, I don't need I am not particularly concerned with the code size. The reason I was linking to the other thread was to show that I'm actually making effort to understand WASI agenda.
This complicates matters. What if the host's current working directory is not mapped? You've mentioned that WASI already has |
Thanks! Current working directory is being worked on, and relative and absolute paths already work in C/C++ APIs. One missing area if you want to run clang is
LLVM is an interesting example; it makes extensive use of It really helps us to hear real-world use cases, to help us make decisions about how best to support various features. If you want to do something and it doesn't work, isn't efficient enough in some setting, or isn't robust enough, we'd like to hear about it.
I expect it will be possible to add
POSIX says that
Yes, path resolution can cross "mount points". We won't need any extra bookkeeping to support directories being renamed by other programs, because we hold file descriptors for our open directories, which are stable across renames. I have a pretty good idea of what bookkeeping we'll need if we need to support programs renaming directories that they themselves have open and then |
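The point about descriptors surviving renames can be checked on a POSIX host with a short Python sketch. The helper names `read_via_dirfd` and `demo` are made up for this illustration, and it uses the host's `openat`-style lookup (Python's `dir_fd` argument) rather than any WASI API.

```python
import os
import tempfile


def read_via_dirfd(dir_fd, name):
    """Open `name` relative to an already-open directory and read it."""
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    try:
        return os.read(fd, 4096)
    finally:
        os.close(fd)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        old = os.path.join(tmp, "old")
        os.mkdir(old)
        with open(os.path.join(old, "f.txt"), "wb") as f:
            f.write(b"hello")
        dir_fd = os.open(old, os.O_RDONLY)  # hold the directory open
        try:
            # Another program could do this rename at any time.
            os.rename(old, os.path.join(tmp, "new"))
            # The old name is gone, but the descriptor still works.
            return read_via_dirfd(dir_fd, "f.txt"), os.path.exists(old)
        finally:
            os.close(dir_fd)
```

Here `demo()` reads the file through the descriptor even though the path it was opened under no longer exists. What the descriptor cannot provide is the directory's new name, which is the part that needs bookkeeping or host support.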
With WebAssembly/wasi-libc#214, wasi-libc now has basic emulation of I believe all of the questions here have been answered, but feel free to open up follow-up issues if there are other things to address. |
Oh, if I knew this was planned, I would not have added this workaround to boost::filesystem. (Mainly commenting for the benefit of people who look at both of these issues in the future.) |
I got a feature request in GoogleChromeLabs/wasi-fs-access#2 to use the current working directory functionality. In general, https://github.com/WebAssembly/wasi-libc/pull/214/files looks promising, but my understanding is that the current directory emulation is purely internal to the The problem is, on https://wasi.rreverser.com/ I'm running each command in a separate short-lived Wasm instance - this allows to use This means that, even if some command changes a current directory, I have no way of reading it back from a Wasm instance, nor any way of setting it as a current directory for the next one (as it always starts out with So, I guess, the request for WASI itself to support "current directory" still stands - can we add syscalls that would allow saving and reusing cwd as part of the implementors' global state, rather than keep it limited to a Wasm instance? |
@RReverser I can understand wanting to set the initial working directory to something other than "/" for new modules. However, I don't think reading the current working directory back out of a child process is needed for POSIX-like environments. There are no situations that I know of where a |
I guess that's fair, if we consider only POSIX-like use-cases. I'd be actually okay with not having this part - I already special-case some commands in the emulator, and, indeed, would have to do the same for However, I can't think of a way around this part:
|
You mentioned this yourself, above, but just to resurface - maybe wasi-libc could read |
That sounds like it would work. We have been trying to avoid relying on the environment for core functionality wherever possible. If we could find some way to make it opt-in, so that not all libc-based programs would end up depending on getenv, I think it could be acceptable. Alternatively, perhaps we could add a new preopen type. Right now we only have one: __WASI_PREOPENTYPE_DIR. We could perhaps add __WASI_PREOPENTYPE_PWD? (regarding |
That could work nicely, yeah, and we could even reuse existing types and functions for getting dir length & contents. |
Should we reopen this issue for tracking & discussion for now? |
Could you describe the use cases for this in more detail? GoogleChromeLabs/wasi-fs-access#2 doesn't have much detail, and I'd like to understand how you envision programs would use this. |
Just being able to start a program in an arbitrary current directory, and list files / access files in it using relative names. Particularly for https://wasi.rreverser.com/, it would allow basic commands like The reason it doesn't work at the moment is because, even if I handle |
I think it would be useful to have a standardized way to tell a wasi program where it "is" in a virtual filesystem at initialization ( |
Agreed. |
My main concern here is that a preopen for cwd would imply a capability that has to be implicitly passed into all programs, whether they need it or not, because there'd be no way for the environment to know whether the program needs it. That doesn't follow the principle of least authority. Looking forward, I expect interface types to give us a way to have programs request the capabilities they need, at which point we'll have the option of allowing programs that need a cwd handle to request one. From that perspective, the question here is, does it make sense for us to spend resources designing and implementing a mechanism that we expect will be temporary? And will we risk baking in a compatibility requirement that we'll need to continue to support? |
I'd say it can be something we wait to add under the assumption we actually move forward in the near future with fixing fd types, using imports/tables, etc. |
Given that current working directory would be just a string, and actual access would be controlled by list of If it's performance overhead that you're worried about, then I'd note that programs that don't need cwd, won't call |
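As a sketch of the "just a string" idea: the initial directory could be taken from whatever source is eventually chosen and accepted only if it falls under something the program can already reach. The Python below is illustrative only. Reading the `PWD` environment variable is one of the options floated in this thread rather than anything decided, checking against preopen names is an assumption about what "controlled by" would mean in practice, and the function name `initial_cwd` is hypothetical.

```python
import posixpath


def initial_cwd(env, preopen_names, default="/"):
    """Pick a starting directory string; it grants no access by itself."""
    candidate = env.get("PWD")  # assumed source; a preopen type is the alternative
    if not candidate or not posixpath.isabs(candidate):
        return default
    candidate = posixpath.normpath(candidate)
    for name in preopen_names:
        if not posixpath.isabs(name):
            continue
        prefix = posixpath.normpath(name).rstrip("/")  # "/" becomes ""
        # Accept only directories at or below an existing preopen.
        if candidate == prefix or candidate.startswith(prefix + "/"):
            return candidate
    return default
```

With a single preopen `/work`, a `PWD` of `/work/src` is accepted, while `/home/user` or `/work/../etc` falls back to the default. Programs that never ask for the current directory never run this code, which is the performance point made above.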
Support for current working directories is now implemented in WASI libc! |
@RReverser The concern with a |
@sunfishcode Hmm, did you mean to close this issue? Last time it was reopened to track setting the initial current directory in some way. Does your last comment mean it's been decided against, and that's why you closed it? |
Ah, sorry, I missed that the issue had been reopened for that purpose. I'll reopen it again. That said, I still have the concern about the need to pass in the preopened current working directory without knowing whether it's needed or not. |
(Also, moving to wasi-filesystem, as this is a filesystem issue :-)). |
I don't mind alternative approaches that solve the problem, e.g. a separate import function which would be easy to detect. I just think it's a problem worth solving :) |
Instead of adding a pre-open type, perhaps we can just use |
We have been trying to avoid having core parts of wasi-libc system depend on the use of environment variables. |
I've heard this issue raised from several different folks, and I now think it makes sense to add a concept of an initial current working directory as a type of preopen. I don't know if it makes sense for preview1 due to backwards compatibility constraints, but it's something we can easily add to preview2. |
I've also added an agenda item to the next WASI meeting to have an initial discussion for this: WebAssembly/meetings#1264 |
Many applications today (or at least many CLI apps) rely on the idea of a "current directory" of the calling process (e.g. getcwd and chdir). Currently, though, WASI doesn't define what it means to have a current directory and implementations like wasi-libc don't have a getcwd symbol. For ease of porting applications, however, I think it might be good to support this concept.
I'm not entirely sure if this actually needs to manifest itself as new WASI APIs, however. They're even more low level than libc typically is and it may be possible to get away with having the concept of a current directory being entirely within wasi-libc. I wanted to open the issue here, though, to see if others felt the same and have some discussion with respect to WASI itself rather than just wasi-libc.
One idea I've got is that wasi-libc could interpret preopened paths as either absolute (starting with /) or relative to the root (those that don't start with /). Next a new syscall would be added:
And then wasi-libc would contain emulation of getcwd and chdir as necessary. This would allow applications which want to print paths relative to the current directory to be able to print values appropriately and applications could also be started in arbitrary locations as decided by the embedder.
In any case I'm curious if others have thought about this as well, and if there's other interest in supporting this as well.