Tools to prune cache #3
Comments
Good point, agreed. I did not include it because it requires some thinking.
@wch just thought this through for Shiny caching, so is likely to have good ideas.
One difficulty is that we would probably need to store the last-access time stamp, since we definitely don't want to remove packages that were used recently. For this we would need to lock the cache with an exclusive lock, which is not ideal.
Do you think it's too unreliable to use the file system last access time?
We can try that as well, but it is still not always reliable. In particular, AFAICT it is usually not available in Docker containers, and I think it is not always enabled on Windows either. But we can probably figure something out, e.g. have a separate lock for the access times.
Although maybe you don't want to prune in Docker containers anyway. But Windows is still an issue, and in general it is just too platform dependent to rely on.
The atime attribute can't be relied on in general. In Linux, it's not unusual to mount a filesystem with […].

I ran a bunch of tests on mtime, ctime, and atime here: https://gist.github.com/wch/9bc615c70219c7ac15f7b339ddd7a30d

The solution I ended up using was to use mtime, which seems to work reliably across platforms, and call […]. Note that […].

The disk caching and pruning code in the link above is designed to work when multiple processes are using the same directory to store objects, so no locking is required (there are some potential races, but all are recoverable, since it's just a cache). All the relevant state for the objects (name, time, size, and content) is stored on the filesystem, so you can stop an R process that uses the directory for a cache, then start another one and point it to the directory, and it will continue to work fine.
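
To make the mtime idea concrete, here is a minimal sketch in Python rather than R. It is not the code from the gist. It assumes that "using mtime" means bumping a file's modification time whenever the cached object is read, and then pruning the files with the oldest mtime first. The function names `touch` and `prune` are hypothetical. No lock is taken: a file that vanishes because another process pruned it is simply skipped, which is acceptable for a cache.

```python
import os
import stat
from pathlib import Path


def touch(path):
    """Record an access by setting the file's atime and mtime to now."""
    os.utime(path, None)


def prune(cache_dir, max_bytes):
    """Delete the least recently touched files until the cache fits in max_bytes.

    Returns the names of the files this call tried to remove.
    """
    entries = []
    for p in Path(cache_dir).iterdir():
        try:
            st = p.stat()
        except FileNotFoundError:
            continue  # removed by another process in the meantime; harmless
        if stat.S_ISREG(st.st_mode):
            entries.append((st.st_mtime, st.st_size, p))

    total = sum(size for _, size, _ in entries)
    removed = []
    # Oldest mtime first, i.e. least recently used under the touch() convention.
    for _, size, p in sorted(entries, key=lambda e: e[0]):
        if total <= max_bytes:
            break
        try:
            p.unlink()
        except FileNotFoundError:
            pass  # lost a race with another pruner; the space is freed anyway
        total -= size
        removed.append(p.name)
    return removed
```

A reader would call `touch()` on every cache hit and `prune()` occasionally, for example after each write. Because all state lives in the file metadata, a new process can take over the same directory without any handover.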
@wch Thanks!
Note: I am going to postpone this until we have a database backend, to avoid having to rewrite it then.
I don't think we need it in this version, but I think the next version should have some way to automatically prune the cache to keep it below a user-specified threshold (maybe controlled via an environment variable, and set to, say, 5 GB by default?).
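
As an illustration of the threshold proposal, the following Python sketch reads a size limit from an environment variable and falls back to 5 GB when it is unset. The variable name `EXAMPLE_CACHE_MAX_SIZE`, the accepted size syntax, and the reading of 5 GB as 5 × 1024³ bytes are all assumptions made for the example; the thread does not specify any of them.

```python
import os
import re

# Default suggested in the thread, interpreted here as 5 * 1024^3 bytes.
DEFAULT_MAX_BYTES = 5 * 1024**3

_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Parse strings such as '100', '512MB', '1.5G' or '5 Gb' into bytes."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?)B?\s*", text.upper())
    if m is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(float(m.group(1)) * _UNITS[m.group(2)])


def cache_limit(env=os.environ, name="EXAMPLE_CACHE_MAX_SIZE"):
    """Return the cache size limit in bytes (hypothetical variable name)."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return DEFAULT_MAX_BYTES
    return parse_size(raw)
```

Passing the environment as a parameter keeps the function easy to test, and rejecting malformed values with an error avoids silently pruning to an unintended size.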