[Feature request] Allow providing dwarfs with a dedup library #208

Closed
exeter-matthew-wakeling opened this issue Apr 12, 2024 · 4 comments

@exeter-matthew-wakeling

This is really neat.

Feature request: Add a "library" option to give mkdwarfs a list of files that should be loaded into the dedup mechanism first, but not stored, allowing the image to be even smaller if the contents of the file can be retrieved from that library instead. Bonus points if you can specify a dwarfs image as a library and have it sensibly use the files contained in it.

Then you have the basis for a deduplicating incremental backup system. Currently, I use a system I wrote that takes a single file and a list of library files and produces a compressed, deduplicated file from which that single file can be re-created using the library. That works well if you use tar to create the single file, but it is a little unwieldy when it comes to decompressing and restoring everything. The bonus of a proper mountable filesystem is that retrieving single files is a doddle.

My use case is that I have students to whom I have given coursework, which involves them logging in to a Linux machine and hacking away. I want to store regular snapshots of their work, both to keep a backup for their sake and to see the progression of development so I can try to work out whether they are cheating (yes, I have had to deal with this). I don't want to store 100 copies of the same fairly large files, though. I could achieve the same thing using ZFS snapshots, but that would require snapshotting the entire filesystem, which is more than I want to do, and it requires root.
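
The "library" idea in this request can be illustrated with a minimal sketch. It assumes fixed-size blocks and SHA-256 digests, which is a simplification chosen for brevity and is not a description of how mkdwarfs segments or deduplicates data. All names (`BLOCK_SIZE`, `build_library_index`, `encode`, `decode`) are hypothetical. Blocks already present in a library file are stored as references, and only unseen blocks are stored as raw bytes.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size, chosen only for illustration


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE):
    """Yield (digest, block) for each fixed-size block of data."""
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        yield hashlib.sha256(block).digest(), block


def build_library_index(library_files):
    """Map block digest -> (library file number, byte offset)."""
    index = {}
    for file_no, data in enumerate(library_files):
        for block_no, (digest, _) in enumerate(block_hashes(data)):
            # Keep the first location seen for each distinct block.
            index.setdefault(digest, (file_no, block_no * BLOCK_SIZE))
    return index


def encode(data: bytes, index):
    """Return a recipe of ('ref', file_no, offset, length) or ('raw', bytes)."""
    recipe = []
    for digest, block in block_hashes(data):
        if digest in index:
            file_no, offset = index[digest]
            recipe.append(("ref", file_no, offset, len(block)))
        else:
            recipe.append(("raw", block))
    return recipe


def decode(recipe, library_files) -> bytes:
    """Rebuild the original data from a recipe plus the same library."""
    out = bytearray()
    for entry in recipe:
        if entry[0] == "ref":
            _, file_no, offset, length = entry
            out += library_files[file_no][offset:offset + length]
        else:
            out += entry[1]
    return bytes(out)
```

For example, with a library of `[b"A" * 8192 + b"B" * 4096]` and new data `b"A" * 4096 + b"C" * 4096 + b"B" * 4096`, `encode` produces two references and one raw block, and `decode` restores the data only if the same library is available. A real tool would likely need content-defined or overlapping matching to find duplicates that are not block-aligned.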

@mhx
Owner

mhx commented Apr 12, 2024

> This is really neat.

Thanks!

The "deduplicating incremental backup system" use case is quite high on my todo list (but it has been there for a while now). One of the first issues (#18) has a comment that summarizes the idea in two sentences. It pretty much boils down to

> Bonus points if you can specify a dwarfs image as a library and have it sensibly use the files contained in it.

only that everything will be contained in a single DwarFS image; i.e. you'll be appending the incremental data to an existing image.

There's still no firm timeline, though.
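
The comment in #18 is not reproduced in this thread, so the following is not the planned DwarFS design. It is only a rough in-memory sketch of the general idea of appending incremental data to a single container while keeping every earlier snapshot readable. The class `AppendOnlyStore` and its methods are hypothetical, and deduplication is done per whole file for simplicity.

```python
import hashlib


class AppendOnlyStore:
    """Hypothetical single container: contents and snapshots are only appended."""

    def __init__(self):
        self.blocks = []     # unique file contents, in append order
        self.by_digest = {}  # content digest -> index into self.blocks
        self.snapshots = []  # one manifest per snapshot: {path: block index}

    def add_snapshot(self, files):
        """Append a snapshot of {path: bytes} and return its snapshot id."""
        manifest = {}
        for path, data in files.items():
            digest = hashlib.sha256(data).digest()
            if digest not in self.by_digest:
                # Only content not seen in any earlier snapshot is stored.
                self.by_digest[digest] = len(self.blocks)
                self.blocks.append(data)
            manifest[path] = self.by_digest[digest]
        self.snapshots.append(manifest)
        return len(self.snapshots) - 1

    def read(self, snapshot_id, path):
        """Read a file as it was in the given snapshot."""
        return self.blocks[self.snapshots[snapshot_id][path]]
```

Adding a second snapshot in which only one file changed appends a single new entry to `blocks`, and both the old and the new version of that file stay readable through their snapshot ids.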

@mhx
Owner

mhx commented Apr 12, 2024

As mentioned on HN, borg might work for your use case in the meantime.

@exeter-matthew-wakeling
Author

exeter-matthew-wakeling commented Apr 12, 2024

> only that everything will be contained in a single DwarFS image

That's a good idea, because it reduces the chances that the "library" will accidentally go missing. The only request I would make would be the ability to still mount or extract the older version of the filesystem. Edit: I just read that comment, and I see that's already what you're planning. Great.

@mhx
Owner

mhx commented Apr 18, 2024

Closing this as it's covered by #18.
