Progressive File Layout #16

Open
obilaniu opened this issue Sep 24, 2024 · 1 comment
Labels
enhancement New feature or request gathering-requirements Issues open for collecting feedback, ideas, and requirements for potential future implementations.


obilaniu commented Sep 24, 2024

A few weeks back at the Stammtisch, the idea of a Progressive File Layout (PFL) implementation (à la Lustre, but less complicated) was raised. The user who raised the topic hasn't filed a feature request here yet, so I am taking the initiative.

  1. Absent PFL, when a filesystem is configured, a choice of default striping must be made, and thus a compromise.
    • If no striping is done (stripe=1), then a single large file downloaded onto the filesystem will unbalance the storage targets, and all accesses to it will be directed to one storage target. On the other hand, smaller files perform better.
    • If striping is done (stripe>1), then a stripe count and size might be found that ease the burden of any one large file on the filesystem, but this might penalize small files on the same filesystem, because more targets must be contacted to piece them together.
      • For example, the Canadian national clusters managed by the Digital Research Alliance of Canada configure their Lustre with a default PFL of 1x (no) striping for the range [0, 128MiB) and 2x1MiB striping for the range [128MiB, end), or something similar.
  2. Certain files have internal structure (such as a read-mostly header, followed by parallel-access data areas) that could benefit from different striping schemes.
  3. Currently, it is not possible to migrate a file in place from one striping scheme to another. A "deep" copy is required, which temporarily takes double the space. Such space may not be available, in which case an even more convoluted migration process is required.
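To illustrate the compromise in point 1, here is a minimal sketch of how many bytes land on each storage target under plain round-robin striping. The function name `bytes_per_target` and the round-robin placement are assumptions made for illustration, not BeeGFS or Lustre APIs.

```python
MiB = 1024 * 1024

def bytes_per_target(file_size: int, stripe_count: int, chunk_size: int) -> list[int]:
    """Bytes stored on each of `stripe_count` targets for one file,
    assuming chunks are assigned to targets round-robin (hypothetical model)."""
    full_chunks, tail = divmod(file_size, chunk_size)
    rounds, extra = divmod(full_chunks, stripe_count)
    # Every target gets `rounds` full chunks; the first `extra` targets get one more.
    per_target = [rounds * chunk_size + (chunk_size if t < extra else 0)
                  for t in range(stripe_count)]
    if tail:
        # The final partial chunk goes to the next target in round-robin order.
        per_target[full_chunks % stripe_count] += tail
    return per_target

# stripe=1: a large file sits entirely on one target.
large_unstriped = bytes_per_target(4096 * MiB, 1, MiB)
# stripe=4: the same file is spread evenly over four targets.
large_striped = bytes_per_target(4096 * MiB, 4, MiB)
# stripe=4: a 3 MiB file now involves three targets instead of one.
small_striped = bytes_per_target(3 * MiB, 4, MiB)
```

With stripe=1 the whole 4 GiB file lands on a single target, while with stripe=4 even a 3 MiB file has to be assembled from three targets.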

What I proposed at the Stammtisch is a simplified variant of PFL with 2 (+1) zones. A user would be able to define two zones:

  • A "header" from offset 0 to offset +X blocks and
  • A "payload" from offset +X to the end of the file.

each with independent stripe count/size.
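As a rough sketch of what such a two-zone layout could look like, the following maps a file offset to its zone and stripe index. All names (`Zone`, `TwoZoneLayout`, `locate`) are hypothetical, X is expressed in bytes rather than blocks for simplicity, and the header chunk size in the example is an assumption.

```python
from dataclasses import dataclass

MiB = 1024 * 1024

@dataclass(frozen=True)
class Zone:
    stripe_count: int  # number of targets the zone is striped over
    chunk_size: int    # bytes written to one target before moving to the next

@dataclass(frozen=True)
class TwoZoneLayout:
    header_end: int    # X: first byte offset that belongs to the payload
    header: Zone
    payload: Zone

    def locate(self, offset: int) -> tuple[str, int, int]:
        """Return (zone name, stripe index within the zone, offset within the chunk)."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        if offset < self.header_end:
            name, zone, rel = "header", self.header, offset
        else:
            # Payload striping restarts at the zone boundary.
            name, zone, rel = "payload", self.payload, offset - self.header_end
        chunk_no, within = divmod(rel, zone.chunk_size)
        return name, chunk_no % zone.stripe_count, within

# Roughly the default mentioned above: unstriped up to 128 MiB, then 2x1MiB.
# The 1 MiB header chunk size is assumed, since it is irrelevant with one stripe.
layout = TwoZoneLayout(header_end=128 * MiB,
                       header=Zone(stripe_count=1, chunk_size=MiB),
                       payload=Zone(stripe_count=2, chunk_size=MiB))
```

Here `layout.locate(5 * MiB + 7)` resolves to the single header stripe, while offsets at or beyond 128 MiB alternate between the two payload stripes every 1 MiB.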

The additional (+1) zone would be a filesystem-internal zone, not visible to the user, whose purpose would be to guarantee that an in-place, server-side migration between arbitrary two-zone striping schemes can always be performed safely. That would be achieved by gradually rewriting the file from one scheme to the other, chunk by chunk, never fully duplicating the file, and atomically updating the file's "true" PFL with every chunk until it matches the target PFL.
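One possible reading of this chunk-by-chunk migration is sketched below as a toy in-memory model. The class `MigratingFile`, the `frontier` field, and the two dictionaries standing in for the old and new striping schemes are all hypothetical; a real implementation would have to make the frontier update atomic and crash-safe on the metadata server.

```python
class MigratingFile:
    """Toy model: bytes below `frontier` live in the target scheme,
    bytes at or above it still live in the source scheme."""

    def __init__(self, data: bytes, step: int):
        self.size = len(data)
        self.step = step
        # Chunks still stored under the source scheme, keyed by file offset.
        self.old = {off: data[off:off + step] for off in range(0, self.size, step)}
        self.new = {}      # chunks already rewritten under the target scheme
        self.frontier = 0  # boundary recorded in the file's "true" layout
        self.peak_chunks = len(self.old)

    def read_all(self) -> bytes:
        # Readers consult the frontier to pick the right scheme per chunk.
        return b"".join((self.new if off < self.frontier else self.old)[off]
                        for off in range(0, self.size, self.step))

    def migrate_one(self) -> bool:
        """Move one chunk; returns False once the file is fully migrated."""
        if self.frontier >= self.size:
            return False
        off = self.frontier
        self.new[off] = self.old[off]  # 1. write the chunk under the target scheme
        self.peak_chunks = max(self.peak_chunks, len(self.old) + len(self.new))
        self.frontier = min(off + self.step, self.size)  # 2. publish the new boundary
        del self.old[off]              # 3. only now release the old copy
        return True

data = b"abcdefghij"
f = MigratingFile(data, step=4)
consistent = True
while f.migrate_one():
    consistent = consistent and f.read_all() == data
```

In this model the file stays readable after every step, and the extra space needed at any moment is one chunk rather than a full second copy.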

This would address the target-imbalance and performance issues, and would also enable restriping without deep-copying. Two zones ought to cover most use cases.

@obilaniu obilaniu added enhancement New feature or request new Issues that haven't been triaged yet labels Sep 24, 2024
@iamjoemccormick
Member

Hi @obilaniu,

Thank you for the detailed proposal and write-up of what we discussed at Stammtisch. I agree this would be a valuable addition to BeeGFS. While we don't have immediate plans to begin work on this, we'll use this issue to continue collecting feedback and ideas on how this could eventually be done.

@iamjoemccormick iamjoemccormick added gathering-requirements Issues open for collecting feedback, ideas, and requirements for potential future implementations. and removed new Issues that haven't been triaged yet labels Sep 27, 2024