DataTask Composition Policies and Granularity #21

Open
Keith-Bateman opened this issue Apr 23, 2024 · 0 comments
Labels: core, enhancement (New feature or request)

Comments

@Keith-Bateman
Member

  • Create a factory pattern for different composition policies
  • DataTasks can be composed on a 1:1 basis (one DataTask per operation), but in practice this doesn't require a distinct policy, because the default policy can subsume its functionality.
  • The default policy should split operations larger than a specified maximum DataTask size into multiple DataTasks. We should explore whether it's better to split I/O evenly across DataTasks or to put the biggest operation first.
  • The aggregating policy should keep the splitting logic to maintain a maximum DataTask size, but should also aggregate operations smaller than a given minimum size. This is complicated by the fact that operations on different files, seeks, or a change of operation type can cause a naive aggregating policy to not work as intended, so we will have to explore different solutions to create a more intelligent aggregating policy. A suggested baseline is to simply stop aggregating once one of these changing conditions is detected (though this is certainly not the best option).
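
To make the discussion concrete, here is a rough Python sketch of the three pieces above: a factory, a default splitting policy, and the baseline aggregating policy. Everything in it is hypothetical and for illustration only: `Op`, `DataTask`, `split_op`, `make_policy`, and the `max_size`/`min_size`/`even` parameters are not the project's actual types or API. It assumes `min_size <= max_size`, and it reads "biggest first" as "full-size DataTasks first, remainder last", which is only one possible interpretation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Op:
    """One I/O operation (hypothetical shape, not the project's type)."""
    kind: str    # e.g. "read" or "write"
    path: str
    offset: int
    size: int


@dataclass
class DataTask:
    """Stand-in for a DataTask: one contiguous range on one file."""
    kind: str
    path: str
    offset: int
    size: int


def split_op(op: Op, max_size: int, even: bool = False) -> List[DataTask]:
    """Split one operation so that no DataTask exceeds max_size."""
    if op.size <= max_size:
        return [DataTask(op.kind, op.path, op.offset, op.size)]  # 1:1 case
    n = -(-op.size // max_size)  # ceiling division: DataTasks needed
    if even:
        base, extra = divmod(op.size, n)
        sizes = [base + 1 if i < extra else base for i in range(n)]
    else:
        # Full-size DataTasks first, remainder (if any) last.
        sizes = [max_size] * (op.size // max_size)
        if op.size % max_size:
            sizes.append(op.size % max_size)
    tasks, offset = [], op.offset
    for size in sizes:
        tasks.append(DataTask(op.kind, op.path, offset, size))
        offset += size
    return tasks


def default_policy(ops: List[Op], max_size: int, even: bool = False) -> List[DataTask]:
    return [task for op in ops for task in split_op(op, max_size, even)]


def aggregating_policy(ops: List[Op], max_size: int, min_size: int) -> List[DataTask]:
    """Baseline: merge small contiguous ops, stop merging when conditions change."""
    tasks: List[DataTask] = []
    open_task = None  # DataTask currently accepting small operations
    for op in ops:
        if op.size >= min_size:
            open_task = None  # not small: fall back to plain splitting
            tasks.extend(split_op(op, max_size))
            continue
        fits = (
            open_task is not None
            and open_task.kind == op.kind                        # same operation
            and open_task.path == op.path                        # same file
            and open_task.offset + open_task.size == op.offset   # no seek
            and open_task.size + op.size <= max_size             # respects the cap
        )
        if fits:
            open_task.size += op.size
        else:
            open_task = DataTask(op.kind, op.path, op.offset, op.size)
            tasks.append(open_task)
    return tasks


POLICIES = {"default": default_policy, "aggregating": aggregating_policy}


def make_policy(name: str, **params) -> Callable[[List[Op]], List[DataTask]]:
    """Factory: look up a policy by name and bind its parameters."""
    try:
        policy = POLICIES[name]
    except KeyError:
        raise ValueError(f"unknown composition policy: {name}") from None
    return lambda ops: policy(ops, **params)
```

In this sketch the 1:1 case needs no policy of its own: any operation no larger than `max_size` comes out of `split_op` as a single DataTask. Something like `make_policy("default", max_size=..., even=True)` versus `even=False` would let the two splitting strategies be compared on the same workload.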