DataTask Composition Policies and Granularity #21

Keith-Bateman · 2024-04-23T22:25:26Z

Create a factory pattern for different composition policies
DataTasks can be composed on a 1:1 basis (one task per operation), but in practice this doesn't need a distinct policy, since the default policy can subsume it.
The default policy should split operations larger than a specified maximum DataTask size into multiple DataTasks. Explore whether it's better to split the I/O evenly across DataTasks or to put the biggest operation first.
The aggregating policy should keep the splitting logic to enforce the maximum DataTask size, but also aggregate operations smaller than a given minimum size. This is complicated by the fact that switching files, seeking, or changing operations can cause a naive aggregating policy to not work as intended, so we will have to explore different solutions to create a more intelligent aggregating policy. A suggested baseline is to simply stop aggregating as soon as one of these changes is detected (though this certainly isn't the best option).
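A rough sketch of how the pieces above could fit together, written in Python purely for illustration. Every name here (`Operation`, `DataTask`, `DefaultPolicy`, `AggregatingPolicy`, `make_policy`, `split_even`, `split_biggest_first`) is hypothetical and not taken from the codebase. It assumes `min_size <= max_size`, reads "biggest first" as "fill full-size pieces first and leave the remainder for last", and treats a change of file, a non-contiguous offset, or a change of operation kind as the conditions that stop aggregation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Operation:
    file: str    # target file
    kind: str    # e.g. "read" or "write"
    offset: int  # byte offset within the file
    size: int    # length in bytes


@dataclass
class DataTask:
    ops: List[Operation]

    @property
    def size(self) -> int:
        return sum(op.size for op in self.ops)


def split_even(op: Operation, max_size: int) -> List[Operation]:
    """Split into the fewest pieces <= max_size, with near-equal sizes."""
    if op.size <= max_size:
        return [op]
    n = -(-op.size // max_size)  # ceiling division
    base, extra = divmod(op.size, n)
    pieces, offset = [], op.offset
    for i in range(n):
        length = base + (1 if i < extra else 0)
        pieces.append(Operation(op.file, op.kind, offset, length))
        offset += length
    return pieces


def split_biggest_first(op: Operation, max_size: int) -> List[Operation]:
    """Emit full max_size pieces first; the remainder goes last."""
    if op.size <= max_size:
        return [op]
    pieces, offset, remaining = [], op.offset, op.size
    while remaining > 0:
        length = min(max_size, remaining)
        pieces.append(Operation(op.file, op.kind, offset, length))
        offset += length
        remaining -= length
    return pieces


class DefaultPolicy:
    """One DataTask per operation, splitting anything above max_size."""

    def __init__(self, max_size: int,
                 splitter: Callable[[Operation, int], List[Operation]] = split_even):
        self.max_size = max_size
        self.splitter = splitter

    def compose(self, ops: List[Operation]) -> List[DataTask]:
        return [DataTask([piece])
                for op in ops
                for piece in self.splitter(op, self.max_size)]


class AggregatingPolicy(DefaultPolicy):
    """Baseline: merge small contiguous ops; stop when conditions change."""

    def __init__(self, max_size: int, min_size: int, splitter=split_even):
        super().__init__(max_size, splitter)
        self.min_size = min_size  # assumed <= max_size

    def compose(self, ops: List[Operation]) -> List[DataTask]:
        tasks: List[DataTask] = []
        pending: List[Operation] = []

        def flush() -> None:
            if pending:
                tasks.append(DataTask(list(pending)))
                pending.clear()

        for op in ops:
            if op.size >= self.min_size:
                # Large enough already: close the batch, use default splitting.
                flush()
                tasks.extend(super().compose([op]))
                continue
            if pending:
                last = pending[-1]
                contiguous = (op.file == last.file and op.kind == last.kind
                              and op.offset == last.offset + last.size)
                fits = sum(p.size for p in pending) + op.size <= self.max_size
                if not (contiguous and fits):
                    flush()  # file/seek/kind change or batch full: stop aggregating
            pending.append(op)
        flush()
        return tasks


_POLICIES = {"default": DefaultPolicy, "aggregating": AggregatingPolicy}


def make_policy(name: str, **kwargs) -> DefaultPolicy:
    """Factory: look up a composition policy by name."""
    if name not in _POLICIES:
        raise ValueError(f"unknown composition policy: {name!r}")
    return _POLICIES[name](**kwargs)
```

With this layout, the 1:1 case falls out of `make_policy("default", max_size=...)` whenever no operation exceeds `max_size`, and the two splitting strategies can be compared by passing `splitter=split_even` or `splitter=split_biggest_first` to the same policy.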

Keith-Bateman added the enhancement and core labels Apr 23, 2024

Keith-Bateman self-assigned this Apr 23, 2024

Keith-Bateman mentioned this issue Apr 23, 2024

DataTask Scheduling Dispatch #26

Open
