Feedback about Flux Operator future service #45

vsoch · 2022-11-24T05:18:29Z

If we imagine a way for an HPC center to provision clusters via the Flux Operator (each cluster owned by a user), on demand for a user or group, we'd want control over instance types, sizes, and costs. For example:

An ideal, in my opinion, would be to be able to list the allowed instance types and max sizes, then have Flux handle provisioning (on-demand or spot) on a per-job basis. It could use qos flags to decide whether to chain sequences of jobs on the same instances (to amortize provisioning costs) or spread them out (to minimize time to completion). I think these policies are possible with Kubernetes (thus minimizing customization to any specific cloud provider, as with current solutions).

From the thread here:
https://hachyderm.io/@jedbrown/109396976059698506

Thanks @jedbrown!
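The policy in the quoted suggestion (allowed instance types with max sizes, per-job on-demand or spot capacity, and a qos flag choosing between chaining and spreading) can be sketched in a few lines. This is a hypothetical illustration only: the instance type names, the `Job` fields, the `plan` function, and the "chain"/"spread" qos values are made up for the example and are not part of the Flux Operator, Flux, or Kubernetes APIs. It only decides how jobs would be grouped into node pools; actually provisioning those pools is left out.

```python
from dataclasses import dataclass

# Hypothetical admin policy: allowed instance types mapped to the max nodes per pool.
ALLOWED = {"cpu-small": 8, "cpu-large": 4}


@dataclass(frozen=True)
class Job:
    """A hypothetical job request (not a Flux or Kubernetes type)."""
    name: str
    instance_type: str
    nodes: int
    spot: bool = False  # True requests spot capacity, False requests on-demand


def validate(job, allowed):
    """Reject jobs that ask for a disallowed instance type or too many nodes."""
    max_nodes = allowed.get(job.instance_type)
    if max_nodes is None:
        raise ValueError(f"{job.name}: instance type {job.instance_type!r} is not allowed")
    if job.nodes > max_nodes:
        raise ValueError(f"{job.name}: {job.nodes} nodes exceeds the max of {max_nodes}")


def plan(jobs, qos, allowed):
    """Group jobs into node pools to provision (hypothetical qos values).

    qos="chain":  jobs with the same instance type and capacity kind share one
                  pool and run back to back, paying the provisioning cost once.
    qos="spread": every job gets its own pool, so jobs can run concurrently.
    Returns (instance_type, capacity, pool_nodes, job_names) tuples.
    """
    for job in jobs:
        validate(job, allowed)
    if qos == "spread":
        groups = [[job] for job in jobs]
    elif qos == "chain":
        by_key = {}
        for job in jobs:
            by_key.setdefault((job.instance_type, job.spot), []).append(job)
        groups = list(by_key.values())
    else:
        raise ValueError(f"unknown qos {qos!r}")
    return [
        (
            group[0].instance_type,
            "spot" if group[0].spot else "on-demand",
            max(job.nodes for job in group),  # the pool must fit the largest job
            [job.name for job in group],
        )
        for group in groups
    ]


jobs = [
    Job("a", "cpu-small", 4),
    Job("b", "cpu-small", 2),
    Job("c", "cpu-large", 2, spot=True),
]
print(plan(jobs, "chain", ALLOWED))
# [('cpu-small', 'on-demand', 4, ['a', 'b']), ('cpu-large', 'spot', 2, ['c'])]
```

With qos="spread" the same three jobs would each get their own pool (three entries), trading extra provisioning for concurrency. A request that exceeds the allowed max size (e.g. 5 nodes of "cpu-large") is rejected before anything is provisioned.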

vsoch · 2024-03-17T21:40:43Z

@jedbrown heads up that we are working on a similar use case with https://github.com/converged-computing/rainbow, although it doesn't necessarily have to be a Flux Operator-owned cluster (but the experiments I'm prototyping today all use Flux Operator clusters, specifically on different node pools on a cloud). I can post more here when it's done.
