Feedback about Flux Operator future service #45

Open
vsoch opened this issue Nov 24, 2022 · 1 comment

Comments

@vsoch
Member

vsoch commented Nov 24, 2022

If we can imagine a way for an HPC center to provision clusters (each owned by a user) via the Flux Operator, on demand for a user or group, we'd want control over instance types, sizes, and costs, e.g.,

An ideal in my opinion would be to be able to list the allowed instance types and max sizes, then have Flux handle provisioning (on-demand or spot) on a per-job basis. It could use QoS flags to decide whether to chain sequences of jobs on the same instances (to amortize provisioning costs) versus spreading them across instances (to minimize time to completion). I think these policies are possible with Kubernetes (thus minimizing customization to any specific cloud provider, as with current solutions).

In thread here:
https://hachyderm.io/@jedbrown/109396976059698506
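
To make the idea above concrete, here is a minimal sketch of how an allow-list plus a per-job QoS flag could drive placement. Everything in it is an assumption for illustration: `Policy`, `Job`, `plan`, and the `"chain"` / `"spread"` flag values are hypothetical names, not part of Flux, the Flux Operator, or Kubernetes, and the choice between on-demand and spot capacity is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    # Hypothetical admin policy: instance type -> largest node count allowed.
    allowed: dict


@dataclass(frozen=True)
class Job:
    name: str
    instance_type: str
    nodes: int
    qos: str  # "chain" or "spread"; these flag names are assumptions


def plan(jobs, policy):
    """Assign jobs to node groups; jobs sharing a group run back to back."""
    groups = []
    for job in jobs:
        limit = policy.allowed.get(job.instance_type)
        if limit is None or job.nodes > limit:
            raise ValueError(f"{job.name}: request is outside the policy")
        target = None
        if job.qos == "chain":
            # Reuse an existing chained group that is large enough, so the
            # provisioning cost is paid once for the whole sequence.
            for group in groups:
                if (group["chain"] and group["type"] == job.instance_type
                        and group["nodes"] >= job.nodes):
                    target = group
                    break
        if target is None:
            # "spread" jobs (and the first job of a chain) get fresh nodes.
            target = {"type": job.instance_type, "nodes": job.nodes,
                      "chain": job.qos == "chain", "jobs": []}
            groups.append(target)
        target["jobs"].append(job.name)
    return groups


if __name__ == "__main__":
    policy = Policy(allowed={"small": 4})
    jobs = [Job("a", "small", 2, "chain"),
            Job("b", "small", 2, "chain"),
            Job("c", "small", 2, "spread")]
    for group in plan(jobs, policy):
        print(group["type"], group["nodes"], group["jobs"])
```

In this toy version, chained jobs queue on one group of nodes while a spread job always gets its own, which is the cost versus time-to-completion tradeoff from the quote. A real version would presumably map each group onto a Kubernetes node pool rather than a plain dict.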

Thanks @jedbrown!

@vsoch
Member Author

vsoch commented Mar 17, 2024

@jedbrown heads up that we are working on a similar use case with https://github.com/converged-computing/rainbow, although it doesn't necessarily have to be a Flux Operator owned cluster (the experiments I'm prototyping today are all Flux Operator clusters, specifically on different node pools on a cloud). I can post more here when it's done.
