
Local scheduler #36

Open
bschroeter opened this issue Nov 19, 2024 · 2 comments

Comments

@bschroeter
Collaborator

Discussions around portability of this software suggest that a local execution strategy would be useful.

This issue will serve as a place to collect thoughts and requirements to formalise the approach.

@bschroeter
Collaborator Author

There are a few ways to approach this, some more complicated than others.

The simplest approach would be a client that performs a `subprocess.run` of a bash (?) command against a script, with additional variables passed in via the subprocess environment. Execution would be effectively immediate, as there is no queue to wait in.

The client could then return a unique identifier for the completed process, which could be used for downstream dependency handling via an internal lookup dict of completed processes (in order to check return codes).

This would be a largely serial operation, but would have some dependency handling built in.
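A minimal sketch of what that blocking client might look like (the class and method names here are illustrative, not an existing API):

```python
import os
import subprocess
import uuid


class LocalClient:
    """Illustrative blocking local-execution client.

    Each submitted script runs immediately via subprocess.run; its return
    code is recorded under a generated job id so that downstream tasks can
    check whether their dependencies succeeded.
    """

    def __init__(self):
        self.completed = {}  # job id -> return code

    def submit(self, script, env=None):
        """Run `script` with bash, passing extra variables via the environment."""
        merged_env = {**os.environ, **(env or {})}
        result = subprocess.run(["bash", script], env=merged_env)
        job_id = str(uuid.uuid4())
        self.completed[job_id] = result.returncode
        return job_id

    def succeeded(self, job_id):
        """Dependency check: did the given job finish with return code 0?"""
        return self.completed.get(job_id) == 0
```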

Another approach is to look into non-blocking subprocess methods (e.g. https://stackoverflow.com/questions/16071866/non-blocking-subprocess-call), which would allow tracking of PIDs to maintain dependencies.
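Sketching the non-blocking variant, assuming `subprocess.Popen` as the mechanism (again, the class here is hypothetical):

```python
import subprocess


class NonBlockingClient:
    """Illustrative non-blocking client built on subprocess.Popen.

    submit() returns immediately; the OS PID serves as the job handle,
    and dependents can poll for completion before starting.
    """

    def __init__(self):
        self.running = {}  # pid -> Popen handle

    def submit(self, args):
        """Launch a process without waiting for it to finish."""
        proc = subprocess.Popen(args)
        self.running[proc.pid] = proc
        return proc.pid

    def is_done(self, pid):
        """Poll without blocking; True once the process has exited."""
        return self.running[pid].poll() is not None

    def wait_all(self):
        """Block until every tracked process finishes; map pid -> return code."""
        return {pid: proc.wait() for pid, proc in self.running.items()}
```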

A third approach could be to use Dask to assemble a delayed architecture as a local scheduler:
https://docs.dask.org/en/stable/delayed.html

This would require some kind of final call to the scheduler to trigger computation inside the client object, and would add Dask as a project dependency.
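For reference, a sketch of the delayed pattern (assuming Dask is installed; the task names are made up). Wrapping functions with `dask.delayed` builds a task graph instead of executing, and a single final `.compute()` triggers execution with dependencies respected:

```python
# Sketch only: requires `pip install dask`.
import dask


@dask.delayed
def run_task(name, upstream=None):
    # Passing an upstream result creates a graph edge, so this task
    # will only run after its dependency has completed.
    return f"{name} done"


a = run_task("preprocess")
b = run_task("model", upstream=a)
c = run_task("postprocess", upstream=b)

# Nothing has executed yet; this one call runs the whole graph.
result = c.compute()
```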

Lastly, the graph approach that I demonstrated might be useful here as well.

I am partial to the first option, as it is the easiest to implement in a short timeframe.

@ccarouge, do you have any opinions here?

@ccarouge
Member

A serial operation would make this really slow when applied to benchcab, for example. In the current benchcab tests, we can get the results of the fluxsite tests in 10 min using 48 cores. Going serial is obviously going to make this a lot more painful.

That said, speed is probably not the priority now, as a full benchcab suite should only need to run a few times at the end of development. People can reduce the number of runs to get a quick turnaround during development.
