idea: add FLUX environment variable that holds "closest enclosing jobid" #6474

Open
grondo opened this issue Dec 5, 2024 · 7 comments

@grondo
Contributor

grondo commented Dec 5, 2024

This idea was brought up by @trws on slack and in a project meeting.

Flux currently doesn't have a consistent way to determine the "nearest" enclosing jobid. The cases are somewhat delineated in #3817, though the information there may be outdated (e.g. a flux_job_timeleft(3) function now exists). While it makes sense that FLUX_JOB_ID is set in the environment of tasks launched by flux run and flux submit but not in the environment of the initial program in flux alloc and flux batch, this will likely continue to cause confusion and annoyance for users.

One idea put forward by @trws is to add another FLUX_ jobid variable that is always set by the job shell and is never cleared. (Please correct me if I'm mistaken.) This variable would leak through to initial programs, which could then use it to determine the jobid of their parent instance (if there is a jobid associated with it). That is equivalent to, but more straightforward than, using flux getattr jobid. The variable would also be available in flux run and flux submit, where it would be the same as FLUX_JOB_ID. Comparing the two variables would allow users to easily determine whether they are in an initial program environment or within a job. The absence of this new environment variable would indicate that there is no enclosing job, i.e. the current process is not within an instance, or the enclosing instance is not itself a job.
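The comparison described above could be sketched roughly as follows. This is only an illustration: the issue does not name the proposed variable, so FLUX_ENCLOSING_JOBID is a hypothetical placeholder, as are the function name and the returned labels.

```python
import os

# HYPOTHETICAL name: the issue does not say what the new variable would be called.
ENCLOSING_VAR = "FLUX_ENCLOSING_JOBID"


def classify_context(environ=None):
    """Classify the current process using the two jobid variables.

    Returns one of three (hypothetical) labels:
      "no-enclosing-job": the proposed variable is absent
      "job-task":         FLUX_JOB_ID is set and matches the proposed variable
      "initial-program":  the proposed variable is set but FLUX_JOB_ID does not match it
    """
    env = os.environ if environ is None else environ
    enclosing = env.get(ENCLOSING_VAR)
    job_id = env.get("FLUX_JOB_ID")

    if not enclosing:
        # Not inside an instance, or the enclosing instance is not itself a job.
        return "no-enclosing-job"
    if job_id and job_id == enclosing:
        # Task launched by flux run / flux submit: both variables agree.
        return "job-task"
    # The variable leaked through to an initial program (flux alloc / flux batch).
    return "initial-program"


if __name__ == "__main__":
    print(classify_context())
```

Treating a mismatched FLUX_JOB_ID the same as an unset one is an assumption made here for simplicity; the thread does not say how that case should be handled.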

Edit: if we enable this feature, that may allow us to close #3817.

@chu11
Member

chu11 commented Dec 9, 2024

A random idea I thought of: if we want to avoid spreading too many environment variables, could we support a new command, hypothetically flux job whoami (i.e. like flux job last)?

@grondo
Contributor Author

grondo commented Dec 9, 2024

That's a pretty good idea. I wonder if that would satisfy @vsoch and @trws? (I think it probably would, but they should weigh in as well.)

@garlick
Member

garlick commented Dec 9, 2024

Would flux job whoami just be an alias for flux getattr jobid then?

@grondo
Contributor Author

grondo commented Dec 9, 2024

I think if FLUX_JOB_ID is set, it would return that, otherwise flux getattr jobid. If that failed, then it would return empty?
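A minimal sketch of that fallback order, written in Python rather than as an actual flux subcommand. The function name and its parameters are hypothetical; only FLUX_JOB_ID and flux getattr jobid come from the discussion, and the command is injectable so the logic can be exercised without a running Flux instance.

```python
import os
import subprocess


def enclosing_jobid(environ=None, getattr_cmd=("flux", "getattr", "jobid")):
    """Return the closest enclosing jobid, or "" if none can be determined.

    Order of precedence (as proposed in the thread):
      1. FLUX_JOB_ID from the environment, if set and non-empty
      2. the output of `flux getattr jobid`
      3. the empty string if that command is missing or fails
    """
    env = os.environ if environ is None else environ

    jobid = env.get("FLUX_JOB_ID", "")
    if jobid:
        return jobid

    try:
        result = subprocess.run(
            list(getattr_cmd),
            capture_output=True,
            text=True,
            check=True,  # non-zero exit raises CalledProcessError
        )
    except (OSError, subprocess.CalledProcessError):
        # Command not installed, or the attribute is not set in this instance.
        return ""
    return result.stdout.strip()


if __name__ == "__main__":
    print(enclosing_jobid())
```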

@vsoch
Member

vsoch commented Dec 9, 2024

An environment variable or flux attribute (either of which works consistently across cases) would be great. Another thing we would like is the equivalent for the very top-level instance ID, e.g.:

FLUX_TOP_LEVEL_ID=xxx

I am adding a command, flux usernetes top-level, for our usernetes case. It walks up the chain of flux parent-uri values until it reaches the top-level parent to interact with. If that could be orchestrated more simply via a passed envar (avoiding all the operations needed to get the flux URIs of each parent), that might be a better solution. The use case is prolog/epilog: when we need to store metadata across all levels of some root instance, it makes sense to put it at the top level for everyone to find.
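The walk up the parent-uri chain might look roughly like the sketch below. It is not the flux usernetes implementation: both function names are hypothetical, and the helper assumes that running flux getattr parent-uri with FLUX_URI pointed at an instance returns that instance's parent URI and fails at the top level. It also assumes each parent URI is reachable from where the script runs, which may not hold for node-local URIs.

```python
import os
import subprocess


def flux_parent_uri(uri):
    """Return the parent-uri of the instance at `uri`, or None if it has none.

    Assumption: `flux getattr parent-uri` exits non-zero (or prints nothing)
    when the instance has no parent.
    """
    env = dict(os.environ, FLUX_URI=uri)
    try:
        result = subprocess.run(
            ["flux", "getattr", "parent-uri"],
            capture_output=True,
            text=True,
            check=True,
            env=env,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() or None


def top_level_uri(start_uri, get_parent=flux_parent_uri, max_depth=64):
    """Follow parent links from `start_uri` until an instance has no parent."""
    uri = start_uri
    for _ in range(max_depth):
        parent = get_parent(uri)
        if not parent:
            return uri  # no parent: this is the top-level instance
        uri = parent
    # Guard against a cyclic or unexpectedly deep chain.
    raise RuntimeError("parent chain deeper than max_depth")


if __name__ == "__main__":
    start = os.environ.get("FLUX_URI")
    if start:
        print(top_level_uri(start))
```

Each step costs one process launch and one connection to a parent instance, which is the overhead a single passed variable such as FLUX_TOP_LEVEL_ID would avoid.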

I think whatever you decide to come up with will be hugely helpful, so thank you in advance!

@chu11
Member

chu11 commented Dec 9, 2024

> I think if FLUX_JOB_ID is set, it would return that, otherwise flux getattr jobid. If that failed, then it would return empty?

Yeah, this is what I was thinking. Just wrap the logic into it.

@trws
Member

trws commented Dec 31, 2024

That would be fine, I think; it's consistency that matters most. Having it be a command might be best, since that means things like flux toplevel ... would work, and likewise if we have one for parent or whatever. It's more composable that way.

I admit to a personal preference, though, for making access to at least the innermost job ID and matching flux URI very easy, since that's what people are most used to from other systems and what they will need in order to do naive ports of job scripts that use the enclosing jobid and talk to the system scheduler in batch scripts.


5 participants