Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

job-exec: fix confusing "job shell exec error" log message #6579

Merged
merged 2 commits into from
Jan 24, 2025

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Jan 24, 2025

Problem: When the job-exec module detects an exec error for a job shell it emits a confusing error message that includes either the path to the job shell or the IMP (if a multiuser job), and only the result of strerror() for the errno returned from libsubprocess. When using sdexec, this errno is always ENOENT, resulting in a confusing error message that seems to indicate that flux-imp was not found.

It is unhelpful to include argv[0] in this error message. It will always be the job shell or the IMP and we all know it. Drop this from the log message.

Also, sdexec will provide extra information in the subprocess error string available from flux_subprocess_fail_error (p). Log this instead of strerror (errno).

Fixes #6568

Copy link
Member

@chu11 chu11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems reasonable to me!

@grondo grondo force-pushed the issue#6568 branch 2 times, most recently from 7033da9 to 9d1af0e Compare January 24, 2025 22:20
@grondo
Copy link
Contributor Author

grondo commented Jan 24, 2025

Apparently there was a test that explicitly checked that the job shell path was emitted in the exec error. Removed that test and repushed.

Problem: A test in t2400-job-exec-test.t checks that the path to
the job is emitted on a job-exec subprocess exec error, but it has
been determined that including this path in error message is more
confusing than helpful.

Remove the test.
Problem: When the job-exec module detects an exec error for a job
shell it emits a confusing error message that includes either
the path to the job shell or the IMP (if a multiuser job), and
only the result of `strerror()` for the errno returned from
libsubprocess. When using sdexec, this errno is always `ENOENT`,
resulting in a confusing error message that seems to indicate
that `flux-imp` was not found.

It is unhelpful to include `argv[0]` in this error message. It will
always be the job shell or the IMP and we all know it. Drop this
from the log message.

Also, sdexec will provide extra information in the subprocess error
string available from `flux_subprocess_fail_error (p)`. Log this
instead of `strerror (errno)`.

Fixes flux-framework#6568
@mergify mergify bot merged commit 0836ef6 into flux-framework:master Jan 24, 2025
35 checks passed
Copy link

codecov bot commented Jan 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.46%. Comparing base (921ed27) to head (0b12b76).
Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6579      +/-   ##
==========================================
- Coverage   79.47%   79.46%   -0.01%     
==========================================
  Files         531      531              
  Lines       88421    88421              
==========================================
- Hits        70274    70267       -7     
- Misses      18147    18154       +7     
Files with missing lines Coverage Δ
src/modules/job-exec/exec.c 80.65% <ø> (ø)

... and 8 files with indirect coverage changes

@grondo grondo deleted the issue#6568 branch January 25, 2025 00:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

misleading error message: job shell exec error on ...: /usr/libexec/flux/flux-imp: No such file or directory
2 participants