Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shell: fix lost log messages during initialization #6578

Merged
merged 4 commits into from
Jan 24, 2025

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Jan 24, 2025

This PR (hopefully) fixes the issue reported in #6576. Early shell errors are lost because, for unknown reason, the shell was calling exit (1); instead of shell_die().

This fixes that issue and adds a test.

Copy link
Member

@chu11 chu11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just that one typo

@@ -220,4 +220,18 @@ test_expect_success 'flux-shell: stdout/err from task.exec works' '
grep "^this is stderr" print.err &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit message typo "erorrs"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Fixed and will set MWP.

Problem: The `struct flux_shell` object is allocated in main() on the
stack without initialization of all fields to 0, but several shell
components check for non-NULL members of this structure for safety.

Ensure all fields of the struct are initialized to zero via a call
to memset(3).
Problem: `flux_shell_raise(3)` fails to check for a non-NULL
`shell->info` object, then proceeds to dereference it, causing a
potential segfault.

Add a check for non-NULL `shell->info` along with the other checks in
`flux_shell_raise(3)`.
Problem: The shell calls `exit` instead of `shell_die` during some
early initialization errors, but this doesn't give the logging system
a chance to detect a fatal error when the shell abruptly exits.

Switch the `exit (1)` calls to `shell_die()`.

Fixes flux-framework#6576
Problem: No test in the testsuite ensures that early job shell errors,
such as a failure to parse jobspec, are displayed in `flux job attach`
output.

Add a test to t2608-job-shell-log.t that submits a jobspec with
version=2 and ensures the expected error message is emitted.
@mergify mergify bot merged commit 16f7233 into flux-framework:master Jan 24, 2025
34 of 35 checks passed
Copy link

codecov bot commented Jan 24, 2025

Codecov Report

Attention: Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 79.45%. Comparing base (ef85a55) to head (56b82af).
Report is 5 commits behind head on master.

Files with missing lines Patch % Lines
src/shell/shell.c 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6578      +/-   ##
==========================================
+ Coverage   79.44%   79.45%   +0.01%     
==========================================
  Files         531      531              
  Lines       88311    88312       +1     
==========================================
+ Hits        70157    70168      +11     
+ Misses      18154    18144      -10     
Files with missing lines Coverage Δ
src/shell/log.c 89.09% <100.00%> (+0.60%) ⬆️
src/shell/shell.c 79.58% <66.66%> (+0.02%) ⬆️

... and 7 files with indirect coverage changes

@grondo grondo deleted the issue#6576 branch January 24, 2025 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants